From patchwork Sun Apr 18 10:07:57 2021
X-Patchwork-Submitter: "Guo, Yejun"
X-Patchwork-Id: 26964
From: "Guo, Yejun"
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Apr 2021 18:07:57 +0800
Message-Id: <20210418100802.19017-1-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH 1/6] lavfi/dnn_backend_openvino.c: unify code for infer request for sync/async
---
 libavfilter/dnn/dnn_backend_openvino.c | 49 +++++++++++---------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index 0757727a9c..874354ecef 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -52,9 +52,6 @@ typedef struct OVModel{
     ie_core_t *core;
     ie_network_t *network;
     ie_executable_network_t *exe_network;
-    ie_infer_request_t *infer_request;
-
-    /* for async execution */
     SafeQueue *request_queue;   // holds RequestItem
     Queue *task_queue;          // holds TaskItem
 } OVModel;
@@ -269,12 +266,9 @@ static void infer_completion_callback(void *args)
     ie_blob_free(&output_blob);

     request->task_count = 0;
-
-    if (task->async) {
-        if (ff_safe_queue_push_back(requestq, request) < 0) {
-            av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
-            return;
-        }
+    if (ff_safe_queue_push_back(requestq, request) < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
+        return;
     }
 }
@@ -347,11 +341,6 @@ static DNNReturnType init_model_ov(OVModel *ov_model, const char *input_name, co
         goto err;
     }

-    // create infer_request for sync execution
-    status = ie_exec_network_create_infer_request(ov_model->exe_network, &ov_model->infer_request);
-    if (status != OK)
-        goto err;
-
     // create infer_requests for async execution
     if (ctx->options.nireq <= 0) {
         // the default value is a rough estimation
@@ -502,10 +491,9 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     OVModel *ov_model = model;
     OVContext *ctx = &ov_model->ctx;
     TaskItem task;
-    RequestItem request;
+    RequestItem *request;
     AVFrame *in_frame = NULL;
     AVFrame *out_frame = NULL;
-    TaskItem *ptask = &task;
     IEStatusCode status;
     input_shapes_t input_shapes;
@@ -557,11 +545,16 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     task.out_frame = out_frame;
     task.ov_model = ov_model;

-    request.infer_request = ov_model->infer_request;
-    request.task_count = 1;
-    request.tasks = &ptask;
+    request = ff_safe_queue_pop_front(ov_model->request_queue);
+    if (!request) {
+        av_frame_free(&out_frame);
+        av_frame_free(&in_frame);
+        av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
+        return DNN_ERROR;
+    }
+    request->tasks[request->task_count++] = &task;

-    ret = execute_model_ov(&request);
+    ret = execute_model_ov(request);
     *output_width = out_frame->width;
     *output_height = out_frame->height;
@@ -633,8 +626,7 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
     OVModel *ov_model = model->model;
     OVContext *ctx = &ov_model->ctx;
     TaskItem task;
-    RequestItem request;
-    TaskItem *ptask = &task;
+    RequestItem *request;

     if (!in_frame) {
         av_log(ctx, AV_LOG_ERROR, "in frame is NULL when execute model.\n");
@@ -674,11 +666,14 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
     task.out_frame = out_frame;
     task.ov_model = ov_model;

-    request.infer_request = ov_model->infer_request;
-    request.task_count = 1;
-    request.tasks = &ptask;
+    request = ff_safe_queue_pop_front(ov_model->request_queue);
+    if (!request) {
+        av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
+        return DNN_ERROR;
+    }
+    request->tasks[request->task_count++] = &task;

-    return execute_model_ov(&request);
+    return execute_model_ov(request);
 }

 DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame,
@@ -821,8 +816,6 @@ void ff_dnn_free_model_ov(DNNModel **model)
     }
     ff_queue_destroy(ov_model->task_queue);
-    if (ov_model->infer_request)
-        ie_infer_request_free(&ov_model->infer_request);
     if (ov_model->exe_network)
         ie_exec_network_free(&ov_model->exe_network);
     if (ov_model->network)

From patchwork Sun Apr 18 10:07:58 2021
X-Patchwork-Submitter: "Guo, Yejun"
X-Patchwork-Id: 26967
From: "Guo, Yejun"
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Apr 2021 18:07:58 +0800
Message-Id: <20210418100802.19017-2-yejun.guo@intel.com>
In-Reply-To: <20210418100802.19017-1-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH 2/6] lavfi/dnn_backend_openvino.c: add InferenceItem between TaskItem and RequestItem

There is one task item for each function call from the dnn interface,
and one request item for each call to openvino. For classification,
one task might need multiple inferences (one for each bounding box),
so add InferenceItem.
---
 libavfilter/dnn/dnn_backend_openvino.c | 157 ++++++++++++++++++-------
 1 file changed, 115 insertions(+), 42 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index 874354ecef..3692a381e2 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -54,8 +54,10 @@ typedef struct OVModel{
     ie_executable_network_t *exe_network;
     SafeQueue *request_queue;   // holds RequestItem
     Queue *task_queue;          // holds TaskItem
+    Queue *inference_queue;     // holds InferenceItem
 } OVModel;

+// one task for one function call from dnn interface
 typedef struct TaskItem {
     OVModel *ov_model;
     const char *input_name;
@@ -64,13 +66,20 @@ typedef struct TaskItem {
     AVFrame *out_frame;
     int do_ioproc;
     int async;
-    int done;
+    uint32_t inference_todo;
+    uint32_t inference_done;
 } TaskItem;

+// one task might have multiple inferences
+typedef struct InferenceItem {
+    TaskItem *task;
+} InferenceItem;
+
+// one request for one call to openvino
 typedef struct RequestItem {
     ie_infer_request_t *infer_request;
-    TaskItem **tasks;
-    int task_count;
+    InferenceItem **inferences;
+    uint32_t inference_count;
     ie_complete_call_back_t callback;
 } RequestItem;

@@ -127,7 +136,12 @@ static DNNReturnType fill_model_input_ov(OVModel *ov_model, RequestItem *request
     IEStatusCode status;
     DNNData input;
     ie_blob_t *input_blob = NULL;
-    TaskItem *task = request->tasks[0];
+    InferenceItem *inference;
+    TaskItem *task;
+
+    inference = ff_queue_peek_front(ov_model->inference_queue);
+    av_assert0(inference);
+    task = inference->task;

     status = ie_infer_request_get_blob(request->infer_request, task->input_name, &input_blob);
     if (status != OK) {
@@ -159,9 +173,14 @@ static DNNReturnType fill_model_input_ov(OVModel *ov_model, RequestItem *request
     // change to be an option when necessary.
     input.order = DCO_BGR;

-    av_assert0(request->task_count <= dims.dims[0]);
-    for (int i = 0; i < request->task_count; ++i) {
-        task = request->tasks[i];
+    for (int i = 0; i < ctx->options.batch_size; ++i) {
+        inference = ff_queue_pop_front(ov_model->inference_queue);
+        if (!inference) {
+            break;
+        }
+        request->inferences[i] = inference;
+        request->inference_count = i + 1;
+        task = inference->task;
         if (task->do_ioproc) {
             if (ov_model->model->frame_pre_proc != NULL) {
                 ov_model->model->frame_pre_proc(task->in_frame, &input, ov_model->model->filter_ctx);
@@ -183,7 +202,8 @@ static void infer_completion_callback(void *args)
     precision_e precision;
     IEStatusCode status;
     RequestItem *request = args;
-    TaskItem *task = request->tasks[0];
+    InferenceItem *inference = request->inferences[0];
+    TaskItem *task = inference->task;
     SafeQueue *requestq = task->ov_model->request_queue;
     ie_blob_t *output_blob = NULL;
     ie_blob_buffer_t blob_buffer;
@@ -229,10 +249,11 @@ static void infer_completion_callback(void *args)
     output.dt = precision_to_datatype(precision);
     output.data = blob_buffer.buffer;

-    av_assert0(request->task_count <= dims.dims[0]);
-    av_assert0(request->task_count >= 1);
-    for (int i = 0; i < request->task_count; ++i) {
-        task = request->tasks[i];
+    av_assert0(request->inference_count <= dims.dims[0]);
+    av_assert0(request->inference_count >= 1);
+    for (int i = 0; i < request->inference_count; ++i) {
+        task = request->inferences[i]->task;
+        task->inference_done++;

         switch (task->ov_model->model->func_type) {
         case DFT_PROCESS_FRAME:
@@ -259,13 +280,13 @@ static void infer_completion_callback(void *args)
             break;
         }

-        task->done = 1;
+        av_freep(&request->inferences[i]);
         output.data = (uint8_t *)output.data
                       + output.width * output.height * output.channels * get_datatype_size(output.dt);
     }
     ie_blob_free(&output_blob);

-    request->task_count = 0;
+    request->inference_count = 0;
     if (ff_safe_queue_push_back(requestq, request) < 0) {
         av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
         return;
@@ -370,11 +391,11 @@ static DNNReturnType init_model_ov(OVModel *ov_model, const char *input_name, co
             goto err;
         }

-        item->tasks = av_malloc_array(ctx->options.batch_size, sizeof(*item->tasks));
-        if (!item->tasks) {
+        item->inferences = av_malloc_array(ctx->options.batch_size, sizeof(*item->inferences));
+        if (!item->inferences) {
             goto err;
         }
-        item->task_count = 0;
+        item->inference_count = 0;
     }

     ov_model->task_queue = ff_queue_create();
@@ -382,6 +403,11 @@ static DNNReturnType init_model_ov(OVModel *ov_model, const char *input_name, co
         goto err;
     }

+    ov_model->inference_queue = ff_queue_create();
+    if (!ov_model->inference_queue) {
+        goto err;
+    }
+
     return DNN_SUCCESS;

 err:
@@ -389,15 +415,24 @@ err:
     return DNN_ERROR;
 }

-static DNNReturnType execute_model_ov(RequestItem *request)
+static DNNReturnType execute_model_ov(RequestItem *request, Queue *inferenceq)
 {
     IEStatusCode status;
     DNNReturnType ret;
-    TaskItem *task = request->tasks[0];
-    OVContext *ctx = &task->ov_model->ctx;
+    InferenceItem *inference;
+    TaskItem *task;
+    OVContext *ctx;
+
+    if (ff_queue_size(inferenceq) == 0) {
+        return DNN_SUCCESS;
+    }
+
+    inference = ff_queue_peek_front(inferenceq);
+    task = inference->task;
+    ctx = &task->ov_model->ctx;

     if (task->async) {
-        if (request->task_count < ctx->options.batch_size) {
+        if (ff_queue_size(inferenceq) < ctx->options.batch_size) {
             if (ff_safe_queue_push_front(task->ov_model->request_queue, request) < 0) {
                 av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
                 return DNN_ERROR;
@@ -430,7 +465,7 @@ static DNNReturnType execute_model_ov(RequestItem *request)
             return DNN_ERROR;
         }
         infer_completion_callback(request);
-        return task->done ? DNN_SUCCESS : DNN_ERROR;
+        return (task->inference_done == task->inference_todo) ? DNN_SUCCESS : DNN_ERROR;
     }
 }

@@ -484,6 +519,31 @@ static DNNReturnType get_input_ov(void *model, DNNData *input, const char *input
     return DNN_ERROR;
 }

+static DNNReturnType extract_inference_from_task(DNNFunctionType func_type, TaskItem *task, Queue *inference_queue)
+{
+    switch (func_type) {
+    case DFT_PROCESS_FRAME:
+    case DFT_ANALYTICS_DETECT:
+    {
+        InferenceItem *inference = av_malloc(sizeof(*inference));
+        if (!inference) {
+            return DNN_ERROR;
+        }
+        task->inference_todo = 1;
+        task->inference_done = 0;
+        inference->task = task;
+        if (ff_queue_push_back(inference_queue, inference) < 0) {
+            av_freep(&inference);
+            return DNN_ERROR;
+        }
+        return DNN_SUCCESS;
+    }
+    default:
+        av_assert0(!"should not reach here");
+        return DNN_ERROR;
+    }
+}
+
 static DNNReturnType get_output_ov(void *model, const char *input_name, int input_width, int input_height,
                                    const char *output_name, int *output_width, int *output_height)
 {
@@ -536,7 +596,6 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
         return DNN_ERROR;
     }

-    task.done = 0;
     task.do_ioproc = 0;
     task.async = 0;
     task.input_name = input_name;
@@ -545,6 +604,13 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     task.out_frame = out_frame;
     task.ov_model = ov_model;

+    if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) {
+        av_frame_free(&out_frame);
+        av_frame_free(&in_frame);
+        av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n");
+        return DNN_ERROR;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_frame_free(&out_frame);
@@ -552,9 +618,8 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
         return DNN_ERROR;
     }
-    request->tasks[request->task_count++] = &task;

-    ret = execute_model_ov(request);
+    ret = execute_model_ov(request, ov_model->inference_queue);
     *output_width = out_frame->width;
     *output_height = out_frame->height;
@@ -657,7 +722,6 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
         }
     }

-    task.done = 0;
     task.do_ioproc = 1;
     task.async = 0;
     task.input_name = input_name;
@@ -666,14 +730,18 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
     task.out_frame = out_frame;
     task.ov_model = ov_model;

+    if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) {
+        av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n");
+        return DNN_ERROR;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
         return DNN_ERROR;
     }
-    request->tasks[request->task_count++] = &task;

-    return execute_model_ov(request);
+    return execute_model_ov(request, ov_model->inference_queue);
 }

 DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame,
@@ -707,7 +775,6 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *i
         return DNN_ERROR;
     }

-    task->done = 0;
     task->do_ioproc = 1;
     task->async = 1;
     task->input_name = input_name;
@@ -721,14 +788,18 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *i
         return DNN_ERROR;
     }

+    if (extract_inference_from_task(ov_model->model->func_type, task, ov_model->inference_queue) != DNN_SUCCESS) {
+        av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n");
+        return DNN_ERROR;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
         return DNN_ERROR;
     }
-    request->tasks[request->task_count++] = task;

-    return execute_model_ov(request);
+    return execute_model_ov(request, ov_model->inference_queue);
 }

 DNNAsyncStatusType ff_dnn_get_async_result_ov(const DNNModel *model, AVFrame **in, AVFrame **out)
@@ -740,7 +811,7 @@ DNNAsyncStatusType ff_dnn_get_async_result_ov(const DNNModel *model, AVFrame **i
         return DAST_EMPTY_QUEUE;
     }

-    if (!task->done) {
+    if (task->inference_done != task->inference_todo) {
         return DAST_NOT_READY;
     }

@@ -760,21 +831,17 @@ DNNReturnType ff_dnn_flush_ov(const DNNModel *model)
     IEStatusCode status;
     DNNReturnType ret;

+    if (ff_queue_size(ov_model->inference_queue) == 0) {
+        // no pending task need to flush
+        return DNN_SUCCESS;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
         return DNN_ERROR;
     }

-    if (request->task_count == 0) {
-        // no pending task need to flush
-        if (ff_safe_queue_push_back(ov_model->request_queue, request) < 0) {
-            av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
-            return DNN_ERROR;
-        }
-        return DNN_SUCCESS;
-    }
-
     ret = fill_model_input_ov(ov_model, request);
     if (ret != DNN_SUCCESS) {
         av_log(ctx, AV_LOG_ERROR, "Failed to fill model input.\n");
@@ -803,11 +870,17 @@ void ff_dnn_free_model_ov(DNNModel **model)
             if (item && item->infer_request) {
                 ie_infer_request_free(&item->infer_request);
             }
-            av_freep(&item->tasks);
+            av_freep(&item->inferences);
             av_freep(&item);
         }
         ff_safe_queue_destroy(ov_model->request_queue);

+        while (ff_queue_size(ov_model->inference_queue) != 0) {
+            TaskItem *item = ff_queue_pop_front(ov_model->inference_queue);
+            av_freep(&item);
+        }
+        ff_queue_destroy(ov_model->inference_queue);
+
         while (ff_queue_size(ov_model->task_queue) != 0) {
             TaskItem *item = ff_queue_pop_front(ov_model->task_queue);
             av_frame_free(&item->in_frame);

From patchwork Sun Apr 18 10:07:59 2021
X-Patchwork-Submitter: "Guo, Yejun"
X-Patchwork-Id: 26961
From: "Guo, Yejun"
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Apr 2021 18:07:59 +0800
Message-Id: <20210418100802.19017-3-yejun.guo@intel.com>
In-Reply-To: <20210418100802.19017-1-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH 3/6] lavfi/dnn_backend_openvino.c: move the logic for batch mode earlier

---
 libavfilter/dnn/dnn_backend_openvino.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index 3692a381e2..a695d863b5 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -432,13 +432,6 @@ static DNNReturnType execute_model_ov(RequestItem *request, Queue *inferenceq)
     ctx = &task->ov_model->ctx;

     if (task->async) {
-        if (ff_queue_size(inferenceq) < ctx->options.batch_size) {
-            if (ff_safe_queue_push_front(task->ov_model->request_queue, request) < 0) {
-                av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
-                return DNN_ERROR;
-            }
-            return DNN_SUCCESS;
-        }
         ret = fill_model_input_ov(task->ov_model, request);
         if (ret != DNN_SUCCESS) {
             return ret;
@@ -793,6 +786,11 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *i
         return DNN_ERROR;
     }

+    if (ff_queue_size(ov_model->inference_queue) < ctx->options.batch_size) {
+        // not enough inference items queued for a batch
+        return DNN_SUCCESS;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");

From patchwork Sun Apr 18 10:08:00 2021
dis=NONE) header.from=intel.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id j10si217189eds.134.2021.04.18.03.20.33; Sun, 18 Apr 2021 03:20:34 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DB38A680A7F; Sun, 18 Apr 2021 13:20:08 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 46D656809D2 for ; Sun, 18 Apr 2021 13:20:01 +0300 (EEST) IronPort-SDR: tbWtIf/FwhKxZd6UDDQx7hX5VEG70GWz7enWedjUwUJfb9aLWK4nxR0OLza66Qlt9zNXOqICqg hh5Ioby02TtQ== X-IronPort-AV: E=McAfee;i="6200,9189,9957"; a="256523547" X-IronPort-AV: E=Sophos;i="5.82,231,1613462400"; d="scan'208";a="256523547" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2021 03:19:55 -0700 IronPort-SDR: YnErYWxAMSgVHK1yh6eK9+4Wi3iuXou0vPZWQslg9g+E8+u9oSuVINrFS8VTypXi7Ax5lTWNki +U9eEmP4tY1Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,231,1613462400"; d="scan'208";a="453918926" Received: from yguo18-skl-u1604.sh.intel.com ([10.239.159.53]) by fmsmga002.fm.intel.com with ESMTP; 18 Apr 2021 03:19:54 -0700 From: "Guo, Yejun" To: ffmpeg-devel@ffmpeg.org Date: Sun, 18 Apr 2021 18:08:00 +0800 Message-Id: <20210418100802.19017-4-yejun.guo@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210418100802.19017-1-yejun.guo@intel.com> References: 
<20210418100802.19017-1-yejun.guo@intel.com> Subject: [FFmpeg-devel] [PATCH 4/6] lavfi/dnn: refine dnn interface to add DNNExecBaseParams

Different model function types require different parameters. For example, object detection detects many objects (cat/dog/...) in the frame, while classification needs to know which object (cat or dog) it is going to classify. With the current interface, every new requirement means adding a new function with more parameters. With this change, we can instead add a new struct (for example DNNExecClassifyParams) based on DNNExecBaseParams, and keep using the existing execute_model interface with only the params changed.
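The extension pattern the commit message describes can be sketched in a few lines: a function-type-specific struct embeds DNNExecBaseParams as its first member, so the generic execute_model entry point keeps one signature and the backend downcasts when needed. This is a simplified illustration, not the exact FFmpeg definitions; the derived struct and its `target` field follow patch 5/6 of this series (where it is named DNNExecClassificationParams), and AVFrame is left opaque:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct AVFrame AVFrame; /* opaque stand-in for the libavutil type */

/* Base parameters shared by every execute_model() call. */
typedef struct DNNExecBaseParams {
    const char *input_name;
    const char **output_names;
    uint32_t nb_output;
    AVFrame *in_frame;
    AVFrame *out_frame;
} DNNExecBaseParams;

/* Function-type-specific parameters embed the base as the FIRST member,
 * so a DNNExecClassificationParams* can be passed wherever a
 * DNNExecBaseParams* is expected, and recovered by a cast. */
typedef struct DNNExecClassificationParams {
    DNNExecBaseParams base;
    const char *target; /* label of the detected objects to classify */
} DNNExecClassificationParams;

/* Backend side: recover the classification-specific field. */
static const char *get_classify_target(DNNExecBaseParams *exec_params)
{
    DNNExecClassificationParams *params =
        (DNNExecClassificationParams *)exec_params;
    return params->target;
}
```

The cast is well-defined C because a pointer to a struct points to its first member; supporting a new function type then only needs a new derived struct, not a new execute function.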
--- libavfilter/dnn/Makefile | 1 + libavfilter/dnn/dnn_backend_common.c | 51 ++++++++++++++++++++++++++ libavfilter/dnn/dnn_backend_common.h | 31 ++++++++++++++++ libavfilter/dnn/dnn_backend_native.c | 15 +++----- libavfilter/dnn/dnn_backend_native.h | 3 +- libavfilter/dnn/dnn_backend_openvino.c | 50 ++++++++----------------- libavfilter/dnn/dnn_backend_openvino.h | 6 +-- libavfilter/dnn/dnn_backend_tf.c | 18 +++------ libavfilter/dnn/dnn_backend_tf.h | 3 +- libavfilter/dnn_filter_common.c | 20 ++++++++-- libavfilter/dnn_interface.h | 14 +++++-- 11 files changed, 139 insertions(+), 73 deletions(-) create mode 100644 libavfilter/dnn/dnn_backend_common.c create mode 100644 libavfilter/dnn/dnn_backend_common.h diff --git a/libavfilter/dnn/Makefile b/libavfilter/dnn/Makefile index d6d58f4b61..4cfbce0efc 100644 --- a/libavfilter/dnn/Makefile +++ b/libavfilter/dnn/Makefile @@ -2,6 +2,7 @@ OBJS-$(CONFIG_DNN) += dnn/dnn_interface.o OBJS-$(CONFIG_DNN) += dnn/dnn_io_proc.o OBJS-$(CONFIG_DNN) += dnn/queue.o OBJS-$(CONFIG_DNN) += dnn/safe_queue.o +OBJS-$(CONFIG_DNN) += dnn/dnn_backend_common.o OBJS-$(CONFIG_DNN) += dnn/dnn_backend_native.o OBJS-$(CONFIG_DNN) += dnn/dnn_backend_native_layers.o OBJS-$(CONFIG_DNN) += dnn/dnn_backend_native_layer_avgpool.o diff --git a/libavfilter/dnn/dnn_backend_common.c b/libavfilter/dnn/dnn_backend_common.c new file mode 100644 index 0000000000..a522ab5650 --- /dev/null +++ b/libavfilter/dnn/dnn_backend_common.c @@ -0,0 +1,51 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/** + * @file + * DNN common functions for different backends. + */ + +#include "dnn_backend_common.h" + +int ff_check_exec_params(void *ctx, DNNBackendType backend, DNNFunctionType func_type, DNNExecBaseParams *exec_params) +{ + if (!exec_params) { + av_log(ctx, AV_LOG_ERROR, "exec_params is null when executing the model.\n"); + return AVERROR(EINVAL); + } + + if (!exec_params->in_frame) { + av_log(ctx, AV_LOG_ERROR, "in frame is NULL when executing the model.\n"); + return AVERROR(EINVAL); + } + + if (!exec_params->out_frame) { + av_log(ctx, AV_LOG_ERROR, "out frame is NULL when executing the model.\n"); + return AVERROR(EINVAL); + } + + if (exec_params->nb_output != 1 && backend != DNN_TF) { + // currently, the filter does not need multiple outputs, + // so we just postpone the support until we really need it. + avpriv_report_missing_feature(ctx, "multiple outputs"); + return AVERROR(EINVAL); + } + + return 0; +} diff --git a/libavfilter/dnn/dnn_backend_common.h b/libavfilter/dnn/dnn_backend_common.h new file mode 100644 index 0000000000..cd9c0f5339 --- /dev/null +++ b/libavfilter/dnn/dnn_backend_common.h @@ -0,0 +1,31 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details.
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/** + * @file + * DNN common functions for different backends. + */ + +#ifndef AVFILTER_DNN_DNN_BACKEND_COMMON_H +#define AVFILTER_DNN_DNN_BACKEND_COMMON_H + +#include "../dnn_interface.h" + +int ff_check_exec_params(void *ctx, DNNBackendType backend, DNNFunctionType func_type, DNNExecBaseParams *exec_params); + +#endif diff --git a/libavfilter/dnn/dnn_backend_native.c b/libavfilter/dnn/dnn_backend_native.c index d9762eeaf6..b5f1c16538 100644 --- a/libavfilter/dnn/dnn_backend_native.c +++ b/libavfilter/dnn/dnn_backend_native.c @@ -28,6 +28,7 @@ #include "dnn_backend_native_layer_conv2d.h" #include "dnn_backend_native_layers.h" #include "dnn_io_proc.h" +#include "dnn_backend_common.h" #define OFFSET(x) offsetof(NativeContext, x) #define FLAGS AV_OPT_FLAG_FILTERING_PARAM @@ -372,23 +373,17 @@ static DNNReturnType execute_model_native(const DNNModel *model, const char *inp return DNN_SUCCESS; } -DNNReturnType ff_dnn_execute_model_native(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame) +DNNReturnType ff_dnn_execute_model_native(const DNNModel *model, DNNExecBaseParams *exec_params) { NativeModel *native_model = model->model; NativeContext *ctx = &native_model->ctx; - if (!in_frame) { - av_log(ctx, AV_LOG_ERROR, "in frame is NULL when execute model.\n"); - return DNN_ERROR; - } - - if (!out_frame) { - av_log(ctx, AV_LOG_ERROR, "out frame is NULL when execute model.\n"); + if (ff_check_exec_params(ctx, DNN_NATIVE, model->func_type, exec_params) != 0) { return DNN_ERROR; } - return execute_model_native(model, input_name, in_frame, output_names, nb_output, out_frame, 1); + return execute_model_native(model, exec_params->input_name, exec_params->in_frame, +
exec_params->output_names, exec_params->nb_output, exec_params->out_frame, 1); } int32_t ff_calculate_operand_dims_count(const DnnOperand *oprd) diff --git a/libavfilter/dnn/dnn_backend_native.h b/libavfilter/dnn/dnn_backend_native.h index d313c48f3a..89bcb8e358 100644 --- a/libavfilter/dnn/dnn_backend_native.h +++ b/libavfilter/dnn/dnn_backend_native.h @@ -130,8 +130,7 @@ typedef struct NativeModel{ DNNModel *ff_dnn_load_model_native(const char *model_filename, DNNFunctionType func_type, const char *options, AVFilterContext *filter_ctx); -DNNReturnType ff_dnn_execute_model_native(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame); +DNNReturnType ff_dnn_execute_model_native(const DNNModel *model, DNNExecBaseParams *exec_params); void ff_dnn_free_model_native(DNNModel **model); diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c index a695d863b5..fcdd738f8a 100644 --- a/libavfilter/dnn/dnn_backend_openvino.c +++ b/libavfilter/dnn/dnn_backend_openvino.c @@ -33,6 +33,7 @@ #include "queue.h" #include "safe_queue.h" #include +#include "dnn_backend_common.h" typedef struct OVOptions{ char *device_type; @@ -678,28 +679,14 @@ err: return NULL; } -DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame) +DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, DNNExecBaseParams *exec_params) { OVModel *ov_model = model->model; OVContext *ctx = &ov_model->ctx; TaskItem task; RequestItem *request; - if (!in_frame) { - av_log(ctx, AV_LOG_ERROR, "in frame is NULL when execute model.\n"); - return DNN_ERROR; - } - - if (!out_frame && model->func_type == DFT_PROCESS_FRAME) { - av_log(ctx, AV_LOG_ERROR, "out frame is NULL when execute model.\n"); - return DNN_ERROR; - } - - if (nb_output != 1) { - // currently, the filter does not need 
multiple outputs, - // so we just pending the support until we really need it. - avpriv_report_missing_feature(ctx, "multiple outputs"); + if (ff_check_exec_params(ctx, DNN_OV, model->func_type, exec_params) != 0) { return DNN_ERROR; } @@ -709,7 +696,7 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n } if (!ov_model->exe_network) { - if (init_model_ov(ov_model, input_name, output_names[0]) != DNN_SUCCESS) { + if (init_model_ov(ov_model, exec_params->input_name, exec_params->output_names[0]) != DNN_SUCCESS) { av_log(ctx, AV_LOG_ERROR, "Failed init OpenVINO exectuable network or inference request\n"); return DNN_ERROR; } @@ -717,10 +704,10 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n task.do_ioproc = 1; task.async = 0; - task.input_name = input_name; - task.in_frame = in_frame; - task.output_name = output_names[0]; - task.out_frame = out_frame; + task.input_name = exec_params->input_name; + task.in_frame = exec_params->in_frame; + task.output_name = exec_params->output_names[0]; + task.out_frame = exec_params->out_frame ? 
exec_params->out_frame : exec_params->in_frame; task.ov_model = ov_model; if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) { @@ -737,26 +724,19 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n return execute_model_ov(request, ov_model->inference_queue); } -DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame) +DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, DNNExecBaseParams *exec_params) { OVModel *ov_model = model->model; OVContext *ctx = &ov_model->ctx; RequestItem *request; TaskItem *task; - if (!in_frame) { - av_log(ctx, AV_LOG_ERROR, "in frame is NULL when async execute model.\n"); - return DNN_ERROR; - } - - if (!out_frame && model->func_type == DFT_PROCESS_FRAME) { - av_log(ctx, AV_LOG_ERROR, "out frame is NULL when async execute model.\n"); + if (ff_check_exec_params(ctx, DNN_OV, model->func_type, exec_params) != 0) { return DNN_ERROR; } if (!ov_model->exe_network) { - if (init_model_ov(ov_model, input_name, output_names[0]) != DNN_SUCCESS) { + if (init_model_ov(ov_model, exec_params->input_name, exec_params->output_names[0]) != DNN_SUCCESS) { av_log(ctx, AV_LOG_ERROR, "Failed init OpenVINO exectuable network or inference request\n"); return DNN_ERROR; } @@ -770,10 +750,10 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *i task->do_ioproc = 1; task->async = 1; - task->input_name = input_name; - task->in_frame = in_frame; - task->output_name = output_names[0]; - task->out_frame = out_frame; + task->input_name = exec_params->input_name; + task->in_frame = exec_params->in_frame; + task->output_name = exec_params->output_names[0]; + task->out_frame = exec_params->out_frame ? 
exec_params->out_frame : exec_params->in_frame; task->ov_model = ov_model; if (ff_queue_push_back(ov_model->task_queue, task) < 0) { av_freep(&task); diff --git a/libavfilter/dnn/dnn_backend_openvino.h b/libavfilter/dnn/dnn_backend_openvino.h index a484a7be32..046d0c5b5a 100644 --- a/libavfilter/dnn/dnn_backend_openvino.h +++ b/libavfilter/dnn/dnn_backend_openvino.h @@ -31,10 +31,8 @@ DNNModel *ff_dnn_load_model_ov(const char *model_filename, DNNFunctionType func_type, const char *options, AVFilterContext *filter_ctx); -DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame); -DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame); +DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, DNNExecBaseParams *exec_params); +DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, DNNExecBaseParams *exec_params); DNNAsyncStatusType ff_dnn_get_async_result_ov(const DNNModel *model, AVFrame **in, AVFrame **out); DNNReturnType ff_dnn_flush_ov(const DNNModel *model); diff --git a/libavfilter/dnn/dnn_backend_tf.c b/libavfilter/dnn/dnn_backend_tf.c index fb799d2b70..427edb08e1 100644 --- a/libavfilter/dnn/dnn_backend_tf.c +++ b/libavfilter/dnn/dnn_backend_tf.c @@ -33,7 +33,7 @@ #include "dnn_backend_native_layer_pad.h" #include "dnn_backend_native_layer_maximum.h" #include "dnn_io_proc.h" - +#include "dnn_backend_common.h" #include typedef struct TFOptions{ @@ -840,23 +840,17 @@ static DNNReturnType execute_model_tf(const DNNModel *model, const char *input_n return DNN_SUCCESS; } -DNNReturnType ff_dnn_execute_model_tf(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame) +DNNReturnType ff_dnn_execute_model_tf(const DNNModel *model, 
DNNExecBaseParams *exec_params) { TFModel *tf_model = model->model; TFContext *ctx = &tf_model->ctx; - if (!in_frame) { - av_log(ctx, AV_LOG_ERROR, "in frame is NULL when execute model.\n"); - return DNN_ERROR; - } - - if (!out_frame) { - av_log(ctx, AV_LOG_ERROR, "out frame is NULL when execute model.\n"); - return DNN_ERROR; + if (ff_check_exec_params(ctx, DNN_TF, model->func_type, exec_params) != 0) { + return DNN_ERROR; } - return execute_model_tf(model, input_name, in_frame, output_names, nb_output, out_frame, 1); + return execute_model_tf(model, exec_params->input_name, exec_params->in_frame, + exec_params->output_names, exec_params->nb_output, exec_params->out_frame, 1); } void ff_dnn_free_model_tf(DNNModel **model) diff --git a/libavfilter/dnn/dnn_backend_tf.h b/libavfilter/dnn/dnn_backend_tf.h index 8cec04748e..3dfd6e4280 100644 --- a/libavfilter/dnn/dnn_backend_tf.h +++ b/libavfilter/dnn/dnn_backend_tf.h @@ -31,8 +31,7 @@ DNNModel *ff_dnn_load_model_tf(const char *model_filename, DNNFunctionType func_type, const char *options, AVFilterContext *filter_ctx); -DNNReturnType ff_dnn_execute_model_tf(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame); +DNNReturnType ff_dnn_execute_model_tf(const DNNModel *model, DNNExecBaseParams *exec_params); void ff_dnn_free_model_tf(DNNModel **model); diff --git a/libavfilter/dnn_filter_common.c b/libavfilter/dnn_filter_common.c index 1b922455a3..c085884eb4 100644 --- a/libavfilter/dnn_filter_common.c +++ b/libavfilter/dnn_filter_common.c @@ -90,14 +90,26 @@ DNNReturnType ff_dnn_get_output(DnnContext *ctx, int input_width, int input_heig DNNReturnType ff_dnn_execute_model(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame) { - return (ctx->dnn_module->execute_model)(ctx->model, ctx->model_inputname, in_frame, - (const char **)&ctx->model_outputname, 1, out_frame); + DNNExecBaseParams exec_params = { + .input_name = ctx->model_inputname, 
+ .output_names = (const char **)&ctx->model_outputname, + .nb_output = 1, + .in_frame = in_frame, + .out_frame = out_frame, + }; + return (ctx->dnn_module->execute_model)(ctx->model, &exec_params); } DNNReturnType ff_dnn_execute_model_async(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame) { - return (ctx->dnn_module->execute_model_async)(ctx->model, ctx->model_inputname, in_frame, - (const char **)&ctx->model_outputname, 1, out_frame); + DNNExecBaseParams exec_params = { + .input_name = ctx->model_inputname, + .output_names = (const char **)&ctx->model_outputname, + .nb_output = 1, + .in_frame = in_frame, + .out_frame = out_frame, + }; + return (ctx->dnn_module->execute_model_async)(ctx->model, &exec_params); } DNNAsyncStatusType ff_dnn_get_async_result(DnnContext *ctx, AVFrame **in_frame, AVFrame **out_frame) diff --git a/libavfilter/dnn_interface.h b/libavfilter/dnn_interface.h index ae5a488341..941670675d 100644 --- a/libavfilter/dnn_interface.h +++ b/libavfilter/dnn_interface.h @@ -63,6 +63,14 @@ typedef struct DNNData{ DNNColorOrder order; } DNNData; +typedef struct DNNExecBaseParams { + const char *input_name; + const char **output_names; + uint32_t nb_output; + AVFrame *in_frame; + AVFrame *out_frame; +} DNNExecBaseParams; + typedef int (*FramePrePostProc)(AVFrame *frame, DNNData *model, AVFilterContext *filter_ctx); typedef int (*DetectPostProc)(AVFrame *frame, DNNData *output, uint32_t nb, AVFilterContext *filter_ctx); @@ -96,11 +104,9 @@ typedef struct DNNModule{ // Loads model and parameters from given file. Returns NULL if it is not possible. DNNModel *(*load_model)(const char *model_filename, DNNFunctionType func_type, const char *options, AVFilterContext *filter_ctx); // Executes model with specified input and output. Returns DNN_ERROR otherwise. 
- DNNReturnType (*execute_model)(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame); + DNNReturnType (*execute_model)(const DNNModel *model, DNNExecBaseParams *exec_params); // Executes model with specified input and output asynchronously. Returns DNN_ERROR otherwise. - DNNReturnType (*execute_model_async)(const DNNModel *model, const char *input_name, AVFrame *in_frame, - const char **output_names, uint32_t nb_output, AVFrame *out_frame); + DNNReturnType (*execute_model_async)(const DNNModel *model, DNNExecBaseParams *exec_params); // Retrieve inference result. DNNAsyncStatusType (*get_async_result)(const DNNModel *model, AVFrame **in, AVFrame **out); // Flush all the pending tasks.

From patchwork Sun Apr 18 10:08:01 2021 X-Patchwork-Submitter: "Guo, Yejun" X-Patchwork-Id: 26977
From: "Guo, Yejun" To: ffmpeg-devel@ffmpeg.org Date: Sun, 18 Apr 2021 18:08:01 +0800 Message-Id: <20210418100802.19017-5-yejun.guo@intel.com> In-Reply-To: <20210418100802.19017-1-yejun.guo@intel.com> References: <20210418100802.19017-1-yejun.guo@intel.com> Subject: [FFmpeg-devel] [PATCH 5/6] lavfi/dnn: add classify support with openvino backend

Signed-off-by: Guo, Yejun --- libavfilter/dnn/dnn_backend_openvino.c | 143 +++++++++++++++++++++---- libavfilter/dnn/dnn_io_proc.c | 60 +++++++++++ libavfilter/dnn/dnn_io_proc.h | 1 + libavfilter/dnn_filter_common.c | 21 ++++ libavfilter/dnn_filter_common.h | 2 + libavfilter/dnn_interface.h | 10 +- 6 files changed, 218 insertions(+), 19 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c index fcdd738f8a..4bdd6a3a6d 100644 --- a/libavfilter/dnn/dnn_backend_openvino.c +++ b/libavfilter/dnn/dnn_backend_openvino.c @@ -29,6 +29,7 @@ #include "libavutil/avassert.h" #include "libavutil/opt.h" #include "libavutil/avstring.h" +#include
"libavutil/detection_bbox.h" #include "../internal.h" #include "queue.h" #include "safe_queue.h" @@ -74,6 +75,7 @@ typedef struct TaskItem { // one task might have multiple inferences typedef struct InferenceItem { TaskItem *task; + uint32_t bbox_index; } InferenceItem; // one request for one call to openvino @@ -182,12 +184,23 @@ static DNNReturnType fill_model_input_ov(OVModel *ov_model, RequestItem *request request->inferences[i] = inference; request->inference_count = i + 1; task = inference->task; - if (task->do_ioproc) { - if (ov_model->model->frame_pre_proc != NULL) { - ov_model->model->frame_pre_proc(task->in_frame, &input, ov_model->model->filter_ctx); - } else { - ff_proc_from_frame_to_dnn(task->in_frame, &input, ov_model->model->func_type, ctx); + switch (task->ov_model->model->func_type) { + case DFT_PROCESS_FRAME: + case DFT_ANALYTICS_DETECT: + if (task->do_ioproc) { + if (ov_model->model->frame_pre_proc != NULL) { + ov_model->model->frame_pre_proc(task->in_frame, &input, ov_model->model->filter_ctx); + } else { + ff_proc_from_frame_to_dnn(task->in_frame, &input, ov_model->model->func_type, ctx); + } } + break; + case DFT_ANALYTICS_CLASSIFY: + ff_frame_to_dnn_classify(task->in_frame, &input, inference->bbox_index, ctx); + break; + default: + av_assert0(!"should not reach here"); + break; } input.data = (uint8_t *)input.data + input.width * input.height * input.channels * get_datatype_size(input.dt); @@ -276,6 +289,13 @@ static void infer_completion_callback(void *args) } task->ov_model->model->detect_post_proc(task->out_frame, &output, 1, task->ov_model->model->filter_ctx); break; + case DFT_ANALYTICS_CLASSIFY: + if (!task->ov_model->model->classify_post_proc) { + av_log(ctx, AV_LOG_ERROR, "classify filter needs to provide post proc\n"); + return; + } + task->ov_model->model->classify_post_proc(task->out_frame, &output, request->inferences[i]->bbox_index, task->ov_model->model->filter_ctx); + break; default: av_assert0(!"should not reach here"); break; 
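The InferenceItem change above is the core of the classify support: one TaskItem fans out into one InferenceItem per detected bounding box whose label matches the filter's target, with `bbox_index` telling each inference which box to crop. A minimal self-contained sketch of that fan-out, with simplified stand-ins for the FFmpeg types (a plain array replaces the real Queue, and POSIX strcasecmp stands in for av_strncasecmp):

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h> /* strcasecmp, standing in for av_strncasecmp */

#define MAX_BBOXES 8

typedef struct TaskItem {
    uint32_t inference_todo;
    uint32_t inference_done;
} TaskItem;

/* one task might have multiple inferences, one per matching bounding box */
typedef struct InferenceItem {
    TaskItem *task;
    uint32_t bbox_index;
} InferenceItem;

typedef struct BBox {
    char detect_label[16];
} BBox;

/* Simplified fan-out in the spirit of extract_inference_from_task():
 * queue one inference per bbox whose label matches the classify target. */
static int fan_out_classify(TaskItem *task, const BBox *bboxes, uint32_t nb_bboxes,
                            const char *target, InferenceItem *queue, uint32_t *nb_queued)
{
    task->inference_todo = 0;
    task->inference_done = 0;
    *nb_queued = 0;
    for (uint32_t i = 0; i < nb_bboxes; i++) {
        if (strcasecmp(bboxes[i].detect_label, target) != 0)
            continue; /* not the object type we classify */
        queue[*nb_queued].task = task;
        queue[*nb_queued].bbox_index = i;
        (*nb_queued)++;
        task->inference_todo++;
    }
    return 0;
}
```

The completion callback can then compare `inference_done` against `inference_todo` to decide when the whole frame's task is finished.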
@@ -513,7 +533,44 @@ static DNNReturnType get_input_ov(void *model, DNNData *input, const char *input return DNN_ERROR; } -static DNNReturnType extract_inference_from_task(DNNFunctionType func_type, TaskItem *task, Queue *inference_queue) +static int contain_valid_detection_bbox(AVFrame *frame) +{ + AVFrameSideData *sd; + const AVDetectionBBoxHeader *header; + const AVDetectionBBox *bbox; + + sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES); + if (!sd) { // this frame has nothing detected + return 0; + } + + if (!sd->size) { + return 0; + } + + header = (const AVDetectionBBoxHeader *)sd->data; + if (!header->nb_bboxes) { + return 0; + } + + for (uint32_t i = 0; i < header->nb_bboxes; i++) { + bbox = av_get_detection_bbox(header, i); + if (bbox->x < 0 || bbox->w < 0 || bbox->x + bbox->w >= frame->width) { + return 0; + } + if (bbox->y < 0 || bbox->h < 0 || bbox->y + bbox->h >= frame->height) { + return 0; + } + + if (bbox->classify_count == AV_NUM_DETECTION_BBOX_CLASSIFY) { + return 0; + } + } + + return 1; +} + +static DNNReturnType extract_inference_from_task(DNNFunctionType func_type, TaskItem *task, Queue *inference_queue, DNNExecBaseParams *exec_params) { switch (func_type) { case DFT_PROCESS_FRAME: @@ -532,6 +589,45 @@ static DNNReturnType extract_inference_from_task(DNNFunctionType func_type, Task } return DNN_SUCCESS; } + case DFT_ANALYTICS_CLASSIFY: + { + const AVDetectionBBoxHeader *header; + AVFrame *frame = task->in_frame; + AVFrameSideData *sd; + DNNExecClassificationParams *params = (DNNExecClassificationParams *)exec_params; + + task->inference_todo = 0; + task->inference_done = 0; + + if (!contain_valid_detection_bbox(frame)) { + return DNN_SUCCESS; + } + + sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES); + header = (const AVDetectionBBoxHeader *)sd->data; + + for (uint32_t i = 0; i < header->nb_bboxes; i++) { + InferenceItem *inference; + const AVDetectionBBox *bbox = av_get_detection_bbox(header, i); + + if
(av_strncasecmp(bbox->detect_label, params->target, sizeof(bbox->detect_label)) != 0) { + continue; + } + + inference = av_malloc(sizeof(*inference)); + if (!inference) { + return DNN_ERROR; + } + task->inference_todo++; + inference->task = task; + inference->bbox_index = i; + if (ff_queue_push_back(inference_queue, inference) < 0) { + av_freep(&inference); + return DNN_ERROR; + } + } + return DNN_SUCCESS; + } default: av_assert0(!"should not reach here"); return DNN_ERROR; @@ -598,7 +694,7 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu task.out_frame = out_frame; task.ov_model = ov_model; - if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) { + if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue, NULL) != DNN_SUCCESS) { av_frame_free(&out_frame); av_frame_free(&in_frame); av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n"); @@ -690,6 +786,14 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, DNNExecBaseParams * return DNN_ERROR; } + if (model->func_type == DFT_ANALYTICS_CLASSIFY) { + // Once we add async support for tensorflow backend and native backend, + // we'll combine the two sync/async functions in dnn_interface.h to + // simplify the code in filter, and async will be an option within backends. + // so, do not support now, and classify filter will not call this function. + return DNN_ERROR; + } + if (ctx->options.batch_size > 1) { avpriv_report_missing_feature(ctx, "batch mode for sync execution"); return DNN_ERROR; @@ -710,7 +814,7 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, DNNExecBaseParams * task.out_frame = exec_params->out_frame ? 
exec_params->out_frame : exec_params->in_frame; task.ov_model = ov_model; - if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) { + if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue, exec_params) != DNN_SUCCESS) { av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n"); return DNN_ERROR; } @@ -730,6 +834,7 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, DNNExecBasePa OVContext *ctx = &ov_model->ctx; RequestItem *request; TaskItem *task; + DNNReturnType ret; if (ff_check_exec_params(ctx, DNN_OV, model->func_type, exec_params) != 0) { return DNN_ERROR; @@ -761,23 +866,25 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, DNNExecBasePa return DNN_ERROR; } - if (extract_inference_from_task(ov_model->model->func_type, task, ov_model->inference_queue) != DNN_SUCCESS) { + if (extract_inference_from_task(model->func_type, task, ov_model->inference_queue, exec_params) != DNN_SUCCESS) { av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n"); return DNN_ERROR; } - if (ff_queue_size(ov_model->inference_queue) < ctx->options.batch_size) { - // not enough inference items queued for a batch - return DNN_SUCCESS; - } + while (ff_queue_size(ov_model->inference_queue) >= ctx->options.batch_size) { + request = ff_safe_queue_pop_front(ov_model->request_queue); + if (!request) { + av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n"); + return DNN_ERROR; + } - request = ff_safe_queue_pop_front(ov_model->request_queue); - if (!request) { - av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n"); - return DNN_ERROR; + ret = execute_model_ov(request, ov_model->inference_queue); + if (ret != DNN_SUCCESS) { + return ret; + } } - return execute_model_ov(request, ov_model->inference_queue); + return DNN_SUCCESS; } DNNAsyncStatusType ff_dnn_get_async_result_ov(const DNNModel *model, AVFrame **in, AVFrame **out) diff 
diff --git a/libavfilter/dnn/dnn_io_proc.c b/libavfilter/dnn/dnn_io_proc.c
index e104cc5064..5f60d68078 100644
--- a/libavfilter/dnn/dnn_io_proc.c
+++ b/libavfilter/dnn/dnn_io_proc.c
@@ -22,6 +22,7 @@
 #include "libavutil/imgutils.h"
 #include "libswscale/swscale.h"
 #include "libavutil/avassert.h"
+#include "libavutil/detection_bbox.h"
 
 DNNReturnType ff_proc_from_dnn_to_frame(AVFrame *frame, DNNData *output, void *log_ctx)
 {
@@ -175,6 +176,65 @@ static enum AVPixelFormat get_pixel_format(DNNData *data)
     return AV_PIX_FMT_BGR24;
 }
 
+DNNReturnType ff_frame_to_dnn_classify(AVFrame *frame, DNNData *input, uint32_t bbox_index, void *log_ctx)
+{
+    const AVPixFmtDescriptor *desc;
+    int offsetx[4], offsety[4];
+    uint8_t *bbox_data[4];
+    struct SwsContext *sws_ctx;
+    int linesizes[4];
+    enum AVPixelFormat fmt;
+    int left, top, width, height;
+    const AVDetectionBBoxHeader *header;
+    const AVDetectionBBox *bbox;
+    AVFrameSideData *sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+    av_assert0(sd);
+
+    header = (const AVDetectionBBoxHeader *)sd->data;
+    bbox = av_get_detection_bbox(header, bbox_index);
+
+    left = bbox->x;
+    width = bbox->w;
+    top = bbox->y;
+    height = bbox->h;
+
+    fmt = get_pixel_format(input);
+    sws_ctx = sws_getContext(width, height, frame->format,
+                             input->width, input->height, fmt,
+                             SWS_FAST_BILINEAR, NULL, NULL, NULL);
+    if (!sws_ctx) {
+        av_log(log_ctx, AV_LOG_ERROR, "Failed to create scale context for the conversion "
+               "fmt:%s s:%dx%d -> fmt:%s s:%dx%d\n",
+               av_get_pix_fmt_name(frame->format), width, height,
+               av_get_pix_fmt_name(fmt), input->width, input->height);
+        return DNN_ERROR;
+    }
+
+    if (av_image_fill_linesizes(linesizes, fmt, input->width) < 0) {
+        av_log(log_ctx, AV_LOG_ERROR, "unable to get linesizes with av_image_fill_linesizes");
+        sws_freeContext(sws_ctx);
+        return DNN_ERROR;
+    }
+
+    desc = av_pix_fmt_desc_get(frame->format);
+    offsetx[1] = offsetx[2] = AV_CEIL_RSHIFT(left, desc->log2_chroma_w);
+    offsetx[0] = offsetx[3] = left;
+
+    offsety[1] = offsety[2] = AV_CEIL_RSHIFT(top, desc->log2_chroma_h);
+    offsety[0] = offsety[3] = top;
+
+    for (int k = 0; frame->data[k]; k++)
+        bbox_data[k] = frame->data[k] + offsety[k] * frame->linesize[k] + offsetx[k];
+
+    sws_scale(sws_ctx, (const uint8_t *const *)&bbox_data, frame->linesize,
+              0, height,
+              (uint8_t *const *)(&input->data), linesizes);
+
+    sws_freeContext(sws_ctx);
+
+    return DNN_SUCCESS;
+}
+
 static DNNReturnType proc_from_frame_to_dnn_analytics(AVFrame *frame, DNNData *input, void *log_ctx)
 {
     struct SwsContext *sws_ctx;
diff --git a/libavfilter/dnn/dnn_io_proc.h b/libavfilter/dnn/dnn_io_proc.h
index 91ad3cb261..16dcdd6d1a 100644
--- a/libavfilter/dnn/dnn_io_proc.h
+++ b/libavfilter/dnn/dnn_io_proc.h
@@ -32,5 +32,6 @@
 DNNReturnType ff_proc_from_frame_to_dnn(AVFrame *frame, DNNData *input, DNNFunctionType func_type, void *log_ctx);
 DNNReturnType ff_proc_from_dnn_to_frame(AVFrame *frame, DNNData *output, void *log_ctx);
+DNNReturnType ff_frame_to_dnn_classify(AVFrame *frame, DNNData *input, uint32_t bbox_index, void *log_ctx);
 
 #endif
diff --git a/libavfilter/dnn_filter_common.c b/libavfilter/dnn_filter_common.c
index c085884eb4..52c7a5392a 100644
--- a/libavfilter/dnn_filter_common.c
+++ b/libavfilter/dnn_filter_common.c
@@ -77,6 +77,12 @@ int ff_dnn_set_detect_post_proc(DnnContext *ctx, DetectPostProc post_proc)
     return 0;
 }
 
+int ff_dnn_set_classify_post_proc(DnnContext *ctx, ClassifyPostProc post_proc)
+{
+    ctx->model->classify_post_proc = post_proc;
+    return 0;
+}
+
 DNNReturnType ff_dnn_get_input(DnnContext *ctx, DNNData *input)
 {
     return ctx->model->get_input(ctx->model->model, input, ctx->model_inputname);
@@ -112,6 +118,21 @@ DNNReturnType ff_dnn_execute_model_async(DnnContext *ctx, AVFrame *in_frame, AVF
     return (ctx->dnn_module->execute_model_async)(ctx->model, &exec_params);
 }
 
+DNNReturnType ff_dnn_execute_model_classification(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame, char *target)
+{
+    DNNExecClassificationParams class_params = {
+        {
+            .input_name   = ctx->model_inputname,
+            .output_names = (const char **)&ctx->model_outputname,
+            .nb_output    = 1,
+            .in_frame     = in_frame,
+            .out_frame    = out_frame,
+        },
+        .target = target,
+    };
+    return (ctx->dnn_module->execute_model_async)(ctx->model, &class_params.base);
+}
+
 DNNAsyncStatusType ff_dnn_get_async_result(DnnContext *ctx, AVFrame **in_frame, AVFrame **out_frame)
 {
     return (ctx->dnn_module->get_async_result)(ctx->model, in_frame, out_frame);
diff --git a/libavfilter/dnn_filter_common.h b/libavfilter/dnn_filter_common.h
index 8deb18b39a..e7736d2bac 100644
--- a/libavfilter/dnn_filter_common.h
+++ b/libavfilter/dnn_filter_common.h
@@ -50,10 +50,12 @@ typedef struct DnnContext {
 int ff_dnn_init(DnnContext *ctx, DNNFunctionType func_type, AVFilterContext *filter_ctx);
 int ff_dnn_set_frame_proc(DnnContext *ctx, FramePrePostProc pre_proc, FramePrePostProc post_proc);
 int ff_dnn_set_detect_post_proc(DnnContext *ctx, DetectPostProc post_proc);
+int ff_dnn_set_classify_post_proc(DnnContext *ctx, ClassifyPostProc post_proc);
 DNNReturnType ff_dnn_get_input(DnnContext *ctx, DNNData *input);
 DNNReturnType ff_dnn_get_output(DnnContext *ctx, int input_width, int input_height, int *output_width, int *output_height);
 DNNReturnType ff_dnn_execute_model(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame);
 DNNReturnType ff_dnn_execute_model_async(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame);
+DNNReturnType ff_dnn_execute_model_classification(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame, char *target);
 DNNAsyncStatusType ff_dnn_get_async_result(DnnContext *ctx, AVFrame **in_frame, AVFrame **out_frame);
 DNNReturnType ff_dnn_flush(DnnContext *ctx);
 void ff_dnn_uninit(DnnContext *ctx);
diff --git a/libavfilter/dnn_interface.h b/libavfilter/dnn_interface.h
index 941670675d..799244ee14 100644
--- a/libavfilter/dnn_interface.h
+++ b/libavfilter/dnn_interface.h
@@ -52,7 +52,7 @@ typedef enum {
     DFT_NONE,
     DFT_PROCESS_FRAME,      // process the whole frame
     DFT_ANALYTICS_DETECT,   // detect from the whole frame
-    // we can add more such as detect_from_crop, classify_from_bbox, etc.
+    DFT_ANALYTICS_CLASSIFY, // classify for each bounding box
 }DNNFunctionType;
 
 typedef struct DNNData{
@@ -71,8 +71,14 @@ typedef struct DNNExecBaseParams {
     AVFrame *out_frame;
 } DNNExecBaseParams;
 
+typedef struct DNNExecClassificationParams {
+    DNNExecBaseParams base;
+    const char *target;
+} DNNExecClassificationParams;
+
 typedef int (*FramePrePostProc)(AVFrame *frame, DNNData *model, AVFilterContext *filter_ctx);
 typedef int (*DetectPostProc)(AVFrame *frame, DNNData *output, uint32_t nb, AVFilterContext *filter_ctx);
+typedef int (*ClassifyPostProc)(AVFrame *frame, DNNData *output, uint32_t bbox_index, AVFilterContext *filter_ctx);
 
 typedef struct DNNModel{
     // Stores model that can be different for different backends.
@@ -97,6 +103,8 @@ typedef struct DNNModel{
     FramePrePostProc frame_post_proc;
     // set the post process to interpret detect result from DNNData
     DetectPostProc detect_post_proc;
+    // set the post process to interpret classify result from DNNData
+    ClassifyPostProc classify_post_proc;
 } DNNModel;
 
 // Stores pointers to functions for loading, executing, freeing DNN models for one of the backends.
From patchwork Sun Apr 18 10:08:02 2021
X-Patchwork-Submitter: "Guo, Yejun"
X-Patchwork-Id: 26957
From: "Guo, Yejun"
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Apr 2021 18:08:02 +0800
Message-Id: <20210418100802.19017-6-yejun.guo@intel.com>
In-Reply-To: <20210418100802.19017-1-yejun.guo@intel.com>
References:
<20210418100802.19017-1-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH 6/6] lavfi/dnn_classify: add filter dnn_classify for classification based on detection bounding boxes

Classification is done on every detection bounding box in the frame's side data, which is the result of object detection (filter dnn_detect). Please refer to the commit log of dnn_detect for the detection material; see below for classification.

- download material for classification:
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/emotions-recognition-retail-0003.bin
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/emotions-recognition-retail-0003.xml
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/emotions-recognition-retail-0003.label

- run command as:
./ffmpeg -i cici.jpg -vf dnn_detect=dnn_backend=openvino:model=face-detection-adas-0001.xml:input=data:output=detection_out:confidence=0.6:labels=face-detection-adas-0001.label,dnn_classify=dnn_backend=openvino:model=emotions-recognition-retail-0003.xml:input=data:output=prob_emotion:confidence=0.3:labels=emotions-recognition-retail-0003.label:target=face,showinfo -f null -

The detect & classify results are shown below:
[Parsed_showinfo_2 @ 0x55b7d25e77c0] side data - detection bounding boxes:
[Parsed_showinfo_2 @ 0x55b7d25e77c0] source: face-detection-adas-0001.xml, emotions-recognition-retail-0003.xml
[Parsed_showinfo_2 @ 0x55b7d25e77c0] index: 0, region: (1005, 813) -> (1086, 905), label: face, confidence: 10000/10000.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] classify: label: happy, confidence: 6757/10000.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] index: 1, region: (888, 839) -> (967, 926), label: face, confidence: 6917/10000.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] classify: label: anger, confidence: 4320/10000.

Signed-off-by: Guo, Yejun
---
 configure                     |   1 +
 doc/filters.texi              |  36 ++++
 libavfilter/Makefile          |   1 +
 libavfilter/allfilters.c      |   1 +
 libavfilter/vf_dnn_classify.c | 330 ++++++++++++++++++++++++++++++++++
 5 files changed, 369 insertions(+)
 create mode 100644 libavfilter/vf_dnn_classify.c

diff --git a/configure b/configure
index cc1013fb1d..d1fc0d05a7 100755
--- a/configure
+++ b/configure
@@ -3555,6 +3555,7 @@ derain_filter_select="dnn"
 deshake_filter_select="pixelutils"
 deshake_opencl_filter_deps="opencl"
 dilation_opencl_filter_deps="opencl"
+dnn_classify_filter_select="dnn"
 dnn_detect_filter_select="dnn"
 dnn_processing_filter_select="dnn"
 drawtext_filter_deps="libfreetype"
diff --git a/doc/filters.texi b/doc/filters.texi
index 68f17dd563..9975db7326 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -10127,6 +10127,42 @@ ffmpeg -i INPUT -f lavfi -i nullsrc=hd720,geq='r=128+80*(sin(sqrt((X-W/2)*(X-W/2
 @end example
 @end itemize
 
+@section dnn_classify
+
+Do classification with deep neural networks based on bounding boxes.
+
+The filter accepts the following options:
+
+@table @option
+@item dnn_backend
+Specify which DNN backend to use for model loading and execution. This option
+accepts only openvino now; the tensorflow backend will be added later.
+
+@item model
+Set path to model file specifying network architecture and its parameters.
+Note that different backends use different file formats.
+
+@item input
+Set the input name of the dnn network.
+
+@item output
+Set the output name of the dnn network.
+
+@item confidence
+Set the confidence threshold (default: 0.5).
+
+@item labels
+Set path to label file specifying the mapping between label id and name.
+Each label name is written on one line; trailing spaces and empty lines are skipped.
+The first line is the name of label id 0, the second line is the name of label id 1, etc.
+The label id is used as the name if the label file is not provided.
+
+@item backend_configs
+Set the configs to be passed into the backend.
+
+@end table
+
 @section dnn_detect
 
 Do object detection with deep neural networks.
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index b77f2276a4..dd4decdd71 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -245,6 +245,7 @@
 OBJS-$(CONFIG_DILATION_FILTER) += vf_neighbor.o
 OBJS-$(CONFIG_DILATION_OPENCL_FILTER) += vf_neighbor_opencl.o opencl.o \
                                          opencl/neighbor.o
 OBJS-$(CONFIG_DISPLACE_FILTER) += vf_displace.o framesync.o
+OBJS-$(CONFIG_DNN_CLASSIFY_FILTER) += vf_dnn_classify.o
 OBJS-$(CONFIG_DNN_DETECT_FILTER) += vf_dnn_detect.o
 OBJS-$(CONFIG_DNN_PROCESSING_FILTER) += vf_dnn_processing.o
 OBJS-$(CONFIG_DOUBLEWEAVE_FILTER) += vf_weave.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 0d2bf7bbee..9b24a2da29 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -230,6 +230,7 @@ extern AVFilter ff_vf_detelecine;
 extern AVFilter ff_vf_dilation;
 extern AVFilter ff_vf_dilation_opencl;
 extern AVFilter ff_vf_displace;
+extern AVFilter ff_vf_dnn_classify;
 extern AVFilter ff_vf_dnn_detect;
 extern AVFilter ff_vf_dnn_processing;
 extern AVFilter ff_vf_doubleweave;
diff --git a/libavfilter/vf_dnn_classify.c b/libavfilter/vf_dnn_classify.c
new file mode 100644
index 0000000000..dd61f743d6
--- /dev/null
+++ b/libavfilter/vf_dnn_classify.c
@@ -0,0 +1,330 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Implement a classification filter using deep learning networks.
+ */
+
+#include "libavformat/avio.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/avassert.h"
+#include "libavutil/imgutils.h"
+#include "filters.h"
+#include "dnn_filter_common.h"
+#include "formats.h"
+#include "internal.h"
+#include "libavutil/time.h"
+#include "libavutil/avstring.h"
+#include "libavutil/detection_bbox.h"
+
+typedef struct DnnClassifyContext {
+    const AVClass *class;
+    DnnContext dnnctx;
+    float confidence;
+    char *labels_filename;
+    char *target;
+    char **labels;
+    int label_count;
+} DnnClassifyContext;
+
+#define OFFSET(x) offsetof(DnnClassifyContext, dnnctx.x)
+#define OFFSET2(x) offsetof(DnnClassifyContext, x)
+#define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
+static const AVOption dnn_classify_options[] = {
+    { "dnn_backend", "DNN backend", OFFSET(backend_type), AV_OPT_TYPE_INT, { .i64 = 2 }, INT_MIN, INT_MAX, FLAGS, "backend" },
+#if (CONFIG_LIBOPENVINO == 1)
+    { "openvino", "openvino backend flag", 0, AV_OPT_TYPE_CONST, { .i64 = 2 }, 0, 0, FLAGS, "backend" },
+#endif
+    DNN_COMMON_OPTIONS
+    { "confidence", "threshold of confidence", OFFSET2(confidence), AV_OPT_TYPE_FLOAT, { .dbl = 0.5 }, 0, 1, FLAGS },
+    { "labels", "path to labels file", OFFSET2(labels_filename), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, FLAGS },
+    { "target", "which one to be classified", OFFSET2(target), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, FLAGS },
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(dnn_classify);
+
+static int dnn_classify_post_proc(AVFrame *frame, DNNData *output, uint32_t bbox_index, AVFilterContext *filter_ctx)
+{
+    DnnClassifyContext *ctx = filter_ctx->priv;
+    float conf_threshold = ctx->confidence;
+    AVDetectionBBoxHeader *header;
+    AVDetectionBBox *bbox;
+    float *classifications;
+    uint32_t label_id;
+    float confidence;
+    AVFrameSideData *sd;
+
+    if (output->channels <= 0) {
+        return -1;
+    }
+
+    sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+    header = (AVDetectionBBoxHeader *)sd->data;
+
+    if (bbox_index == 0) {
+        av_strlcat(header->source, ", ", sizeof(header->source));
+        av_strlcat(header->source, ctx->dnnctx.model_filename, sizeof(header->source));
+    }
+
+    classifications = output->data;
+    label_id = 0;
+    confidence = classifications[0];
+    for (int i = 1; i < output->channels; i++) {
+        if (classifications[i] > confidence) {
+            label_id = i;
+            confidence = classifications[i];
+        }
+    }
+
+    if (confidence < conf_threshold) {
+        return 0;
+    }
+
+    bbox = av_get_detection_bbox(header, bbox_index);
+    bbox->classify_confidences[bbox->classify_count] = av_make_q((int)(confidence * 10000), 10000);
+
+    if (ctx->labels && label_id < ctx->label_count) {
+        av_strlcpy(bbox->classify_labels[bbox->classify_count], ctx->labels[label_id], sizeof(bbox->classify_labels[bbox->classify_count]));
+    } else {
+        snprintf(bbox->classify_labels[bbox->classify_count], sizeof(bbox->classify_labels[bbox->classify_count]), "%d", label_id);
+    }
+
+    bbox->classify_count++;
+
+    return 0;
+}
+
+static void free_classify_labels(DnnClassifyContext *ctx)
+{
+    for (int i = 0; i < ctx->label_count; i++) {
+        av_freep(&ctx->labels[i]);
+    }
+    ctx->label_count = 0;
+    av_freep(&ctx->labels);
+}
+
+static int read_classify_label_file(AVFilterContext *context)
+{
+    int line_len;
+    FILE *file;
+    DnnClassifyContext *ctx = context->priv;
+
+    file = av_fopen_utf8(ctx->labels_filename, "r");
+    if (!file){
+        av_log(context, AV_LOG_ERROR, "failed to open file %s\n", ctx->labels_filename);
+        return AVERROR(EINVAL);
+    }
+
+    while (!feof(file)) {
+        char *label;
+        char buf[256];
+        if (!fgets(buf, 256, file)) {
+            break;
+        }
+
+        line_len = strlen(buf);
+        while (line_len) {
+            int i = line_len - 1;
+            if (buf[i] == '\n' || buf[i] == '\r' || buf[i] == ' ') {
+                buf[i] = '\0';
+                line_len--;
+            } else {
+                break;
+            }
+        }
+
+        if (line_len == 0) // empty line
+            continue;
+
+        if (line_len >= AV_DETECTION_BBOX_LABEL_NAME_MAX_SIZE) {
+            av_log(context, AV_LOG_ERROR, "label %s too long\n", buf);
+            fclose(file);
+            return AVERROR(EINVAL);
+        }
+
+        label = av_strdup(buf);
+        if (!label) {
+            av_log(context, AV_LOG_ERROR, "failed to allocate memory for label %s\n", buf);
+            fclose(file);
+            return AVERROR(ENOMEM);
+        }
+
+        if (av_dynarray_add_nofree(&ctx->labels, &ctx->label_count, label) < 0) {
+            av_log(context, AV_LOG_ERROR, "failed to do av_dynarray_add\n");
+            fclose(file);
+            av_freep(&label);
+            return AVERROR(ENOMEM);
+        }
+    }
+
+    fclose(file);
+    return 0;
+}
+
+static av_cold int dnn_classify_init(AVFilterContext *context)
+{
+    DnnClassifyContext *ctx = context->priv;
+    int ret = ff_dnn_init(&ctx->dnnctx, DFT_ANALYTICS_CLASSIFY, context);
+    if (ret < 0)
+        return ret;
+    ff_dnn_set_classify_post_proc(&ctx->dnnctx, dnn_classify_post_proc);
+
+    if (ctx->labels_filename) {
+        return read_classify_label_file(context);
+    }
+    return 0;
+}
+
+static int dnn_classify_query_formats(AVFilterContext *context)
+{
+    static const enum AVPixelFormat pix_fmts[] = {
+        AV_PIX_FMT_RGB24, AV_PIX_FMT_BGR24,
+        AV_PIX_FMT_GRAY8, AV_PIX_FMT_GRAYF32,
+        AV_PIX_FMT_YUV420P, AV_PIX_FMT_YUV422P,
+        AV_PIX_FMT_YUV444P, AV_PIX_FMT_YUV410P, AV_PIX_FMT_YUV411P,
+        AV_PIX_FMT_NV12,
+        AV_PIX_FMT_NONE
+    };
+    AVFilterFormats *fmts_list = ff_make_format_list(pix_fmts);
+    return ff_set_common_formats(context, fmts_list);
+}
+
+static int dnn_classify_flush_frame(AVFilterLink *outlink, int64_t pts, int64_t *out_pts)
+{
+    DnnClassifyContext *ctx = outlink->src->priv;
+    int ret;
+    DNNAsyncStatusType async_state;
+
+    ret = ff_dnn_flush(&ctx->dnnctx);
+    if (ret != DNN_SUCCESS) {
+        return -1;
+    }
+
+    do {
+        AVFrame *in_frame = NULL;
+        AVFrame *out_frame = NULL;
+        async_state = ff_dnn_get_async_result(&ctx->dnnctx, &in_frame, &out_frame);
+        if (out_frame) {
+            av_assert0(in_frame == out_frame);
+            ret = ff_filter_frame(outlink, out_frame);
+            if (ret < 0)
+                return ret;
+            if (out_pts)
+                *out_pts = out_frame->pts + pts;
+        }
+        av_usleep(5000);
+    } while (async_state >= DAST_NOT_READY);
+
+    return 0;
+}
+
+static int dnn_classify_activate(AVFilterContext *filter_ctx)
+{
+    AVFilterLink *inlink = filter_ctx->inputs[0];
+    AVFilterLink *outlink = filter_ctx->outputs[0];
+    DnnClassifyContext *ctx = filter_ctx->priv;
+    AVFrame *in = NULL;
+    int64_t pts;
+    int ret, status;
+    int got_frame = 0;
+    int async_state;
+
+    FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
+
+    do {
+        // drain all input frames
+        ret = ff_inlink_consume_frame(inlink, &in);
+        if (ret < 0)
+            return ret;
+        if (ret > 0) {
+            if (ff_dnn_execute_model_classification(&ctx->dnnctx, in, in, ctx->target) != DNN_SUCCESS) {
+                return AVERROR(EIO);
+            }
+        }
+    } while (ret > 0);
+
+    // drain all processed frames
+    do {
+        AVFrame *in_frame = NULL;
+        AVFrame *out_frame = NULL;
+        async_state = ff_dnn_get_async_result(&ctx->dnnctx, &in_frame, &out_frame);
+        if (out_frame) {
+            av_assert0(in_frame == out_frame);
+            ret = ff_filter_frame(outlink, out_frame);
+            if (ret < 0)
+                return ret;
+            got_frame = 1;
+        }
+    } while (async_state == DAST_SUCCESS);
+
+    // if frame got, schedule to next filter
+    if (got_frame)
+        return 0;
+
+    if (ff_inlink_acknowledge_status(inlink, &status, &pts)) {
+        if (status == AVERROR_EOF) {
+            int64_t out_pts = pts;
+            ret = dnn_classify_flush_frame(outlink, pts, &out_pts);
+            ff_outlink_set_status(outlink, status, out_pts);
+            return ret;
+        }
+    }
+
+    FF_FILTER_FORWARD_WANTED(outlink, inlink);
+
+    return 0;
+}
+
+static av_cold void dnn_classify_uninit(AVFilterContext *context)
+{
+    DnnClassifyContext *ctx = context->priv;
+    ff_dnn_uninit(&ctx->dnnctx);
+    free_classify_labels(ctx);
+}
+
+static const AVFilterPad dnn_classify_inputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_VIDEO,
+    },
+    { NULL }
+};
+
+static const AVFilterPad dnn_classify_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_VIDEO,
+    },
+    { NULL }
+};
+
+AVFilter ff_vf_dnn_classify = {
+    .name          = "dnn_classify",
+    .description   = NULL_IF_CONFIG_SMALL("Apply DNN classify filter to the input."),
+    .priv_size     = sizeof(DnnClassifyContext),
+    .init          = dnn_classify_init,
+    .uninit        = dnn_classify_uninit,
+    .query_formats = dnn_classify_query_formats,
+    .inputs        = dnn_classify_inputs,
+    .outputs       = dnn_classify_outputs,
+    .priv_class    = &dnn_classify_class,
+    .activate      = dnn_classify_activate,
+};