From patchwork Thu Apr 29 13:36:52 2021
X-Patchwork-Submitter: "Guo, Yejun"
X-Patchwork-Id: 27479
From: "Guo, Yejun"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 29 Apr 2021 21:36:52 +0800
Message-Id: <20210429133657.23076-1-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH V2 1/6] lavfi/dnn_backend_openvino.c: unify code for infer request for sync/async
---
The main change of V2 in this patch set is to rebase with the latest code by resolving the conflicts.

 libavfilter/dnn/dnn_backend_openvino.c | 49 +++++++++++---------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index a8032fe56b..267c154c87 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -52,9 +52,6 @@ typedef struct OVModel{
     ie_core_t *core;
     ie_network_t *network;
     ie_executable_network_t *exe_network;
-    ie_infer_request_t *infer_request;
-
-    /* for async execution */
     SafeQueue *request_queue;   // holds RequestItem
     Queue *task_queue;          // holds TaskItem
 } OVModel;
@@ -269,12 +266,9 @@ static void infer_completion_callback(void *args)
     ie_blob_free(&output_blob);
 
     request->task_count = 0;
-
-    if (task->async) {
-        if (ff_safe_queue_push_back(requestq, request) < 0) {
-            av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
-            return;
-        }
+    if (ff_safe_queue_push_back(requestq, request) < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
+        return;
     }
 }
@@ -347,11 +341,6 @@ static DNNReturnType init_model_ov(OVModel *ov_model, const char *input_name, co
         goto err;
     }
 
-    // create infer_request for sync execution
-    status = ie_exec_network_create_infer_request(ov_model->exe_network, &ov_model->infer_request);
-    if (status != OK)
-        goto err;
-
     // create infer_requests for async execution
     if (ctx->options.nireq <= 0) {
         // the default value is a rough estimation
@@ -502,10 +491,9 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     OVModel *ov_model = model;
     OVContext *ctx = &ov_model->ctx;
     TaskItem task;
-    RequestItem request;
+    RequestItem *request;
     AVFrame *in_frame = NULL;
     AVFrame *out_frame = NULL;
-    TaskItem *ptask = &task;
     IEStatusCode status;
     input_shapes_t input_shapes;
@@ -557,11 +545,16 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     task.out_frame = out_frame;
     task.ov_model = ov_model;
 
-    request.infer_request = ov_model->infer_request;
-    request.task_count = 1;
-    request.tasks = &ptask;
+    request = ff_safe_queue_pop_front(ov_model->request_queue);
+    if (!request) {
+        av_frame_free(&out_frame);
+        av_frame_free(&in_frame);
+        av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
+        return DNN_ERROR;
+    }
+    request->tasks[request->task_count++] = &task;
 
-    ret = execute_model_ov(&request);
+    ret = execute_model_ov(request);
 
     *output_width = out_frame->width;
     *output_height = out_frame->height;
@@ -633,8 +626,7 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
     OVModel *ov_model = model->model;
     OVContext *ctx = &ov_model->ctx;
     TaskItem task;
-    RequestItem request;
-    TaskItem *ptask = &task;
+    RequestItem *request;
 
     if (!in_frame) {
         av_log(ctx, AV_LOG_ERROR, "in frame is NULL when execute model.\n");
@@ -674,11 +666,14 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
     task.out_frame = out_frame;
     task.ov_model = ov_model;
 
-    request.infer_request = ov_model->infer_request;
-    request.task_count = 1;
-    request.tasks = &ptask;
+    request = ff_safe_queue_pop_front(ov_model->request_queue);
+    if (!request) {
+        av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
+        return DNN_ERROR;
+    }
+    request->tasks[request->task_count++] = &task;
 
-    return execute_model_ov(&request);
+    return execute_model_ov(request);
 }
 
 DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame,
@@ -821,8 +816,6 @@ void ff_dnn_free_model_ov(DNNModel **model)
     }
     ff_queue_destroy(ov_model->task_queue);
 
-    if (ov_model->infer_request)
-        ie_infer_request_free(&ov_model->infer_request);
     if (ov_model->exe_network)
         ie_exec_network_free(&ov_model->exe_network);
     if (ov_model->network)
From patchwork Thu Apr 29 13:36:53 2021
X-Patchwork-Submitter: "Guo, Yejun"
X-Patchwork-Id: 27477
From: "Guo, Yejun"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 29 Apr 2021 21:36:53 +0800
Message-Id: <20210429133657.23076-2-yejun.guo@intel.com>
In-Reply-To: <20210429133657.23076-1-yejun.guo@intel.com>
References: <20210429133657.23076-1-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH V2 2/6] lavfi/dnn_backend_openvino.c: add InferenceItem between TaskItem and RequestItem

There is one task item for each function call from the dnn interface, and one request item for each call to openvino. For classification, one task might need multiple inferences (one for each bounding box to classify), so add InferenceItem.
---
 libavfilter/dnn/dnn_backend_openvino.c | 157 ++++++++++++++++++-------
 1 file changed, 115 insertions(+), 42 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index 267c154c87..a8a02d7589 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -54,8 +54,10 @@ typedef struct OVModel{
     ie_executable_network_t *exe_network;
     SafeQueue *request_queue;   // holds RequestItem
     Queue *task_queue;          // holds TaskItem
+    Queue *inference_queue;     // holds InferenceItem
 } OVModel;
 
+// one task for one function call from dnn interface
 typedef struct TaskItem {
     OVModel *ov_model;
     const char *input_name;
@@ -64,13 +66,20 @@ typedef struct TaskItem {
     AVFrame *out_frame;
     int do_ioproc;
     int async;
-    int done;
+    uint32_t inference_todo;
+    uint32_t inference_done;
 } TaskItem;
 
+// one task might have multiple inferences
+typedef struct InferenceItem {
+    TaskItem *task;
+} InferenceItem;
+
+// one request for one call to openvino
 typedef struct RequestItem {
     ie_infer_request_t *infer_request;
-    TaskItem **tasks;
-    int task_count;
+    InferenceItem **inferences;
+    uint32_t inference_count;
     ie_complete_call_back_t callback;
 } RequestItem;
@@ -127,7 +136,12 @@ static DNNReturnType fill_model_input_ov(OVModel *ov_model, RequestItem *request
     IEStatusCode status;
     DNNData input;
     ie_blob_t *input_blob = NULL;
-    TaskItem *task = request->tasks[0];
+    InferenceItem *inference;
+    TaskItem *task;
+
+    inference = ff_queue_peek_front(ov_model->inference_queue);
+    av_assert0(inference);
+    task = inference->task;
 
     status = ie_infer_request_get_blob(request->infer_request, task->input_name, &input_blob);
     if (status != OK) {
@@ -159,9 +173,14 @@ static DNNReturnType fill_model_input_ov(OVModel *ov_model, RequestItem *request
     // change to be an option when necessary.
     input.order = DCO_BGR;
 
-    av_assert0(request->task_count <= dims.dims[0]);
-    for (int i = 0; i < request->task_count; ++i) {
-        task = request->tasks[i];
+    for (int i = 0; i < ctx->options.batch_size; ++i) {
+        inference = ff_queue_pop_front(ov_model->inference_queue);
+        if (!inference) {
+            break;
+        }
+        request->inferences[i] = inference;
+        request->inference_count = i + 1;
+        task = inference->task;
         if (task->do_ioproc) {
             if (ov_model->model->frame_pre_proc != NULL) {
                 ov_model->model->frame_pre_proc(task->in_frame, &input, ov_model->model->filter_ctx);
@@ -183,7 +202,8 @@ static void infer_completion_callback(void *args)
     precision_e precision;
     IEStatusCode status;
     RequestItem *request = args;
-    TaskItem *task = request->tasks[0];
+    InferenceItem *inference = request->inferences[0];
+    TaskItem *task = inference->task;
     SafeQueue *requestq = task->ov_model->request_queue;
     ie_blob_t *output_blob = NULL;
     ie_blob_buffer_t blob_buffer;
@@ -229,10 +249,11 @@ static void infer_completion_callback(void *args)
     output.dt = precision_to_datatype(precision);
     output.data = blob_buffer.buffer;
 
-    av_assert0(request->task_count <= dims.dims[0]);
-    av_assert0(request->task_count >= 1);
-    for (int i = 0; i < request->task_count; ++i) {
-        task = request->tasks[i];
+    av_assert0(request->inference_count <= dims.dims[0]);
+    av_assert0(request->inference_count >= 1);
+    for (int i = 0; i < request->inference_count; ++i) {
+        task = request->inferences[i]->task;
+        task->inference_done++;
 
         switch (task->ov_model->model->func_type) {
         case DFT_PROCESS_FRAME:
@@ -259,13 +280,13 @@ static void infer_completion_callback(void *args)
             break;
         }
 
-        task->done = 1;
+        av_freep(&request->inferences[i]);
         output.data = (uint8_t *)output.data
                       + output.width * output.height * output.channels * get_datatype_size(output.dt);
     }
     ie_blob_free(&output_blob);
 
-    request->task_count = 0;
+    request->inference_count = 0;
     if (ff_safe_queue_push_back(requestq, request) < 0) {
         av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
         return;
@@ -370,11 +391,11 @@ static DNNReturnType init_model_ov(OVModel *ov_model, const char *input_name, co
             goto err;
         }
 
-        item->tasks = av_malloc_array(ctx->options.batch_size, sizeof(*item->tasks));
-        if (!item->tasks) {
+        item->inferences = av_malloc_array(ctx->options.batch_size, sizeof(*item->inferences));
+        if (!item->inferences) {
             goto err;
         }
-        item->task_count = 0;
+        item->inference_count = 0;
     }
 
     ov_model->task_queue = ff_queue_create();
@@ -382,6 +403,11 @@ static DNNReturnType init_model_ov(OVModel *ov_model, const char *input_name, co
         goto err;
     }
 
+    ov_model->inference_queue = ff_queue_create();
+    if (!ov_model->inference_queue) {
+        goto err;
+    }
+
     return DNN_SUCCESS;
 
 err:
@@ -389,15 +415,24 @@ err:
     return DNN_ERROR;
 }
 
-static DNNReturnType execute_model_ov(RequestItem *request)
+static DNNReturnType execute_model_ov(RequestItem *request, Queue *inferenceq)
 {
     IEStatusCode status;
     DNNReturnType ret;
-    TaskItem *task = request->tasks[0];
-    OVContext *ctx = &task->ov_model->ctx;
+    InferenceItem *inference;
+    TaskItem *task;
+    OVContext *ctx;
+
+    if (ff_queue_size(inferenceq) == 0) {
+        return DNN_SUCCESS;
+    }
+
+    inference = ff_queue_peek_front(inferenceq);
+    task = inference->task;
+    ctx = &task->ov_model->ctx;
 
     if (task->async) {
-        if (request->task_count < ctx->options.batch_size) {
+        if (ff_queue_size(inferenceq) < ctx->options.batch_size) {
             if (ff_safe_queue_push_front(task->ov_model->request_queue, request) < 0) {
                 av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
                 return DNN_ERROR;
@@ -430,7 +465,7 @@ static DNNReturnType execute_model_ov(RequestItem *request, Queue *inferenceq)
             return DNN_ERROR;
         }
         infer_completion_callback(request);
-        return task->done ? DNN_SUCCESS : DNN_ERROR;
+        return (task->inference_done == task->inference_todo) ? DNN_SUCCESS : DNN_ERROR;
     }
 }
@@ -484,6 +519,31 @@ static DNNReturnType get_input_ov(void *model, DNNData *input, const char *input
     return DNN_ERROR;
 }
 
+static DNNReturnType extract_inference_from_task(DNNFunctionType func_type, TaskItem *task, Queue *inference_queue)
+{
+    switch (func_type) {
+    case DFT_PROCESS_FRAME:
+    case DFT_ANALYTICS_DETECT:
+    {
+        InferenceItem *inference = av_malloc(sizeof(*inference));
+        if (!inference) {
+            return DNN_ERROR;
+        }
+        task->inference_todo = 1;
+        task->inference_done = 0;
+        inference->task = task;
+        if (ff_queue_push_back(inference_queue, inference) < 0) {
+            av_freep(&inference);
+            return DNN_ERROR;
+        }
+        return DNN_SUCCESS;
+    }
+    default:
+        av_assert0(!"should not reach here");
+        return DNN_ERROR;
+    }
+}
+
 static DNNReturnType get_output_ov(void *model, const char *input_name, int input_width, int input_height,
                                    const char *output_name, int *output_width, int *output_height)
 {
@@ -536,7 +596,6 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
         return DNN_ERROR;
     }
 
-    task.done = 0;
     task.do_ioproc = 0;
     task.async = 0;
     task.input_name = input_name;
@@ -545,6 +604,13 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     task.out_frame = out_frame;
     task.ov_model = ov_model;
 
+    if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) {
+        av_frame_free(&out_frame);
+        av_frame_free(&in_frame);
+        av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n");
+        return DNN_ERROR;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_frame_free(&out_frame);
@@ -552,9 +618,8 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
         return DNN_ERROR;
     }
-    request->tasks[request->task_count++] = &task;
 
-    ret = execute_model_ov(request);
+    ret = execute_model_ov(request, ov_model->inference_queue);
 
     *output_width = out_frame->width;
     *output_height = out_frame->height;
@@ -657,7 +722,6 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
         }
     }
 
-    task.done = 0;
     task.do_ioproc = 1;
     task.async = 0;
     task.input_name = input_name;
@@ -666,14 +730,18 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
     task.out_frame = out_frame;
     task.ov_model = ov_model;
 
+    if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) {
+        av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n");
+        return DNN_ERROR;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
         return DNN_ERROR;
     }
-    request->tasks[request->task_count++] = &task;
 
-    return execute_model_ov(request);
+    return execute_model_ov(request, ov_model->inference_queue);
 }
 
 DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame,
@@ -707,7 +775,6 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *i
         return DNN_ERROR;
     }
 
-    task->done = 0;
     task->do_ioproc = 1;
     task->async = 1;
     task->input_name = input_name;
@@ -721,14 +788,18 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *i
         return DNN_ERROR;
     }
 
+    if (extract_inference_from_task(ov_model->model->func_type, task, ov_model->inference_queue) != DNN_SUCCESS) {
+        av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n");
+        return DNN_ERROR;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
         return DNN_ERROR;
     }
-    request->tasks[request->task_count++] = task;
 
-    return execute_model_ov(request);
+    return execute_model_ov(request, ov_model->inference_queue);
 }
 
 DNNAsyncStatusType ff_dnn_get_async_result_ov(const DNNModel *model, AVFrame **in, AVFrame **out)
@@ -740,7 +811,7 @@ DNNAsyncStatusType ff_dnn_get_async_result_ov(const DNNModel *model, AVFrame **i
         return DAST_EMPTY_QUEUE;
     }
 
-    if (!task->done) {
+    if (task->inference_done != task->inference_todo) {
         return DAST_NOT_READY;
     }
@@ -760,21 +831,17 @@ DNNReturnType ff_dnn_flush_ov(const DNNModel *model)
     IEStatusCode status;
     DNNReturnType ret;
 
+    if (ff_queue_size(ov_model->inference_queue) == 0) {
+        // no pending task need to flush
+        return DNN_SUCCESS;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
         return DNN_ERROR;
     }
 
-    if (request->task_count == 0) {
-        // no pending task need to flush
-        if (ff_safe_queue_push_back(ov_model->request_queue, request) < 0) {
-            av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
-            return DNN_ERROR;
-        }
-        return DNN_SUCCESS;
-    }
-
     ret = fill_model_input_ov(ov_model, request);
     if (ret != DNN_SUCCESS) {
         av_log(ctx, AV_LOG_ERROR, "Failed to fill model input.\n");
@@ -803,11 +870,17 @@ void ff_dnn_free_model_ov(DNNModel **model)
         if (item && item->infer_request) {
             ie_infer_request_free(&item->infer_request);
         }
-        av_freep(&item->tasks);
+        av_freep(&item->inferences);
        av_freep(&item);
     }
     ff_safe_queue_destroy(ov_model->request_queue);
 
+    while (ff_queue_size(ov_model->inference_queue) != 0) {
+        TaskItem *item = ff_queue_pop_front(ov_model->inference_queue);
+        av_freep(&item);
+    }
+    ff_queue_destroy(ov_model->inference_queue);
+
     while (ff_queue_size(ov_model->task_queue) != 0) {
         TaskItem *item = ff_queue_pop_front(ov_model->task_queue);
         av_frame_free(&item->in_frame);
From patchwork Thu Apr 29 13:36:54 2021
X-Patchwork-Submitter: "Guo, Yejun"
X-Patchwork-Id: 27478
From: "Guo, Yejun"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 29 Apr 2021 21:36:54 +0800
Message-Id: <20210429133657.23076-3-yejun.guo@intel.com>
In-Reply-To: <20210429133657.23076-1-yejun.guo@intel.com>
References: <20210429133657.23076-1-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH V2 3/6] lavfi/dnn_backend_openvino.c: move the logic for batch mode earlier

---
 libavfilter/dnn/dnn_backend_openvino.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index a8a02d7589..9f3c696e0a 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -432,13 +432,6 @@ static DNNReturnType execute_model_ov(RequestItem *request, Queue *inferenceq)
     ctx = &task->ov_model->ctx;
 
     if (task->async) {
-        if (ff_queue_size(inferenceq) < ctx->options.batch_size) {
-            if (ff_safe_queue_push_front(task->ov_model->request_queue, request) < 0) {
-                av_log(ctx, AV_LOG_ERROR, "Failed to push back request_queue.\n");
-                return DNN_ERROR;
-            }
-            return DNN_SUCCESS;
-        }
         ret = fill_model_input_ov(task->ov_model, request);
         if (ret != DNN_SUCCESS) {
             return ret;
@@ -793,6 +786,11 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *i
         return DNN_ERROR;
     }
 
+    if (ff_queue_size(ov_model->inference_queue) < ctx->options.batch_size) {
+        // not enough inference items queued for a batch
+        return DNN_SUCCESS;
+    }
+
     request = ff_safe_queue_pop_front(ov_model->request_queue);
     if (!request) {
         av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
From: "Guo, Yejun"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 29 Apr 2021 21:36:55 +0800
Message-Id: <20210429133657.23076-4-yejun.guo@intel.com>
In-Reply-To: <20210429133657.23076-1-yejun.guo@intel.com>
References: <20210429133657.23076-1-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH V2 4/6] lavfi/dnn: refine dnn
 interface to add DNNExecBaseParams

Different function types of model require different parameters; for
example, object detection detects lots of objects (cat/dog/...) in the
frame, while classification needs to know which object (cat or dog) it
is going to classify. The current interface would need a new function
with more parameters for each new requirement. With this change, we can
just add a new struct (for example DNNExecClassifyParams) based on
DNNExecBaseParams, and continue to use the current interface
execute_model with only the params changed.
---
 libavfilter/dnn/Makefile               |  1 +
 libavfilter/dnn/dnn_backend_common.c   | 51 ++++++++++++++++++++++++++
 libavfilter/dnn/dnn_backend_common.h   | 31 ++++++++++++++++
 libavfilter/dnn/dnn_backend_native.c   | 15 +++-----
 libavfilter/dnn/dnn_backend_native.h   |  3 +-
 libavfilter/dnn/dnn_backend_openvino.c | 50 ++++++++-----------------
 libavfilter/dnn/dnn_backend_openvino.h |  6 +--
 libavfilter/dnn/dnn_backend_tf.c       | 18 +++------
 libavfilter/dnn/dnn_backend_tf.h       |  3 +-
 libavfilter/dnn_filter_common.c        | 20 ++++++++--
 libavfilter/dnn_interface.h            | 14 +++++--
 11 files changed, 139 insertions(+), 73 deletions(-)
 create mode 100644 libavfilter/dnn/dnn_backend_common.c
 create mode 100644 libavfilter/dnn/dnn_backend_common.h

diff --git a/libavfilter/dnn/Makefile b/libavfilter/dnn/Makefile
index d6d58f4b61..4cfbce0efc 100644
--- a/libavfilter/dnn/Makefile
+++ b/libavfilter/dnn/Makefile
@@ -2,6 +2,7 @@ OBJS-$(CONFIG_DNN)                           += dnn/dnn_interface.o
 OBJS-$(CONFIG_DNN)                           += dnn/dnn_io_proc.o
 OBJS-$(CONFIG_DNN)                           += dnn/queue.o
 OBJS-$(CONFIG_DNN)                           += dnn/safe_queue.o
+OBJS-$(CONFIG_DNN)                           += dnn/dnn_backend_common.o
 OBJS-$(CONFIG_DNN)                           += dnn/dnn_backend_native.o
 OBJS-$(CONFIG_DNN)                           += dnn/dnn_backend_native_layers.o
 OBJS-$(CONFIG_DNN)                           += dnn/dnn_backend_native_layer_avgpool.o
diff --git a/libavfilter/dnn/dnn_backend_common.c b/libavfilter/dnn/dnn_backend_common.c
new file mode 100644
index 0000000000..a522ab5650
--- /dev/null
+++ b/libavfilter/dnn/dnn_backend_common.c
@@ -0,0 +1,51 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * DNN common functions for different backends.
+ */
+
+#include "dnn_backend_common.h"
+
+int ff_check_exec_params(void *ctx, DNNBackendType backend, DNNFunctionType func_type, DNNExecBaseParams *exec_params)
+{
+    if (!exec_params) {
+        av_log(ctx, AV_LOG_ERROR, "exec_params is null when execute model.\n");
+        return AVERROR(EINVAL);
+    }
+
+    if (!exec_params->in_frame) {
+        av_log(ctx, AV_LOG_ERROR, "in frame is NULL when execute model.\n");
+        return AVERROR(EINVAL);
+    }
+
+    if (!exec_params->out_frame) {
+        av_log(ctx, AV_LOG_ERROR, "out frame is NULL when execute model.\n");
+        return AVERROR(EINVAL);
+    }
+
+    if (exec_params->nb_output != 1 && backend != DNN_TF) {
+        // currently, the filter does not need multiple outputs,
+        // so we just postpone the support until we really need it.
+        avpriv_report_missing_feature(ctx, "multiple outputs");
+        return AVERROR(EINVAL);
+    }
+
+    return 0;
+}
diff --git a/libavfilter/dnn/dnn_backend_common.h b/libavfilter/dnn/dnn_backend_common.h
new file mode 100644
index 0000000000..cd9c0f5339
--- /dev/null
+++ b/libavfilter/dnn/dnn_backend_common.h
@@ -0,0 +1,31 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * DNN common functions for different backends.
+ */
+
+#ifndef AVFILTER_DNN_DNN_BACKEND_COMMON_H
+#define AVFILTER_DNN_DNN_BACKEND_COMMON_H
+
+#include "../dnn_interface.h"
+
+int ff_check_exec_params(void *ctx, DNNBackendType backend, DNNFunctionType func_type, DNNExecBaseParams *exec_params);
+
+#endif
diff --git a/libavfilter/dnn/dnn_backend_native.c b/libavfilter/dnn/dnn_backend_native.c
index d9762eeaf6..b5f1c16538 100644
--- a/libavfilter/dnn/dnn_backend_native.c
+++ b/libavfilter/dnn/dnn_backend_native.c
@@ -28,6 +28,7 @@
 #include "dnn_backend_native_layer_conv2d.h"
 #include "dnn_backend_native_layers.h"
 #include "dnn_io_proc.h"
+#include "dnn_backend_common.h"
 
 #define OFFSET(x) offsetof(NativeContext, x)
 #define FLAGS AV_OPT_FLAG_FILTERING_PARAM
@@ -372,23 +373,17 @@ static DNNReturnType execute_model_native(const DNNModel *model, const char *inp
     return DNN_SUCCESS;
 }
 
-DNNReturnType ff_dnn_execute_model_native(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                          const char **output_names, uint32_t nb_output, AVFrame *out_frame)
+DNNReturnType ff_dnn_execute_model_native(const DNNModel *model, DNNExecBaseParams *exec_params)
 {
     NativeModel *native_model = model->model;
     NativeContext *ctx = &native_model->ctx;
 
-    if (!in_frame) {
-        av_log(ctx, AV_LOG_ERROR, "in frame is NULL when execute model.\n");
-        return DNN_ERROR;
-    }
-
-    if (!out_frame) {
-        av_log(ctx, AV_LOG_ERROR, "out frame is NULL when execute model.\n");
+    if (ff_check_exec_params(ctx, DNN_NATIVE, model->func_type, exec_params) != 0) {
         return DNN_ERROR;
     }
 
-    return execute_model_native(model, input_name, in_frame, output_names, nb_output, out_frame, 1);
+    return execute_model_native(model, exec_params->input_name, exec_params->in_frame,
+                                exec_params->output_names, exec_params->nb_output, exec_params->out_frame, 1);
 }
 
 int32_t ff_calculate_operand_dims_count(const DnnOperand *oprd)
diff --git a/libavfilter/dnn/dnn_backend_native.h b/libavfilter/dnn/dnn_backend_native.h
index d313c48f3a..89bcb8e358 100644
--- a/libavfilter/dnn/dnn_backend_native.h
+++ b/libavfilter/dnn/dnn_backend_native.h
@@ -130,8 +130,7 @@ typedef struct NativeModel{
 DNNModel *ff_dnn_load_model_native(const char *model_filename, DNNFunctionType func_type, const char *options, AVFilterContext *filter_ctx);
 
-DNNReturnType ff_dnn_execute_model_native(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                          const char **output_names, uint32_t nb_output, AVFrame *out_frame);
+DNNReturnType ff_dnn_execute_model_native(const DNNModel *model, DNNExecBaseParams *exec_params);
 
 void ff_dnn_free_model_native(DNNModel **model);
diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index 9f3c696e0a..4e58ff6d9c 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -33,6 +33,7 @@
 #include "queue.h"
 #include "safe_queue.h"
 #include <c_api/ie_c_api.h>
+#include "dnn_backend_common.h"
 
 typedef struct OVOptions{
     char *device_type;
@@ -678,28 +679,14 @@ err:
     return NULL;
 }
 
-DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                      const char **output_names, uint32_t nb_output, AVFrame *out_frame)
+DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, DNNExecBaseParams *exec_params)
 {
     OVModel *ov_model = model->model;
     OVContext *ctx = &ov_model->ctx;
     TaskItem task;
     RequestItem *request;
 
-    if (!in_frame) {
-        av_log(ctx, AV_LOG_ERROR, "in frame is NULL when execute model.\n");
-        return DNN_ERROR;
-    }
-
-    if (!out_frame && model->func_type == DFT_PROCESS_FRAME) {
-        av_log(ctx, AV_LOG_ERROR, "out frame is NULL when execute model.\n");
-        return DNN_ERROR;
-    }
-
-    if (nb_output != 1) {
-        // currently, the filter does not need multiple outputs,
-        // so we just pending the support until we really need it.
-        avpriv_report_missing_feature(ctx, "multiple outputs");
+    if (ff_check_exec_params(ctx, DNN_OV, model->func_type, exec_params) != 0) {
         return DNN_ERROR;
     }
@@ -709,7 +696,7 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
     }
 
     if (!ov_model->exe_network) {
-        if (init_model_ov(ov_model, input_name, output_names[0]) != DNN_SUCCESS) {
+        if (init_model_ov(ov_model, exec_params->input_name, exec_params->output_names[0]) != DNN_SUCCESS) {
             av_log(ctx, AV_LOG_ERROR, "Failed init OpenVINO exectuable network or inference request\n");
             return DNN_ERROR;
         }
@@ -717,10 +704,10 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
     task.do_ioproc = 1;
     task.async = 0;
-    task.input_name = input_name;
-    task.in_frame = in_frame;
-    task.output_name = output_names[0];
-    task.out_frame = out_frame;
+    task.input_name = exec_params->input_name;
+    task.in_frame = exec_params->in_frame;
+    task.output_name = exec_params->output_names[0];
+    task.out_frame = exec_params->out_frame ? exec_params->out_frame : exec_params->in_frame;
     task.ov_model = ov_model;
 
     if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) {
@@ -737,26 +724,19 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_n
     return execute_model_ov(request, ov_model->inference_queue);
 }
 
-DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                            const char **output_names, uint32_t nb_output, AVFrame *out_frame)
+DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, DNNExecBaseParams *exec_params)
 {
     OVModel *ov_model = model->model;
     OVContext *ctx = &ov_model->ctx;
     RequestItem *request;
     TaskItem *task;
 
-    if (!in_frame) {
-        av_log(ctx, AV_LOG_ERROR, "in frame is NULL when async execute model.\n");
-        return DNN_ERROR;
-    }
-
-    if (!out_frame && model->func_type == DFT_PROCESS_FRAME) {
-        av_log(ctx, AV_LOG_ERROR, "out frame is NULL when async execute model.\n");
+    if (ff_check_exec_params(ctx, DNN_OV, model->func_type, exec_params) != 0) {
         return DNN_ERROR;
     }
 
     if (!ov_model->exe_network) {
-        if (init_model_ov(ov_model, input_name, output_names[0]) != DNN_SUCCESS) {
+        if (init_model_ov(ov_model, exec_params->input_name, exec_params->output_names[0]) != DNN_SUCCESS) {
             av_log(ctx, AV_LOG_ERROR, "Failed init OpenVINO exectuable network or inference request\n");
             return DNN_ERROR;
         }
@@ -770,10 +750,10 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *i
     task->do_ioproc = 1;
     task->async = 1;
-    task->input_name = input_name;
-    task->in_frame = in_frame;
-    task->output_name = output_names[0];
-    task->out_frame = out_frame;
+    task->input_name = exec_params->input_name;
+    task->in_frame = exec_params->in_frame;
+    task->output_name = exec_params->output_names[0];
+    task->out_frame = exec_params->out_frame ? exec_params->out_frame : exec_params->in_frame;
     task->ov_model = ov_model;
 
     if (ff_queue_push_back(ov_model->task_queue, task) < 0) {
         av_freep(&task);
diff --git a/libavfilter/dnn/dnn_backend_openvino.h b/libavfilter/dnn/dnn_backend_openvino.h
index a484a7be32..046d0c5b5a 100644
--- a/libavfilter/dnn/dnn_backend_openvino.h
+++ b/libavfilter/dnn/dnn_backend_openvino.h
@@ -31,10 +31,8 @@
 DNNModel *ff_dnn_load_model_ov(const char *model_filename, DNNFunctionType func_type, const char *options, AVFilterContext *filter_ctx);
 
-DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                      const char **output_names, uint32_t nb_output, AVFrame *out_frame);
-DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                            const char **output_names, uint32_t nb_output, AVFrame *out_frame);
+DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, DNNExecBaseParams *exec_params);
+DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, DNNExecBaseParams *exec_params);
 DNNAsyncStatusType ff_dnn_get_async_result_ov(const DNNModel *model, AVFrame **in, AVFrame **out);
 DNNReturnType ff_dnn_flush_ov(const DNNModel *model);
diff --git a/libavfilter/dnn/dnn_backend_tf.c b/libavfilter/dnn/dnn_backend_tf.c
index 076dd3d6a9..03fe310b03 100644
--- a/libavfilter/dnn/dnn_backend_tf.c
+++ b/libavfilter/dnn/dnn_backend_tf.c
@@ -34,7 +34,7 @@
 #include "dnn_backend_native_layer_pad.h"
 #include "dnn_backend_native_layer_maximum.h"
 #include "dnn_io_proc.h"
-
+#include "dnn_backend_common.h"
 #include <tensorflow/c/c_api.h>
 
 typedef struct TFOptions{
@@ -814,23 +814,17 @@ static DNNReturnType execute_model_tf(const DNNModel *model, const char *input_n
     return DNN_SUCCESS;
 }
 
-DNNReturnType ff_dnn_execute_model_tf(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                      const char **output_names, uint32_t nb_output, AVFrame *out_frame)
+DNNReturnType ff_dnn_execute_model_tf(const DNNModel *model, DNNExecBaseParams *exec_params)
 {
     TFModel *tf_model = model->model;
     TFContext *ctx = &tf_model->ctx;
 
-    if (!in_frame) {
-        av_log(ctx, AV_LOG_ERROR, "in frame is NULL when execute model.\n");
-        return DNN_ERROR;
-    }
-
-    if (!out_frame) {
-        av_log(ctx, AV_LOG_ERROR, "out frame is NULL when execute model.\n");
-        return DNN_ERROR;
+    if (ff_check_exec_params(ctx, DNN_TF, model->func_type, exec_params) != 0) {
+        return DNN_ERROR;
     }
 
-    return execute_model_tf(model, input_name, in_frame, output_names, nb_output, out_frame, 1);
+    return execute_model_tf(model, exec_params->input_name, exec_params->in_frame,
+                            exec_params->output_names, exec_params->nb_output, exec_params->out_frame, 1);
 }
 
 void ff_dnn_free_model_tf(DNNModel **model)
diff --git a/libavfilter/dnn/dnn_backend_tf.h b/libavfilter/dnn/dnn_backend_tf.h
index 8cec04748e..3dfd6e4280 100644
--- a/libavfilter/dnn/dnn_backend_tf.h
+++ b/libavfilter/dnn/dnn_backend_tf.h
@@ -31,8 +31,7 @@
 DNNModel *ff_dnn_load_model_tf(const char *model_filename, DNNFunctionType func_type, const char *options, AVFilterContext *filter_ctx);
 
-DNNReturnType ff_dnn_execute_model_tf(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                      const char **output_names, uint32_t nb_output, AVFrame *out_frame);
+DNNReturnType ff_dnn_execute_model_tf(const DNNModel *model, DNNExecBaseParams *exec_params);
 
 void ff_dnn_free_model_tf(DNNModel **model);
diff --git a/libavfilter/dnn_filter_common.c b/libavfilter/dnn_filter_common.c
index 1b922455a3..c085884eb4 100644
--- a/libavfilter/dnn_filter_common.c
+++ b/libavfilter/dnn_filter_common.c
@@ -90,14 +90,26 @@ DNNReturnType ff_dnn_get_output(DnnContext *ctx, int input_width, int input_heig
 
 DNNReturnType ff_dnn_execute_model(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame)
 {
-    return (ctx->dnn_module->execute_model)(ctx->model, ctx->model_inputname, in_frame,
-                                            (const char **)&ctx->model_outputname, 1, out_frame);
+    DNNExecBaseParams exec_params = {
+        .input_name = ctx->model_inputname,
+        .output_names = (const char **)&ctx->model_outputname,
+        .nb_output = 1,
+        .in_frame = in_frame,
+        .out_frame = out_frame,
+    };
+    return (ctx->dnn_module->execute_model)(ctx->model, &exec_params);
 }
 
 DNNReturnType ff_dnn_execute_model_async(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame)
 {
-    return (ctx->dnn_module->execute_model_async)(ctx->model, ctx->model_inputname, in_frame,
-                                                  (const char **)&ctx->model_outputname, 1, out_frame);
+    DNNExecBaseParams exec_params = {
+        .input_name = ctx->model_inputname,
+        .output_names = (const char **)&ctx->model_outputname,
+        .nb_output = 1,
+        .in_frame = in_frame,
+        .out_frame = out_frame,
+    };
+    return (ctx->dnn_module->execute_model_async)(ctx->model, &exec_params);
 }
 
 DNNAsyncStatusType ff_dnn_get_async_result(DnnContext *ctx, AVFrame **in_frame, AVFrame **out_frame)
diff --git a/libavfilter/dnn_interface.h b/libavfilter/dnn_interface.h
index ae5a488341..941670675d 100644
--- a/libavfilter/dnn_interface.h
+++ b/libavfilter/dnn_interface.h
@@ -63,6 +63,14 @@ typedef struct DNNData{
     DNNColorOrder order;
 } DNNData;
 
+typedef struct DNNExecBaseParams {
+    const char *input_name;
+    const char **output_names;
+    uint32_t nb_output;
+    AVFrame *in_frame;
+    AVFrame *out_frame;
+} DNNExecBaseParams;
+
 typedef int (*FramePrePostProc)(AVFrame *frame, DNNData *model, AVFilterContext *filter_ctx);
 typedef int (*DetectPostProc)(AVFrame *frame, DNNData *output, uint32_t nb, AVFilterContext *filter_ctx);
 
@@ -96,11 +104,9 @@ typedef struct DNNModule{
     // Loads model and parameters from given file. Returns NULL if it is not possible.
     DNNModel *(*load_model)(const char *model_filename, DNNFunctionType func_type, const char *options, AVFilterContext *filter_ctx);
     // Executes model with specified input and output. Returns DNN_ERROR otherwise.
-    DNNReturnType (*execute_model)(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                   const char **output_names, uint32_t nb_output, AVFrame *out_frame);
+    DNNReturnType (*execute_model)(const DNNModel *model, DNNExecBaseParams *exec_params);
     // Executes model with specified input and output asynchronously. Returns DNN_ERROR otherwise.
-    DNNReturnType (*execute_model_async)(const DNNModel *model, const char *input_name, AVFrame *in_frame,
-                                         const char **output_names, uint32_t nb_output, AVFrame *out_frame);
+    DNNReturnType (*execute_model_async)(const DNNModel *model, DNNExecBaseParams *exec_params);
     // Retrieve inference result.
     DNNAsyncStatusType (*get_async_result)(const DNNModel *model, AVFrame **in, AVFrame **out);
     // Flush all the pending tasks.

From patchwork Thu Apr 29 13:36:56 2021
X-Patchwork-Submitter: "Guo, Yejun"
X-Patchwork-Id: 27476
From: "Guo, Yejun"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 29 Apr 2021 21:36:56 +0800
Message-Id: <20210429133657.23076-5-yejun.guo@intel.com>
In-Reply-To: <20210429133657.23076-1-yejun.guo@intel.com>
References: <20210429133657.23076-1-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH V2 5/6] lavfi/dnn: add classify support with openvino backend

Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
---
 libavfilter/dnn/dnn_backend_openvino.c | 143 +++++++++++++++++++++----
 libavfilter/dnn/dnn_io_proc.c          |  60 +++++++++++
 libavfilter/dnn/dnn_io_proc.h          |   1 +
 libavfilter/dnn_filter_common.c        |  21 ++++
 libavfilter/dnn_filter_common.h        |   2 +
 libavfilter/dnn_interface.h            |  10 +-
 6 files changed, 218 insertions(+), 19 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index 4e58ff6d9c..1ff8a720b9 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -29,6 +29,7 @@
 #include "libavutil/avassert.h"
 #include "libavutil/opt.h"
 #include "libavutil/avstring.h"
+#include "libavutil/detection_bbox.h"
 #include "../internal.h"
 #include "queue.h"
 #include "safe_queue.h"
@@ -74,6 +75,7 @@ typedef struct TaskItem {
 // one task might have multiple inferences
 typedef struct InferenceItem {
     TaskItem *task;
+    uint32_t bbox_index;
 } InferenceItem;
 
 // one request for one call to openvino
@@ -182,12 +184,23 @@ static DNNReturnType fill_model_input_ov(OVModel *ov_model, RequestItem *request
         request->inferences[i] = inference;
         request->inference_count = i + 1;
         task = inference->task;
-        if (task->do_ioproc) {
-            if (ov_model->model->frame_pre_proc != NULL) {
-                ov_model->model->frame_pre_proc(task->in_frame, &input, ov_model->model->filter_ctx);
-            } else {
-                ff_proc_from_frame_to_dnn(task->in_frame, &input, ov_model->model->func_type, ctx);
+        switch (task->ov_model->model->func_type) {
+        case DFT_PROCESS_FRAME:
+        case DFT_ANALYTICS_DETECT:
+            if (task->do_ioproc) {
+                if (ov_model->model->frame_pre_proc != NULL) {
+                    ov_model->model->frame_pre_proc(task->in_frame, &input, ov_model->model->filter_ctx);
+                } else {
+                    ff_proc_from_frame_to_dnn(task->in_frame, &input, ov_model->model->func_type, ctx);
+                }
             }
+            break;
+        case DFT_ANALYTICS_CLASSIFY:
+            ff_frame_to_dnn_classify(task->in_frame, &input, inference->bbox_index, ctx);
+            break;
+        default:
+            av_assert0(!"should not reach here");
+            break;
         }
         input.data = (uint8_t *)input.data
                      + input.width * input.height * input.channels * get_datatype_size(input.dt);
@@ -276,6 +289,13 @@ static void infer_completion_callback(void *args)
             }
             task->ov_model->model->detect_post_proc(task->out_frame, &output, 1, task->ov_model->model->filter_ctx);
             break;
+        case DFT_ANALYTICS_CLASSIFY:
+            if (!task->ov_model->model->classify_post_proc) {
+                av_log(ctx, AV_LOG_ERROR, "classify filter needs to provide post proc\n");
+                return;
+            }
+            task->ov_model->model->classify_post_proc(task->out_frame, &output, request->inferences[i]->bbox_index, task->ov_model->model->filter_ctx);
+            break;
         default:
             av_assert0(!"should not reach here");
             break;
@@ -513,7 +533,44 @@ static DNNReturnType get_input_ov(void *model, DNNData *input, const char *input
     return DNN_ERROR;
 }
 
-static DNNReturnType extract_inference_from_task(DNNFunctionType func_type, TaskItem *task, Queue *inference_queue)
+static int contain_valid_detection_bbox(AVFrame *frame)
+{
+    AVFrameSideData *sd;
+    const AVDetectionBBoxHeader *header;
+    const AVDetectionBBox *bbox;
+
+    sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+    if (!sd) { // this frame has nothing detected
+        return 0;
+    }
+
+    if (!sd->size) {
+        return 0;
+    }
+
+    header = (const AVDetectionBBoxHeader *)sd->data;
+    if (!header->nb_bboxes) {
+        return 0;
+    }
+
+    for (uint32_t i = 0; i < header->nb_bboxes; i++) {
+        bbox = av_get_detection_bbox(header, i);
+        if (bbox->x < 0 || bbox->w < 0 || bbox->x + bbox->w >= frame->width) {
+            return 0;
+        }
+        if (bbox->y < 0 || bbox->h < 0 || bbox->y + bbox->h >= frame->height) {
+            return 0;
+        }
+
+        if (bbox->classify_count == AV_NUM_DETECTION_BBOX_CLASSIFY) {
+            return 0;
+        }
+    }
+
+    return 1;
+}
+
+static DNNReturnType extract_inference_from_task(DNNFunctionType func_type, TaskItem *task, Queue *inference_queue, DNNExecBaseParams *exec_params)
 {
     switch (func_type) {
     case DFT_PROCESS_FRAME:
@@ -532,6 +589,45 @@ static DNNReturnType extract_inference_from_task(DNNFunctionType func_type, Task
         }
         return DNN_SUCCESS;
     }
+    case DFT_ANALYTICS_CLASSIFY:
+    {
+        const AVDetectionBBoxHeader *header;
+        AVFrame *frame = task->in_frame;
+        AVFrameSideData *sd;
+        DNNExecClassificationParams *params = (DNNExecClassificationParams *)exec_params;
+
+        task->inference_todo = 0;
+        task->inference_done = 0;
+
+        if (!contain_valid_detection_bbox(frame)) {
+            return DNN_SUCCESS;
+        }
+
+        sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+        header = (const AVDetectionBBoxHeader *)sd->data;
+
+        for (uint32_t i = 0; i < header->nb_bboxes; i++) {
+            InferenceItem *inference;
+            const AVDetectionBBox *bbox = av_get_detection_bbox(header, i);
+
+            if (av_strncasecmp(bbox->detect_label, params->target, sizeof(bbox->detect_label)) != 0) {
+                continue;
+            }
+
+            inference = av_malloc(sizeof(*inference));
+            if (!inference) {
+                return DNN_ERROR;
+            }
+            task->inference_todo++;
+            inference->task = task;
+            inference->bbox_index = i;
+            if (ff_queue_push_back(inference_queue, inference) < 0) {
+                av_freep(&inference);
+                return DNN_ERROR;
+            }
+        }
+        return DNN_SUCCESS;
+    }
     default:
         av_assert0(!"should not reach here");
         return DNN_ERROR;
@@ -598,7 +694,7 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     task.out_frame = out_frame;
     task.ov_model = ov_model;
 
-    if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) {
+    if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue, NULL) != DNN_SUCCESS) {
         av_frame_free(&out_frame);
         av_frame_free(&in_frame);
         av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n");
@@ -690,6 +786,14 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, DNNExecBaseParams *
         return DNN_ERROR;
     }
 
+    if (model->func_type == DFT_ANALYTICS_CLASSIFY) {
+        // Once we add async support for tensorflow backend and native backend,
+        // we'll combine the two sync/async functions in dnn_interface.h to
+        // simplify the code in filter, and async will be an option within backends.
+        // so, do not support now, and classify filter will not call this function.
+        return DNN_ERROR;
+    }
+
     if (ctx->options.batch_size > 1) {
         avpriv_report_missing_feature(ctx, "batch mode for sync execution");
         return DNN_ERROR;
@@ -710,7 +814,7 @@ DNNReturnType ff_dnn_execute_model_ov(const DNNModel *model, DNNExecBaseParams *
     task.out_frame = exec_params->out_frame ? exec_params->out_frame : exec_params->in_frame;
     task.ov_model = ov_model;
 
-    if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue) != DNN_SUCCESS) {
+    if (extract_inference_from_task(ov_model->model->func_type, &task, ov_model->inference_queue, exec_params) != DNN_SUCCESS) {
         av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n");
         return DNN_ERROR;
     }
@@ -730,6 +834,7 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, DNNExecBasePa
     OVContext *ctx = &ov_model->ctx;
     RequestItem *request;
     TaskItem *task;
+    DNNReturnType ret;
 
     if (ff_check_exec_params(ctx, DNN_OV, model->func_type, exec_params) != 0) {
         return DNN_ERROR;
@@ -761,23 +866,25 @@ DNNReturnType ff_dnn_execute_model_async_ov(const DNNModel *model, DNNExecBasePa
         return DNN_ERROR;
     }
 
-    if (extract_inference_from_task(ov_model->model->func_type, task, ov_model->inference_queue) != DNN_SUCCESS) {
+    if (extract_inference_from_task(model->func_type, task, ov_model->inference_queue, exec_params) != DNN_SUCCESS) {
         av_log(ctx, AV_LOG_ERROR, "unable to extract inference from task.\n");
         return DNN_ERROR;
     }
 
-    if (ff_queue_size(ov_model->inference_queue) < ctx->options.batch_size) {
-        // not enough inference items queued for a batch
-        return DNN_SUCCESS;
-    }
+    while (ff_queue_size(ov_model->inference_queue) >= ctx->options.batch_size) {
+        request = ff_safe_queue_pop_front(ov_model->request_queue);
+        if (!request) {
+            av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
+            return DNN_ERROR;
+        }
 
-    request = ff_safe_queue_pop_front(ov_model->request_queue);
-    if (!request) {
-        av_log(ctx, AV_LOG_ERROR, "unable to get infer request.\n");
-        return DNN_ERROR;
+        ret = execute_model_ov(request, ov_model->inference_queue);
+        if (ret != DNN_SUCCESS) {
+            return ret;
+        }
     }
 
-    return execute_model_ov(request, ov_model->inference_queue);
+    return DNN_SUCCESS;
 }
 
 DNNAsyncStatusType ff_dnn_get_async_result_ov(const DNNModel *model, AVFrame **in, AVFrame **out)
diff --git a/libavfilter/dnn/dnn_io_proc.c b/libavfilter/dnn/dnn_io_proc.c
index e104cc5064..5f60d68078 100644
--- a/libavfilter/dnn/dnn_io_proc.c
+++ b/libavfilter/dnn/dnn_io_proc.c
@@ -22,6 +22,7 @@
 #include "libavutil/imgutils.h"
 #include "libswscale/swscale.h"
 #include "libavutil/avassert.h"
+#include "libavutil/detection_bbox.h"
 
 DNNReturnType ff_proc_from_dnn_to_frame(AVFrame *frame, DNNData *output, void *log_ctx)
 {
@@ -175,6 +176,65 @@ static enum AVPixelFormat get_pixel_format(DNNData *data)
     return AV_PIX_FMT_BGR24;
 }
 
+DNNReturnType ff_frame_to_dnn_classify(AVFrame *frame, DNNData *input, uint32_t bbox_index, void *log_ctx)
+{
+    const AVPixFmtDescriptor *desc;
+    int offsetx[4], offsety[4];
+    uint8_t *bbox_data[4];
+    struct SwsContext *sws_ctx;
+    int linesizes[4];
+    enum AVPixelFormat fmt;
+    int left, top, width, height;
+    const AVDetectionBBoxHeader *header;
+    const AVDetectionBBox *bbox;
+    AVFrameSideData *sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+    av_assert0(sd);
+
+    header = (const AVDetectionBBoxHeader *)sd->data;
+    bbox = av_get_detection_bbox(header, bbox_index);
+
+    left = bbox->x;
+    width = bbox->w;
+    top = bbox->y;
+    height = bbox->h;
+
+    fmt = get_pixel_format(input);
+    sws_ctx = sws_getContext(width, height, frame->format,
+                             input->width, input->height, fmt,
+                             SWS_FAST_BILINEAR, NULL, NULL, NULL);
+    if (!sws_ctx) {
+        av_log(log_ctx, AV_LOG_ERROR, "Failed to create scale context for the conversion "
+               "fmt:%s s:%dx%d -> fmt:%s s:%dx%d\n",
+               av_get_pix_fmt_name(frame->format), width, height,
+               av_get_pix_fmt_name(fmt), input->width, input->height);
+        return DNN_ERROR;
+    }
+
+    if (av_image_fill_linesizes(linesizes, fmt, input->width) < 0) {
+        av_log(log_ctx, AV_LOG_ERROR, "unable to get linesizes with av_image_fill_linesizes");
+        sws_freeContext(sws_ctx);
+        return DNN_ERROR;
+    }
+
+    desc = av_pix_fmt_desc_get(frame->format);
+    offsetx[1] = offsetx[2] = AV_CEIL_RSHIFT(left, desc->log2_chroma_w);
+    offsetx[0] = offsetx[3] = left;
+ + offsety[1] = offsety[2] = AV_CEIL_RSHIFT(top, desc->log2_chroma_h); + offsety[0] = offsety[3] = top; + + for (int k = 0; frame->data[k]; k++) + bbox_data[k] = frame->data[k] + offsety[k] * frame->linesize[k] + offsetx[k]; + + sws_scale(sws_ctx, (const uint8_t *const *)&bbox_data, frame->linesize, + 0, height, + (uint8_t *const *)(&input->data), linesizes); + + sws_freeContext(sws_ctx); + + return DNN_SUCCESS; +} + static DNNReturnType proc_from_frame_to_dnn_analytics(AVFrame *frame, DNNData *input, void *log_ctx) { struct SwsContext *sws_ctx; diff --git a/libavfilter/dnn/dnn_io_proc.h b/libavfilter/dnn/dnn_io_proc.h index 91ad3cb261..16dcdd6d1a 100644 --- a/libavfilter/dnn/dnn_io_proc.h +++ b/libavfilter/dnn/dnn_io_proc.h @@ -32,5 +32,6 @@ DNNReturnType ff_proc_from_frame_to_dnn(AVFrame *frame, DNNData *input, DNNFunctionType func_type, void *log_ctx); DNNReturnType ff_proc_from_dnn_to_frame(AVFrame *frame, DNNData *output, void *log_ctx); +DNNReturnType ff_frame_to_dnn_classify(AVFrame *frame, DNNData *input, uint32_t bbox_index, void *log_ctx); #endif diff --git a/libavfilter/dnn_filter_common.c b/libavfilter/dnn_filter_common.c index c085884eb4..52c7a5392a 100644 --- a/libavfilter/dnn_filter_common.c +++ b/libavfilter/dnn_filter_common.c @@ -77,6 +77,12 @@ int ff_dnn_set_detect_post_proc(DnnContext *ctx, DetectPostProc post_proc) return 0; } +int ff_dnn_set_classify_post_proc(DnnContext *ctx, ClassifyPostProc post_proc) +{ + ctx->model->classify_post_proc = post_proc; + return 0; +} + DNNReturnType ff_dnn_get_input(DnnContext *ctx, DNNData *input) { return ctx->model->get_input(ctx->model->model, input, ctx->model_inputname); @@ -112,6 +118,21 @@ DNNReturnType ff_dnn_execute_model_async(DnnContext *ctx, AVFrame *in_frame, AVF return (ctx->dnn_module->execute_model_async)(ctx->model, &exec_params); } +DNNReturnType ff_dnn_execute_model_classification(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame, char *target) +{ + DNNExecClassificationParams 
class_params = { + { + .input_name = ctx->model_inputname, + .output_names = (const char **)&ctx->model_outputname, + .nb_output = 1, + .in_frame = in_frame, + .out_frame = out_frame, + }, + .target = target, + }; + return (ctx->dnn_module->execute_model_async)(ctx->model, &class_params.base); +} + DNNAsyncStatusType ff_dnn_get_async_result(DnnContext *ctx, AVFrame **in_frame, AVFrame **out_frame) { return (ctx->dnn_module->get_async_result)(ctx->model, in_frame, out_frame); diff --git a/libavfilter/dnn_filter_common.h b/libavfilter/dnn_filter_common.h index 8deb18b39a..e7736d2bac 100644 --- a/libavfilter/dnn_filter_common.h +++ b/libavfilter/dnn_filter_common.h @@ -50,10 +50,12 @@ typedef struct DnnContext { int ff_dnn_init(DnnContext *ctx, DNNFunctionType func_type, AVFilterContext *filter_ctx); int ff_dnn_set_frame_proc(DnnContext *ctx, FramePrePostProc pre_proc, FramePrePostProc post_proc); int ff_dnn_set_detect_post_proc(DnnContext *ctx, DetectPostProc post_proc); +int ff_dnn_set_classify_post_proc(DnnContext *ctx, ClassifyPostProc post_proc); DNNReturnType ff_dnn_get_input(DnnContext *ctx, DNNData *input); DNNReturnType ff_dnn_get_output(DnnContext *ctx, int input_width, int input_height, int *output_width, int *output_height); DNNReturnType ff_dnn_execute_model(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame); DNNReturnType ff_dnn_execute_model_async(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame); +DNNReturnType ff_dnn_execute_model_classification(DnnContext *ctx, AVFrame *in_frame, AVFrame *out_frame, char *target); DNNAsyncStatusType ff_dnn_get_async_result(DnnContext *ctx, AVFrame **in_frame, AVFrame **out_frame); DNNReturnType ff_dnn_flush(DnnContext *ctx); void ff_dnn_uninit(DnnContext *ctx); diff --git a/libavfilter/dnn_interface.h b/libavfilter/dnn_interface.h index 941670675d..799244ee14 100644 --- a/libavfilter/dnn_interface.h +++ b/libavfilter/dnn_interface.h @@ -52,7 +52,7 @@ typedef enum { DFT_NONE, DFT_PROCESS_FRAME, // process 
the whole frame DFT_ANALYTICS_DETECT, // detect from the whole frame - // we can add more such as detect_from_crop, classify_from_bbox, etc. + DFT_ANALYTICS_CLASSIFY, // classify for each bounding box }DNNFunctionType; typedef struct DNNData{ @@ -71,8 +71,14 @@ typedef struct DNNExecBaseParams { AVFrame *out_frame; } DNNExecBaseParams; +typedef struct DNNExecClassificationParams { + DNNExecBaseParams base; + const char *target; +} DNNExecClassificationParams; + typedef int (*FramePrePostProc)(AVFrame *frame, DNNData *model, AVFilterContext *filter_ctx); typedef int (*DetectPostProc)(AVFrame *frame, DNNData *output, uint32_t nb, AVFilterContext *filter_ctx); +typedef int (*ClassifyPostProc)(AVFrame *frame, DNNData *output, uint32_t bbox_index, AVFilterContext *filter_ctx); typedef struct DNNModel{ // Stores model that can be different for different backends. @@ -97,6 +103,8 @@ typedef struct DNNModel{ FramePrePostProc frame_post_proc; // set the post process to interpret detect result from DNNData DetectPostProc detect_post_proc; + // set the post process to interpret classify result from DNNData + ClassifyPostProc classify_post_proc; } DNNModel; // Stores pointers to functions for loading, executing, freeing DNN models for one of the backends. 
From patchwork Thu Apr 29 13:36:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Guo, Yejun" X-Patchwork-Id: 27481 From: "Guo, Yejun" To: ffmpeg-devel@ffmpeg.org Date: Thu, 29 Apr 2021 21:36:57 +0800 Message-Id: <20210429133657.23076-6-yejun.guo@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210429133657.23076-1-yejun.guo@intel.com> References:
<20210429133657.23076-1-yejun.guo@intel.com> Subject: [FFmpeg-devel] [PATCH V2 6/6] lavfi/dnn_classify: add filter dnn_classify for classification based on detection bounding boxes

Classification is done on every detection bounding box in the frame's side data; these boxes are the results of object detection (filter dnn_detect). Please refer to the commit log of dnn_detect for the detection material, and see below for classification.

- download material for classification: wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/emotions-recognition-retail-0003.bin wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/emotions-recognition-retail-0003.xml wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/emotions-recognition-retail-0003.label

- run command as: ./ffmpeg -i cici.jpg -vf dnn_detect=dnn_backend=openvino:model=face-detection-adas-0001.xml:input=data:output=detection_out:confidence=0.6:labels=face-detection-adas-0001.label,dnn_classify=dnn_backend=openvino:model=emotions-recognition-retail-0003.xml:input=data:output=prob_emotion:confidence=0.3:labels=emotions-recognition-retail-0003.label:target=face,showinfo -f null -

We'll see the detect&classify result as below: [Parsed_showinfo_2 @ 0x55b7d25e77c0] side data - detection bounding boxes: [Parsed_showinfo_2 @ 0x55b7d25e77c0] source: face-detection-adas-0001.xml, emotions-recognition-retail-0003.xml [Parsed_showinfo_2 @ 0x55b7d25e77c0] index: 0, region: (1005, 813) -> (1086, 905), label: face, confidence: 10000/10000.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] classify: label: happy, confidence: 6757/10000. [Parsed_showinfo_2 @ 0x55b7d25e77c0] index: 1, region: (888, 839) -> (967, 926), label: face, confidence: 6917/10000. [Parsed_showinfo_2 @ 0x55b7d25e77c0] classify: label: anger, confidence: 4320/10000. Signed-off-by: Guo, Yejun --- The main change of V2 in this patch set is to rebase with the latest code by resolving the conflicts. configure | 1 + doc/filters.texi | 39 ++++ libavfilter/Makefile | 1 + libavfilter/allfilters.c | 1 + libavfilter/vf_dnn_classify.c | 330 ++++++++++++++++++++++++++++++++++ 5 files changed, 372 insertions(+) create mode 100644 libavfilter/vf_dnn_classify.c diff --git a/configure b/configure index 820f719a32..9f2dfaf2d4 100755 --- a/configure +++ b/configure @@ -3550,6 +3550,7 @@ derain_filter_select="dnn" deshake_filter_select="pixelutils" deshake_opencl_filter_deps="opencl" dilation_opencl_filter_deps="opencl" +dnn_classify_filter_select="dnn" dnn_detect_filter_select="dnn" dnn_processing_filter_select="dnn" drawtext_filter_deps="libfreetype" diff --git a/doc/filters.texi b/doc/filters.texi index 36e35a175b..b405cc5dfb 100644 --- a/doc/filters.texi +++ b/doc/filters.texi @@ -10127,6 +10127,45 @@ ffmpeg -i INPUT -f lavfi -i nullsrc=hd720,geq='r=128+80*(sin(sqrt((X-W/2)*(X-W/2 @end example @end itemize +@section dnn_classify + +Do classification with deep neural networks based on bounding boxes. + +The filter accepts the following options: + +@table @option +@item dnn_backend +Specify which DNN backend to use for model loading and execution. This option accepts +only openvino now; the tensorflow backend will be added later. + +@item model +Set path to model file specifying network architecture and its parameters. +Note that different backends use different file formats. + +@item input +Set the input name of the dnn network. + +@item output +Set the output name of the dnn network. + +@item confidence +Set the confidence threshold (default: 0.5).
+ +@item labels +Set path to label file specifying the mapping between label id and name. +Each label name is written on one line; trailing spaces and empty lines are skipped. +The first line is the name of label id 0, +and the second line is the name of label id 1, etc. +The label id is used as the name if the label file is not provided. + +@item backend_configs +Set the configs to be passed into the backend. + +For the tensorflow backend, you can set its configs with the @option{sess_config} option; +please use tools/python/tf_sess_config.py to get the configs for your system. + +@end table + @section dnn_detect Do object detection with deep neural networks. diff --git a/libavfilter/Makefile b/libavfilter/Makefile index 5a287364b0..6c22d0404e 100644 --- a/libavfilter/Makefile +++ b/libavfilter/Makefile @@ -243,6 +243,7 @@ OBJS-$(CONFIG_DILATION_FILTER) += vf_neighbor.o OBJS-$(CONFIG_DILATION_OPENCL_FILTER) += vf_neighbor_opencl.o opencl.o \ opencl/neighbor.o OBJS-$(CONFIG_DISPLACE_FILTER) += vf_displace.o framesync.o +OBJS-$(CONFIG_DNN_CLASSIFY_FILTER) += vf_dnn_classify.o OBJS-$(CONFIG_DNN_DETECT_FILTER) += vf_dnn_detect.o OBJS-$(CONFIG_DNN_PROCESSING_FILTER) += vf_dnn_processing.o OBJS-$(CONFIG_DOUBLEWEAVE_FILTER) += vf_weave.o diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c index 931d7dbb0d..87c3661cf4 100644 --- a/libavfilter/allfilters.c +++ b/libavfilter/allfilters.c @@ -229,6 +229,7 @@ extern const AVFilter ff_vf_detelecine; extern const AVFilter ff_vf_dilation; extern const AVFilter ff_vf_dilation_opencl; extern const AVFilter ff_vf_displace; +extern const AVFilter ff_vf_dnn_classify; extern const AVFilter ff_vf_dnn_detect; extern const AVFilter ff_vf_dnn_processing; extern const AVFilter ff_vf_doubleweave; diff --git a/libavfilter/vf_dnn_classify.c b/libavfilter/vf_dnn_classify.c new file mode 100644 index 0000000000..18fcd452d0 --- /dev/null +++ b/libavfilter/vf_dnn_classify.c @@ -0,0 +1,330 @@ +/* + * This file is part of FFmpeg.
+ * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/** + * @file + * implementing a classification filter using deep learning networks. + */ + +#include "libavformat/avio.h" +#include "libavutil/opt.h" +#include "libavutil/pixdesc.h" +#include "libavutil/avassert.h" +#include "libavutil/imgutils.h" +#include "filters.h" +#include "dnn_filter_common.h" +#include "formats.h" +#include "internal.h" +#include "libavutil/time.h" +#include "libavutil/avstring.h" +#include "libavutil/detection_bbox.h" + +typedef struct DnnClassifyContext { + const AVClass *class; + DnnContext dnnctx; + float confidence; + char *labels_filename; + char *target; + char **labels; + int label_count; +} DnnClassifyContext; + +#define OFFSET(x) offsetof(DnnClassifyContext, dnnctx.x) +#define OFFSET2(x) offsetof(DnnClassifyContext, x) +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM +static const AVOption dnn_classify_options[] = { + { "dnn_backend", "DNN backend", OFFSET(backend_type), AV_OPT_TYPE_INT, { .i64 = 2 }, INT_MIN, INT_MAX, FLAGS, "backend" }, +#if (CONFIG_LIBOPENVINO == 1) + { "openvino", "openvino backend flag", 0, AV_OPT_TYPE_CONST, { .i64 = 2 }, 0, 0, FLAGS, "backend" }, +#endif + DNN_COMMON_OPTIONS + { "confidence", "threshold of confidence", OFFSET2(confidence),
AV_OPT_TYPE_FLOAT, { .dbl = 0.5 }, 0, 1, FLAGS}, + { "labels", "path to labels file", OFFSET2(labels_filename), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, FLAGS }, + { "target", "which one to be classified", OFFSET2(target), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, FLAGS }, + { NULL } +}; + +AVFILTER_DEFINE_CLASS(dnn_classify); + +static int dnn_classify_post_proc(AVFrame *frame, DNNData *output, uint32_t bbox_index, AVFilterContext *filter_ctx) +{ + DnnClassifyContext *ctx = filter_ctx->priv; + float conf_threshold = ctx->confidence; + AVDetectionBBoxHeader *header; + AVDetectionBBox *bbox; + float *classifications; + uint32_t label_id; + float confidence; + AVFrameSideData *sd; + + if (output->channels <= 0) { + return -1; + } + + sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES); + header = (AVDetectionBBoxHeader *)sd->data; + + if (bbox_index == 0) { + av_strlcat(header->source, ", ", sizeof(header->source)); + av_strlcat(header->source, ctx->dnnctx.model_filename, sizeof(header->source)); + } + + classifications = output->data; + label_id = 0; + confidence= classifications[0]; + for (int i = 1; i < output->channels; i++) { + if (classifications[i] > confidence) { + label_id = i; + confidence= classifications[i]; + } + } + + if (confidence < conf_threshold) { + return 0; + } + + bbox = av_get_detection_bbox(header, bbox_index); + bbox->classify_confidences[bbox->classify_count] = av_make_q((int)(confidence * 10000), 10000); + + if (ctx->labels && label_id < ctx->label_count) { + av_strlcpy(bbox->classify_labels[bbox->classify_count], ctx->labels[label_id], sizeof(bbox->classify_labels[bbox->classify_count])); + } else { + snprintf(bbox->classify_labels[bbox->classify_count], sizeof(bbox->classify_labels[bbox->classify_count]), "%d", label_id); + } + + bbox->classify_count++; + + return 0; +} + +static void free_classify_labels(DnnClassifyContext *ctx) +{ + for (int i = 0; i < ctx->label_count; i++) { + av_freep(&ctx->labels[i]); + } + 
ctx->label_count = 0; + av_freep(&ctx->labels); +} + +static int read_classify_label_file(AVFilterContext *context) +{ + int line_len; + FILE *file; + DnnClassifyContext *ctx = context->priv; + + file = av_fopen_utf8(ctx->labels_filename, "r"); + if (!file){ + av_log(context, AV_LOG_ERROR, "failed to open file %s\n", ctx->labels_filename); + return AVERROR(EINVAL); + } + + while (!feof(file)) { + char *label; + char buf[256]; + if (!fgets(buf, 256, file)) { + break; + } + + line_len = strlen(buf); + while (line_len) { + int i = line_len - 1; + if (buf[i] == '\n' || buf[i] == '\r' || buf[i] == ' ') { + buf[i] = '\0'; + line_len--; + } else { + break; + } + } + + if (line_len == 0) // empty line + continue; + + if (line_len >= AV_DETECTION_BBOX_LABEL_NAME_MAX_SIZE) { + av_log(context, AV_LOG_ERROR, "label %s too long\n", buf); + fclose(file); + return AVERROR(EINVAL); + } + + label = av_strdup(buf); + if (!label) { + av_log(context, AV_LOG_ERROR, "failed to allocate memory for label %s\n", buf); + fclose(file); + return AVERROR(ENOMEM); + } + + if (av_dynarray_add_nofree(&ctx->labels, &ctx->label_count, label) < 0) { + av_log(context, AV_LOG_ERROR, "failed to do av_dynarray_add\n"); + fclose(file); + av_freep(&label); + return AVERROR(ENOMEM); + } + } + + fclose(file); + return 0; +} + +static av_cold int dnn_classify_init(AVFilterContext *context) +{ + DnnClassifyContext *ctx = context->priv; + int ret = ff_dnn_init(&ctx->dnnctx, DFT_ANALYTICS_CLASSIFY, context); + if (ret < 0) + return ret; + ff_dnn_set_classify_post_proc(&ctx->dnnctx, dnn_classify_post_proc); + + if (ctx->labels_filename) { + return read_classify_label_file(context); + } + return 0; +} + +static int dnn_classify_query_formats(AVFilterContext *context) +{ + static const enum AVPixelFormat pix_fmts[] = { + AV_PIX_FMT_RGB24, AV_PIX_FMT_BGR24, + AV_PIX_FMT_GRAY8, AV_PIX_FMT_GRAYF32, + AV_PIX_FMT_YUV420P, AV_PIX_FMT_YUV422P, + AV_PIX_FMT_YUV444P, AV_PIX_FMT_YUV410P, AV_PIX_FMT_YUV411P, + 
AV_PIX_FMT_NV12, + AV_PIX_FMT_NONE + }; + AVFilterFormats *fmts_list = ff_make_format_list(pix_fmts); + return ff_set_common_formats(context, fmts_list); +} + +static int dnn_classify_flush_frame(AVFilterLink *outlink, int64_t pts, int64_t *out_pts) +{ + DnnClassifyContext *ctx = outlink->src->priv; + int ret; + DNNAsyncStatusType async_state; + + ret = ff_dnn_flush(&ctx->dnnctx); + if (ret != DNN_SUCCESS) { + return -1; + } + + do { + AVFrame *in_frame = NULL; + AVFrame *out_frame = NULL; + async_state = ff_dnn_get_async_result(&ctx->dnnctx, &in_frame, &out_frame); + if (out_frame) { + av_assert0(in_frame == out_frame); + ret = ff_filter_frame(outlink, out_frame); + if (ret < 0) + return ret; + if (out_pts) + *out_pts = out_frame->pts + pts; + } + av_usleep(5000); + } while (async_state >= DAST_NOT_READY); + + return 0; +} + +static int dnn_classify_activate(AVFilterContext *filter_ctx) +{ + AVFilterLink *inlink = filter_ctx->inputs[0]; + AVFilterLink *outlink = filter_ctx->outputs[0]; + DnnClassifyContext *ctx = filter_ctx->priv; + AVFrame *in = NULL; + int64_t pts; + int ret, status; + int got_frame = 0; + int async_state; + + FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink); + + do { + // drain all input frames + ret = ff_inlink_consume_frame(inlink, &in); + if (ret < 0) + return ret; + if (ret > 0) { + if (ff_dnn_execute_model_classification(&ctx->dnnctx, in, in, ctx->target) != DNN_SUCCESS) { + return AVERROR(EIO); + } + } + } while (ret > 0); + + // drain all processed frames + do { + AVFrame *in_frame = NULL; + AVFrame *out_frame = NULL; + async_state = ff_dnn_get_async_result(&ctx->dnnctx, &in_frame, &out_frame); + if (out_frame) { + av_assert0(in_frame == out_frame); + ret = ff_filter_frame(outlink, out_frame); + if (ret < 0) + return ret; + got_frame = 1; + } + } while (async_state == DAST_SUCCESS); + + // if frame got, schedule to next filter + if (got_frame) + return 0; + + if (ff_inlink_acknowledge_status(inlink, &status, &pts)) { + if (status == 
AVERROR_EOF) { + int64_t out_pts = pts; + ret = dnn_classify_flush_frame(outlink, pts, &out_pts); + ff_outlink_set_status(outlink, status, out_pts); + return ret; + } + } + + FF_FILTER_FORWARD_WANTED(outlink, inlink); + + return 0; +} + +static av_cold void dnn_classify_uninit(AVFilterContext *context) +{ + DnnClassifyContext *ctx = context->priv; + ff_dnn_uninit(&ctx->dnnctx); + free_classify_labels(ctx); +} + +static const AVFilterPad dnn_classify_inputs[] = { + { + .name = "default", + .type = AVMEDIA_TYPE_VIDEO, + }, + { NULL } +}; + +static const AVFilterPad dnn_classify_outputs[] = { + { + .name = "default", + .type = AVMEDIA_TYPE_VIDEO, + }, + { NULL } +}; + +const AVFilter ff_vf_dnn_classify = { + .name = "dnn_classify", + .description = NULL_IF_CONFIG_SMALL("Apply DNN classify filter to the input."), + .priv_size = sizeof(DnnClassifyContext), + .init = dnn_classify_init, + .uninit = dnn_classify_uninit, + .query_formats = dnn_classify_query_formats, + .inputs = dnn_classify_inputs, + .outputs = dnn_classify_outputs, + .priv_class = &dnn_classify_class, + .activate = dnn_classify_activate, +};