From patchwork Mon Dec 4 05:36:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Chen, Wenbin" X-Patchwork-Id: 44894 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:a301:b0:181:818d:5e7f with SMTP id x1csp2566427pzk; Sun, 3 Dec 2023 21:36:50 -0800 (PST) X-Google-Smtp-Source: AGHT+IGoCd1RscZvvHGM+xnQItltMFZXSHUFHEAvhMsd9ClILfBX8MU7r5qWMPWhi2BG0V/R6/+U X-Received: by 2002:a05:6512:3e25:b0:50b:cab3:1102 with SMTP id i37-20020a0565123e2500b0050bcab31102mr7603848lfv.6.1701668209850; Sun, 03 Dec 2023 21:36:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701668209; cv=none; d=google.com; s=arc-20160816; b=Ua3gjX1gdZN+d5xl3uSO7Xb9RrYgRAZxdM5niS3Q7Rs8drNcGmwwHeUG93SbUwMIOm SkkGoSzOl+F49i6l3OFuNy4OZZpfeN+uIthIzGXi0Qc6PdXWP5VrAn/vrPIc79l7Zx7y oB7POHTEtLrmO4Nfl0c6dst7d9YPEjdh2AHjyLV5qVqwGmnt7fpYJfSQrNXs5L4/wmmn bgRRrQAsUcJdkTQZbTAkvjAdgK/8kFhw4vl4Bt+Vkm7j+fcX8GgJpngcJZrrqH+nrUFJ gNqIMx/9RVLSgNfWvdkKHMTENRzbSYAlx7JfUwW9m9i+0FpkLlXHSmTbl4n4whosuQC2 Nswg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:message-id:date:to:from :dkim-signature:delivered-to; bh=d8Lc/Te11bTqN+JEQNRppSg/UF8nEMTgg6JmwMkWYPQ=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=ZDl7nqAO80G+qJk6LuXyvYPUGUC4m3WG/kt5VwxBEdIxQoBxEA/moKTrT9rR0j+f0H +pVOx9kM/GiIjjkqninxz1dnHG964nJcIizclkl4G1VnBgT6rR/3PjCykQtJliV7EIbW utVzZ/2AiJ5il3rn2AWnvOFCV9hnHb8xCNEl8x0+Ow/FKYCNYMtX4hu87yVgg8ja4P7V 1k3WUFs8aR0hrPVS9evBRg2F6dui/M6HyoROIW424zsYMvY4M6B3QNfA04ds4FMSqAKH KRmJ6BnxUVc5wZbW6QxJIOO1NsvmHFR0VE1z7FQfrHd7DhtdtMoVZfl4UUZvsprA8raG 4PpQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=U5t4cM9Z; spf=pass (google.com: domain of 
ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id t30-20020a50d71e000000b005455a9d010esi1534047edi.326.2023.12.03.21.36.49; Sun, 03 Dec 2023 21:36:49 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=U5t4cM9Z; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C058668CE28; Mon, 4 Dec 2023 07:36:45 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 51ACF68AC2D for ; Mon, 4 Dec 2023 07:36:38 +0200 (EET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701668203; x=1733204203; h=from:to:subject:date:message-id:mime-version: content-transfer-encoding; bh=vogDmk7kMvf3YVu6fjyTr4qr3ZPkCiqCT85raBM1QPA=; b=U5t4cM9ZM3mvjbWdnsh0dSXGioclF867+VfeUy/8uOUD/2Ce1Rk3V40e vUSulBO4bUpIfqS2R21UZwwMV5Nkvc2kO9tlSxAToQeaEvhpmw21TZGZf nHwNfgIpJMmHj33YGgyk+jluF6CPzAEtLMiS044h/kFKfJiXT1GtMx+VA Ry2Drg7eP8cw3YSyUSP5ly2rMhP3g7AITEdFmTWGgvKyZDQe2JS1HT4oq s4i8cn6FRn5tuw8LkeRZHZDpKXVOmNzgWUQCEukSFKakha7hUxv2dJCJY 4owQZeatdV+gSCu2ijbede0ZU7Eyjd7XIpkhpDsIMsHvbjbKaIziMLkc8 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10913"; a="397574027" X-IronPort-AV: E=Sophos;i="6.04,248,1695711600"; d="scan'208";a="397574027" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by 
orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Dec 2023 21:36:35 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10913"; a="914308722" X-IronPort-AV: E=Sophos;i="6.04,249,1695711600"; d="scan'208";a="914308722" Received: from wenbin-z390-aorus-ultra.sh.intel.com ([10.239.156.43]) by fmsmga001.fm.intel.com with ESMTP; 03 Dec 2023 21:36:34 -0800 From: wenbin.chen-at-intel.com@ffmpeg.org To: ffmpeg-devel@ffmpeg.org Date: Mon, 4 Dec 2023 13:36:30 +0800 Message-Id: <20231204053633.1743228-1-wenbin.chen@intel.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 1/4] libavfilter/dnn/dnn_backend_openvino: add multiple output support X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: o8kX3ERwU0Fv From: Wenbin Chen Add multiple output support to the openvino backend. Use '&' to separate the output names when setting them on the command line. Signed-off-by: Wenbin Chen --- libavfilter/dnn/dnn_backend_common.c | 7 - libavfilter/dnn/dnn_backend_openvino.c | 216 +++++++++++++++++-------- libavfilter/vf_dnn_detect.c | 11 +- 3 files changed, 150 insertions(+), 84 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_common.c b/libavfilter/dnn/dnn_backend_common.c index 91a4a3c4bf..632832ec36 100644 --- a/libavfilter/dnn/dnn_backend_common.c +++ b/libavfilter/dnn/dnn_backend_common.c @@ -43,13 +43,6 @@ int ff_check_exec_params(void *ctx, DNNBackendType backend, DNNFunctionType func return AVERROR(EINVAL); } - if (exec_params->nb_output != 1 && backend != DNN_TF) { - // currently, the filter does not need multiple outputs, - // so we just pending the support until we really need it. 
- avpriv_report_missing_feature(ctx, "multiple outputs"); - return AVERROR(ENOSYS); - } - return 0; } diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c index 6fe8b9c243..089e028818 100644 --- a/libavfilter/dnn/dnn_backend_openvino.c +++ b/libavfilter/dnn/dnn_backend_openvino.c @@ -64,7 +64,7 @@ typedef struct OVModel{ ov_compiled_model_t *compiled_model; ov_output_const_port_t* input_port; ov_preprocess_input_info_t* input_info; - ov_output_const_port_t* output_port; + ov_output_const_port_t** output_ports; ov_preprocess_output_info_t* output_info; ov_preprocess_prepostprocessor_t* preprocess; #else @@ -77,6 +77,7 @@ typedef struct OVModel{ SafeQueue *request_queue; // holds OVRequestItem Queue *task_queue; // holds TaskItem Queue *lltask_queue; // holds LastLevelTaskItem + int nb_outputs; } OVModel; // one request for one call to openvino @@ -349,7 +350,7 @@ static void infer_completion_callback(void *args) TaskItem *task = lltask->task; OVModel *ov_model = task->model; SafeQueue *requestq = ov_model->request_queue; - DNNData output; + DNNData *outputs; OVContext *ctx = &ov_model->ctx; #if HAVE_OPENVINO2 size_t* dims; @@ -358,45 +359,61 @@ static void infer_completion_callback(void *args) ov_shape_t output_shape = {0}; ov_element_type_e precision; - memset(&output, 0, sizeof(output)); - status = ov_infer_request_get_output_tensor_by_index(request->infer_request, 0, &output_tensor); - if (status != OK) { - av_log(ctx, AV_LOG_ERROR, - "Failed to get output tensor."); + outputs = av_calloc(ov_model->nb_outputs, sizeof(*outputs)); + if (!outputs) { + av_log(ctx, AV_LOG_ERROR, "Failed to alloc outputs."); return; } - status = ov_tensor_data(output_tensor, &output.data); - if (status != OK) { - av_log(ctx, AV_LOG_ERROR, - "Failed to get output data."); - return; - } + for (int i = 0; i < ov_model->nb_outputs; i++) { + status = ov_infer_request_get_tensor_by_const_port(request->infer_request, + ov_model->output_ports[i], + 
&output_tensor); + if (status != OK) { + av_log(ctx, AV_LOG_ERROR, + "Failed to get output tensor."); + goto end; + } - status = ov_tensor_get_shape(output_tensor, &output_shape); - if (status != OK) { - av_log(ctx, AV_LOG_ERROR, "Failed to get output port shape.\n"); - return; - } - dims = output_shape.dims; + status = ov_tensor_data(output_tensor, &outputs[i].data); + if (status != OK) { + av_log(ctx, AV_LOG_ERROR, + "Failed to get output data."); + goto end; + } - status = ov_port_get_element_type(ov_model->output_port, &precision); - if (status != OK) { - av_log(ctx, AV_LOG_ERROR, "Failed to get output port data type.\n"); + status = ov_tensor_get_shape(output_tensor, &output_shape); + if (status != OK) { + av_log(ctx, AV_LOG_ERROR, "Failed to get output port shape.\n"); + goto end; + } + dims = output_shape.dims; + + status = ov_port_get_element_type(ov_model->output_ports[i], &precision); + if (status != OK) { + av_log(ctx, AV_LOG_ERROR, "Failed to get output port data type.\n"); + goto end; + } + outputs[i].dt = precision_to_datatype(precision); + + outputs[i].channels = output_shape.rank > 2 ? dims[output_shape.rank - 3] : 1; + outputs[i].height = output_shape.rank > 1 ? dims[output_shape.rank - 2] : 1; + outputs[i].width = output_shape.rank > 0 ? dims[output_shape.rank - 1] : 1; + av_assert0(request->lltask_count <= dims[0]); + outputs[i].layout = ctx->options.layout; + outputs[i].scale = ctx->options.scale; + outputs[i].mean = ctx->options.mean; ov_shape_free(&output_shape); - return; + ov_tensor_free(output_tensor); + output_tensor = NULL; } - output.channels = output_shape.rank > 2 ? dims[output_shape.rank - 3] : 1; - output.height = output_shape.rank > 1 ? dims[output_shape.rank - 2] : 1; - output.width = output_shape.rank > 0 ? 
dims[output_shape.rank - 1] : 1; - av_assert0(request->lltask_count <= dims[0]); - ov_shape_free(&output_shape); #else IEStatusCode status; dimensions_t dims; ie_blob_t *output_blob = NULL; ie_blob_buffer_t blob_buffer; precision_e precision; + DNNData output; status = ie_infer_request_get_blob(request->infer_request, task->output_names[0], &output_blob); if (status != OK) { av_log(ctx, AV_LOG_ERROR, @@ -424,11 +441,12 @@ static void infer_completion_callback(void *args) output.height = dims.dims[2]; output.width = dims.dims[3]; av_assert0(request->lltask_count <= dims.dims[0]); -#endif output.dt = precision_to_datatype(precision); output.layout = ctx->options.layout; output.scale = ctx->options.scale; output.mean = ctx->options.mean; + outputs = &output; +#endif av_assert0(request->lltask_count >= 1); for (int i = 0; i < request->lltask_count; ++i) { @@ -438,28 +456,33 @@ static void infer_completion_callback(void *args) case DFT_PROCESS_FRAME: if (task->do_ioproc) { if (ov_model->model->frame_post_proc != NULL) { - ov_model->model->frame_post_proc(task->out_frame, &output, ov_model->model->filter_ctx); + ov_model->model->frame_post_proc(task->out_frame, outputs, ov_model->model->filter_ctx); } else { - ff_proc_from_dnn_to_frame(task->out_frame, &output, ctx); + ff_proc_from_dnn_to_frame(task->out_frame, outputs, ctx); } } else { - task->out_frame->width = output.width; - task->out_frame->height = output.height; + task->out_frame->width = outputs[0].width; + task->out_frame->height = outputs[0].height; } break; case DFT_ANALYTICS_DETECT: if (!ov_model->model->detect_post_proc) { av_log(ctx, AV_LOG_ERROR, "detect filter needs to provide post proc\n"); - return; + goto end; } - ov_model->model->detect_post_proc(task->in_frame, &output, 1, ov_model->model->filter_ctx); + ov_model->model->detect_post_proc(task->in_frame, outputs, + ov_model->nb_outputs, + ov_model->model->filter_ctx); break; case DFT_ANALYTICS_CLASSIFY: if (!ov_model->model->classify_post_proc) { 
av_log(ctx, AV_LOG_ERROR, "classify filter needs to provide post proc\n"); - return; + goto end; } - ov_model->model->classify_post_proc(task->in_frame, &output, request->lltasks[i]->bbox_index, ov_model->model->filter_ctx); + for (int output_i = 0; output_i < ov_model->nb_outputs; output_i++) + ov_model->model->classify_post_proc(task->in_frame, outputs, + request->lltasks[i]->bbox_index, + ov_model->model->filter_ctx); break; default: av_assert0(!"should not reach here"); @@ -468,10 +491,17 @@ static void infer_completion_callback(void *args) task->inference_done++; av_freep(&request->lltasks[i]); - output.data = (uint8_t *)output.data - + output.width * output.height * output.channels * get_datatype_size(output.dt); + for (int i = 0; i < ov_model->nb_outputs; i++) + outputs[i].data = (uint8_t *)outputs[i].data + + outputs[i].width * outputs[i].height * outputs[i].channels * get_datatype_size(outputs[i].dt); } -#if !HAVE_OPENVINO2 +end: +#if HAVE_OPENVINO2 + av_freep(&outputs); + ov_shape_free(&output_shape); + if (output_tensor) + ov_tensor_free(output_tensor); +#else ie_blob_free(&output_blob); #endif request->lltask_count = 0; @@ -525,8 +555,10 @@ static void dnn_free_model_ov(DNNModel **model) #if HAVE_OPENVINO2 if (ov_model->input_port) ov_output_const_port_free(ov_model->input_port); - if (ov_model->output_port) - ov_output_const_port_free(ov_model->output_port); + for (int i = 0; i < ov_model->nb_outputs; i++) + if (ov_model->output_ports[i]) + ov_output_const_port_free(ov_model->output_ports[i]); + av_freep(&ov_model->output_ports); if (ov_model->preprocess) ov_preprocess_prepostprocessor_free(ov_model->preprocess); if (ov_model->compiled_model) @@ -551,7 +583,7 @@ static void dnn_free_model_ov(DNNModel **model) } -static int init_model_ov(OVModel *ov_model, const char *input_name, const char *output_name) +static int init_model_ov(OVModel *ov_model, const char *input_name, const char **output_names, int nb_outputs) { int ret = 0; OVContext *ctx = 
&ov_model->ctx; @@ -594,17 +626,15 @@ static int init_model_ov(OVModel *ov_model, const char *input_name, const char * } status = ov_preprocess_prepostprocessor_get_input_info_by_name(ov_model->preprocess, input_name, &ov_model->input_info); - status |= ov_preprocess_prepostprocessor_get_output_info_by_name(ov_model->preprocess, output_name, &ov_model->output_info); if (status != OK) { - av_log(ctx, AV_LOG_ERROR, "Failed to get input/output info from preprocess.\n"); + av_log(ctx, AV_LOG_ERROR, "Failed to get input info from preprocess.\n"); ret = ov2_map_error(status, NULL); goto err; } status = ov_preprocess_input_info_get_tensor_info(ov_model->input_info, &input_tensor_info); - status |= ov_preprocess_output_info_get_tensor_info(ov_model->output_info, &output_tensor_info); if (status != OK) { - av_log(ctx, AV_LOG_ERROR, "Failed to get tensor info from input/output.\n"); + av_log(ctx, AV_LOG_ERROR, "Failed to get tensor info from input.\n"); ret = ov2_map_error(status, NULL); goto err; } @@ -642,17 +672,43 @@ static int init_model_ov(OVModel *ov_model, const char *input_name, const char * } status = ov_preprocess_input_tensor_info_set_element_type(input_tensor_info, U8); - if (ov_model->model->func_type != DFT_PROCESS_FRAME) - status |= ov_preprocess_output_set_element_type(output_tensor_info, F32); - else if (fabsf(ctx->options.scale - 1) > 1e-6f || fabsf(ctx->options.mean) > 1e-6f) - status |= ov_preprocess_output_set_element_type(output_tensor_info, F32); - else - status |= ov_preprocess_output_set_element_type(output_tensor_info, U8); if (status != OK) { - av_log(ctx, AV_LOG_ERROR, "Failed to set input/output element type\n"); + av_log(ctx, AV_LOG_ERROR, "Failed to set input element type\n"); ret = ov2_map_error(status, NULL); goto err; } + + ov_model->nb_outputs = nb_outputs; + for (int i = 0; i < nb_outputs; i++) { + status = ov_preprocess_prepostprocessor_get_output_info_by_name( + ov_model->preprocess, output_names[i], &ov_model->output_info); + if 
(status != OK) { + av_log(ctx, AV_LOG_ERROR, "Failed to get output info from preprocess.\n"); + ret = ov2_map_error(status, NULL); + goto err; + } + status |= ov_preprocess_output_info_get_tensor_info(ov_model->output_info, &output_tensor_info); + if (status != OK) { + av_log(ctx, AV_LOG_ERROR, "Failed to get tensor info from input/output.\n"); + ret = ov2_map_error(status, NULL); + goto err; + } + if (ov_model->model->func_type != DFT_PROCESS_FRAME) + status |= ov_preprocess_output_set_element_type(output_tensor_info, F32); + else if (fabsf(ctx->options.scale - 1) > 1e-6f || fabsf(ctx->options.mean) > 1e-6f) + status |= ov_preprocess_output_set_element_type(output_tensor_info, F32); + else + status |= ov_preprocess_output_set_element_type(output_tensor_info, U8); + if (status != OK) { + av_log(ctx, AV_LOG_ERROR, "Failed to set output element type\n"); + ret = ov2_map_error(status, NULL); + goto err; + } + ov_preprocess_output_tensor_info_free(output_tensor_info); + output_tensor_info = NULL; + ov_preprocess_output_info_free(ov_model->output_info); + ov_model->output_info = NULL; + } // set preprocess steps. 
if (fabsf(ctx->options.scale - 1) > 1e-6f || fabsf(ctx->options.mean) > 1e-6f) { ov_preprocess_preprocess_steps_t* input_process_steps = NULL; @@ -667,11 +723,18 @@ static int init_model_ov(OVModel *ov_model, const char *input_name, const char * status |= ov_preprocess_preprocess_steps_scale(input_process_steps, ctx->options.scale); if (status != OK) { av_log(ctx, AV_LOG_ERROR, "Failed to set preprocess steps\n"); + ov_preprocess_preprocess_steps_free(input_process_steps); + input_process_steps = NULL; ret = ov2_map_error(status, NULL); goto err; } ov_preprocess_preprocess_steps_free(input_process_steps); + input_process_steps = NULL; } + ov_preprocess_input_tensor_info_free(input_tensor_info); + input_tensor_info = NULL; + ov_preprocess_input_info_free(ov_model->input_info); + ov_model->input_info = NULL; //update model if(ov_model->ov_model) @@ -679,20 +742,33 @@ static int init_model_ov(OVModel *ov_model, const char *input_name, const char * status = ov_preprocess_prepostprocessor_build(ov_model->preprocess, &ov_model->ov_model); if (status != OK) { av_log(ctx, AV_LOG_ERROR, "Failed to update OV model\n"); + ov_model_free(tmp_ov_model); + tmp_ov_model = NULL; ret = ov2_map_error(status, NULL); goto err; } ov_model_free(tmp_ov_model); //update output_port - if (ov_model->output_port) { - ov_output_const_port_free(ov_model->output_port); - ov_model->output_port = NULL; - } - status = ov_model_const_output_by_name(ov_model->ov_model, output_name, &ov_model->output_port); - if (status != OK) { - av_log(ctx, AV_LOG_ERROR, "Failed to get output port.\n"); - goto err; + if (!ov_model->output_ports) { + ov_model->output_ports = av_calloc(nb_outputs, sizeof(*ov_model->output_ports)); + if (!ov_model->output_ports) { + ret = AVERROR(ENOMEM); + goto err; + } + } else + for (int i = 0; i < nb_outputs; i++) { + ov_output_const_port_free(ov_model->output_ports[i]); + ov_model->output_ports[i] = NULL; + } + + for (int i = 0; i < nb_outputs; i++) { + status = 
ov_model_const_output_by_name(ov_model->ov_model, output_names[i], + &ov_model->output_ports[i]); + if (status != OK) { + av_log(ctx, AV_LOG_ERROR, "Failed to get output port %s.\n", output_names[i]); + goto err; + } } //compile network status = ov_core_compile_model(ov_model->core, ov_model->ov_model, device, 0, &ov_model->compiled_model); @@ -701,6 +777,7 @@ static int init_model_ov(OVModel *ov_model, const char *input_name, const char * goto err; } ov_preprocess_input_model_info_free(input_model_info); + input_model_info = NULL; ov_layout_free(NCHW_layout); ov_layout_free(NHWC_layout); #else @@ -745,6 +822,7 @@ static int init_model_ov(OVModel *ov_model, const char *input_name, const char * ret = DNN_GENERIC_ERROR; goto err; } + ov_model->nb_outputs = 1; // all models in openvino open model zoo use BGR with range [0.0f, 255.0f] as input, // we don't have a AVPixelFormat to describe it, so we'll use AV_PIX_FMT_BGR24 and @@ -848,6 +926,10 @@ static int init_model_ov(OVModel *ov_model, const char *input_name, const char * err: #if HAVE_OPENVINO2 + if (output_tensor_info) + ov_preprocess_output_tensor_info_free(output_tensor_info); + if (ov_model->output_info) + ov_preprocess_output_info_free(ov_model->output_info); if (NCHW_layout) ov_layout_free(NCHW_layout); if (NHWC_layout) @@ -1204,11 +1286,6 @@ static int get_output_ov(void *model, const char *input_name, int input_width, i } } - status = ov_model_const_output_by_name(ov_model->ov_model, output_name, &ov_model->output_port); - if (status != OK) { - av_log(ctx, AV_LOG_ERROR, "Failed to get output port.\n"); - return ov2_map_error(status, NULL); - } if (!ov_model->compiled_model) { #else if (ctx->options.input_resizable) { @@ -1224,7 +1301,7 @@ static int get_output_ov(void *model, const char *input_name, int input_width, i } if (!ov_model->exe_network) { #endif - ret = init_model_ov(ov_model, input_name, output_name); + ret = init_model_ov(ov_model, input_name, &output_name, 1); if (ret != 0) { av_log(ctx, 
AV_LOG_ERROR, "Failed init OpenVINO exectuable network or inference request\n"); return ret; @@ -1397,7 +1474,8 @@ static int dnn_execute_model_ov(const DNNModel *model, DNNExecBaseParams *exec_p #else if (!ov_model->exe_network) { #endif - ret = init_model_ov(ov_model, exec_params->input_name, exec_params->output_names[0]); + ret = init_model_ov(ov_model, exec_params->input_name, + exec_params->output_names, exec_params->nb_output); if (ret != 0) { av_log(ctx, AV_LOG_ERROR, "Failed init OpenVINO exectuable network or inference request\n"); return ret; diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c index 7ac3bb0b58..373dda58bf 100644 --- a/libavfilter/vf_dnn_detect.c +++ b/libavfilter/vf_dnn_detect.c @@ -354,11 +354,11 @@ static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterCon break; } } - return 0; } -static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx) +static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, int nb_outputs, + AVFilterContext *filter_ctx) { AVFrameSideData *sd; DnnDetectContext *ctx = filter_ctx->priv; @@ -466,7 +466,7 @@ static int dnn_detect_post_proc(AVFrame *frame, DNNData *output, uint32_t nb, AV DnnContext *dnn_ctx = &ctx->dnnctx; switch (dnn_ctx->backend_type) { case DNN_OV: - return dnn_detect_post_proc_ov(frame, output, filter_ctx); + return dnn_detect_post_proc_ov(frame, output, nb, filter_ctx); case DNN_TF: return dnn_detect_post_proc_tf(frame, output, filter_ctx); default: @@ -553,11 +553,6 @@ static int check_output_nb(DnnDetectContext *ctx, DNNBackendType backend_type, i } return 0; case DNN_OV: - if (output_nb != 1) { - av_log(ctx, AV_LOG_ERROR, "Dnn detect filter with openvino backend needs 1 output only, \ - but get %d instead\n", output_nb); - return AVERROR(EINVAL); - } return 0; default: avpriv_report_missing_feature(ctx, "Dnn detect filter does not support current backend\n"); From patchwork Mon Dec 4 05:36:31 2023 
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Chen, Wenbin" X-Patchwork-Id: 44895 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:a301:b0:181:818d:5e7f with SMTP id x1csp2566465pzk; Sun, 3 Dec 2023 21:36:58 -0800 (PST) X-Google-Smtp-Source: AGHT+IFLSURoQwJOfUsJ8ToZrfdTsPpoNSFChB6XqH4KqR6QI5d4Q99Q6mhhPvWLW/UEEGHIzG21 X-Received: by 2002:a50:a69c:0:b0:54c:77cd:c544 with SMTP id e28-20020a50a69c000000b0054c77cdc544mr2685492edc.31.1701668218631; Sun, 03 Dec 2023 21:36:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701668218; cv=none; d=google.com; s=arc-20160816; b=elz0EIVkd7GZD5xArayBrdHtYVvj1+ybzsukwOetKCbEmRFRG8QJTeP0AVmMK+FX1Q E8UU2808BSx/fg9G8FIv0v82LTGBndGgFdktnd+UKXhfmsERkpfmMda4K+qawVpEK1Wg 9pD3Qk5evJf15n5q27rtYuINoAWOq6M27BRjLgjnnTGxlD2evOfqOn7yIc/ALUolaMs2 ni52kyG4gQpSRAm/8BsMGsAeUG9ENEgPa6o11eVE0H0O5pKJqKJOsx2sIfkg1M5RJwPM RC5fgM58q2Xfxh2FxH/AqO8yqgL9eSERsq20lfdD/SBdsJRY+z1aZ+RXj/LOZQGUedLZ jM0g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:dkim-signature:delivered-to; bh=EHJ1+EIbUo4sP2/DggxRh0aBdY7sIXT9ODp/InkNWqc=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=MAzwzaeNQZTAcd+OoyAlYxAEYO8kCOsu6ZFEgZCrcRemJ1QHy3fotYRmEpiSMxQxhY PoS0NnkU6JOMEaV7ugjfXLepel46DlAj4depJUNhhoSYkzz8/smygNu9B53j1PSpim2m +0u7Z/Ecb7NEcGZVrLBLaOZxACg1havPtiXzsTf49h7JS8AKforZlLuJxSpjfqN5ilUR RVx5r2r4oB6q9udwdbSODh/UoilOqLfwIwzN7CNpdNC4wJT1lzlbVa5rtTId+9Fnj3Q7 4m9ChS0ttS3bIB46ai6jfN/UIb5v+P6oevrR5nYYKX0wTR94+typApAe4UFp+ANnnI/G NddA== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=WuFn6sYI; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org 
designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id m28-20020a50931c000000b0054c6d39b156si2506503eda.401.2023.12.03.21.36.58; Sun, 03 Dec 2023 21:36:58 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=WuFn6sYI; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DEE6F68CE3F; Mon, 4 Dec 2023 07:36:47 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id DAA6068AC2D for ; Mon, 4 Dec 2023 07:36:39 +0200 (EET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701668205; x=1733204205; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=sAO5zK+LdKAT7gYfnXZCujXZe50HbDYU1O1ccgvZi0s=; b=WuFn6sYIHDk/VeyS+vv9N7lIsssudgNzjBqLoP65W/gKnEXb8Ep+58lD MP1zd7NyLZ639Gf+XWUMG/yrICQZOIuieV2Qhc930zQGBVaP3jpxcrXE6 CpIWylErA/laMFOvf1xNQsjAaCmkGDOUJxHo9Z9R5AkjMbxfqMawa0JAz Ovf24IAXewoSs2WVoPtLe9I7KKxpXELR11uOZZSdL69IR4+G7dzC8A7y4 /5LA9gbmyHWmeGVFYqg00YPieKxBvbjNix5G5qPKWGN934nMrrbi+u/TR 4l7TJgtb4Ljdk38w9bNt95Gsj1aNcSlPE8tnyI80n/7wGsPeW7Vrq7RJc Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10913"; a="397574028" X-IronPort-AV: E=Sophos;i="6.04,248,1695711600"; d="scan'208";a="397574028" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by 
orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Dec 2023 21:36:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10913"; a="914308725" X-IronPort-AV: E=Sophos;i="6.04,249,1695711600"; d="scan'208";a="914308725" Received: from wenbin-z390-aorus-ultra.sh.intel.com ([10.239.156.43]) by fmsmga001.fm.intel.com with ESMTP; 03 Dec 2023 21:36:35 -0800 From: wenbin.chen-at-intel.com@ffmpeg.org To: ffmpeg-devel@ffmpeg.org Date: Mon, 4 Dec 2023 13:36:31 +0800 Message-Id: <20231204053633.1743228-2-wenbin.chen@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231204053633.1743228-1-wenbin.chen@intel.com> References: <20231204053633.1743228-1-wenbin.chen@intel.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 2/4] libavfilter/vf_dnn_detect: Add input pad X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 8ykPuVMTdMdz From: Wenbin Chen Add an input pad to get the model input resolution. Detection models always have a fixed input size, and the output coordinates are based on the input resolution, so we need the input size to map the coordinates onto the real output frames. 
Signed-off-by: Wenbin Chen --- libavfilter/dnn/dnn_backend_openvino.c | 24 ++++++++++++++++------ libavfilter/vf_dnn_detect.c | 28 +++++++++++++++++++++++++- 2 files changed, 45 insertions(+), 7 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c index 089e028818..671a995c70 100644 --- a/libavfilter/dnn/dnn_backend_openvino.c +++ b/libavfilter/dnn/dnn_backend_openvino.c @@ -1073,9 +1073,15 @@ static int get_input_ov(void *model, DNNData *input, const char *input_name) return AVERROR(ENOSYS); } - input->channels = dims[1]; - input->height = input_resizable ? -1 : dims[2]; - input->width = input_resizable ? -1 : dims[3]; + if (dims[1] <= 3) { // NCHW + input->channels = dims[1]; + input->height = input_resizable ? -1 : dims[2]; + input->width = input_resizable ? -1 : dims[3]; + } else { // NHWC + input->height = input_resizable ? -1 : dims[1]; + input->width = input_resizable ? -1 : dims[2]; + input->channels = dims[3]; + } input->dt = precision_to_datatype(precision); return 0; @@ -1105,9 +1111,15 @@ static int get_input_ov(void *model, DNNData *input, const char *input_name) return DNN_GENERIC_ERROR; } - input->channels = dims.dims[1]; - input->height = input_resizable ? -1 : dims.dims[2]; - input->width = input_resizable ? -1 : dims.dims[3]; + if (dims.dims[1] <= 3) { // NCHW + input->channels = dims.dims[1]; + input->height = input_resizable ? -1 : dims.dims[2]; + input->width = input_resizable ? -1 : dims.dims[3]; + } else { // NHWC + input->height = input_resizable ? -1 : dims.dims[1]; + input->width = input_resizable ? 
-1 : dims.dims[2]; + input->channels = dims.dims[3]; + } input->dt = precision_to_datatype(precision); return 0; } diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c index 373dda58bf..86f61c9907 100644 --- a/libavfilter/vf_dnn_detect.c +++ b/libavfilter/vf_dnn_detect.c @@ -699,13 +699,39 @@ static av_cold void dnn_detect_uninit(AVFilterContext *context) free_detect_labels(ctx); } +static int config_input(AVFilterLink *inlink) +{ + AVFilterContext *context = inlink->dst; + DnnDetectContext *ctx = context->priv; + DNNData model_input; + int ret; + + ret = ff_dnn_get_input(&ctx->dnnctx, &model_input); + if (ret != 0) { + av_log(ctx, AV_LOG_ERROR, "could not get input from the model\n"); + return ret; + } + ctx->scale_width = model_input.width == -1 ? inlink->w : model_input.width; + ctx->scale_height = model_input.height == -1 ? inlink->h : model_input.height; + + return 0; +} + +static const AVFilterPad dnn_detect_inputs[] = { + { + .name = "default", + .type = AVMEDIA_TYPE_VIDEO, + .config_props = config_input, + }, +}; + const AVFilter ff_vf_dnn_detect = { .name = "dnn_detect", .description = NULL_IF_CONFIG_SMALL("Apply DNN detect filter to the input."), .priv_size = sizeof(DnnDetectContext), .init = dnn_detect_init, .uninit = dnn_detect_uninit, - FILTER_INPUTS(ff_video_default_filterpad), + FILTER_INPUTS(dnn_detect_inputs), FILTER_OUTPUTS(ff_video_default_filterpad), FILTER_PIXFMTS_ARRAY(pix_fmts), .priv_class = &dnn_detect_class, From patchwork Mon Dec 4 05:36:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Chen, Wenbin" X-Patchwork-Id: 44896 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:a301:b0:181:818d:5e7f with SMTP id x1csp2566505pzk; Sun, 3 Dec 2023 21:37:07 -0800 (PST) X-Google-Smtp-Source: AGHT+IG3BTfoMhKy7VOkbyyxqoVwATLiIJ11YvW/vz+qtD7nHji6Z8gI+ajBt7n7ETenxFJVs3Ot X-Received: by 2002:a2e:155b:0:b0:2c9:f509:d81e with SMTP id 
27-20020a2e155b000000b002c9f509d81emr2239448ljv.0.1701668227159; Sun, 03 Dec 2023 21:37:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701668227; cv=none; d=google.com; s=arc-20160816; b=CUyaDgCc9cAQ/tAXUUnHMVdWszzrGYkhuCGt/dw70IFk1gKBPDyN0JAW/rLz9LS4M7 FqZWZxn7DNDg3rPhmOypa7d5Pfal3Qhf05enJvmyFM7Fx3dVR++YHl8M9x5x3HaDvsCf l3l70+5eEEca7DqJGl5mX45ChiOjQH7eN1ALRupeNpQA0Zc4zFsKkFu12aXYMov3+CPi rtlTqs2gWV2AhV4sLNmNrETJDCT/0qXsQ8O6JFqE/EM1rTt6o4sMlpl1N6FZRp+DxvMC adVlOGu1aASOn6GWr11tBBPFL3V+rYS71VpG63qkd1gFUKbe3xk1JT1g23jI8JAuYjRm VKhw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:dkim-signature:delivered-to; bh=e+6Qk6rvQTroXodYi1GYjG/uS7ynSHxnaVeH7WySlmw=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=rLV8p0uVnvID0AayqDkvb8pDOJa7ngbHuId1RSPgNBGJwp88IJ0imPBB3zRJyTQfvt pQkroWiQQUZl5UeeFh815OyGcup/yQ5uH117n0AlYN6OKMcNYEuzZUrFHL05gJRtbLdq 3hD810H0q7ySK+3hTFobnjI2QVQ9yJSWWlxivUXnRMa+CBDeEbKdUNx8C0DHNXPegqnY wfI6zSRKL7JMQ/w8ofZyx45iwlcPArzLFdE6bDZfovVudY5pd91wIVntNRBBXkAj3mmx bsqJ2IPxDkVdGRLXzvg/okZRMQPlR6hD1f7uBX+bcM7y2TG9cRwQcwXhQg3pP+Cc1U3g 66Lg== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=PBE3ccVm; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id o26-20020a1709061b1a00b009a18d9d3fa7si4010914ejg.669.2023.12.03.21.37.06; Sun, 03 Dec 2023 21:37:07 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=PBE3ccVm; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id F2F7C68CEB9; Mon, 4 Dec 2023 07:36:51 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 89D6968CDE0 for ; Mon, 4 Dec 2023 07:36:44 +0200 (EET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701668209; x=1733204209; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=Z74JKBvWp+S5Y6Qe6neSKIcyb2IFotjVhTFu1YPhHjo=; b=PBE3ccVmuMaUmDTEbuZJK9i/rHl1veiH+cGZ+KYFa9V46LJxPNjFIKuh aMGXtNMfYI1oCATejdC9N6pMPLJo+ZsHEiAAIsnoOzsCX0vxGe0DMz2Hx W4hV2mu+l+0Efa2AFFOS0BJnEC/R78zT0VviXnQU1Iy2quA5INRHlAJbO rjN7fbS1SgmeaK2aVzYRIomQJnMfYWkh1+4yfbpLoen4klgefe/r+4Htn 1dEawQLZnwm2O6Um/ig2Vxe4Tb0hTGrXGeKT3SKuLclmzj0rVub/9Kpg+ /WStz/0ULYrnsh8DF6SbT/SVB6mAng75p9uwMilgP9M8LFp7VolDW8IRe g==; X-IronPort-AV: E=McAfee;i="6600,9927,10913"; a="397574029" X-IronPort-AV: E=Sophos;i="6.04,248,1695711600"; d="scan'208";a="397574029" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Dec 2023 21:36:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10913"; a="914308728" 
X-IronPort-AV: E=Sophos;i="6.04,249,1695711600"; d="scan'208";a="914308728" Received: from wenbin-z390-aorus-ultra.sh.intel.com ([10.239.156.43]) by fmsmga001.fm.intel.com with ESMTP; 03 Dec 2023 21:36:36 -0800 From: wenbin.chen-at-intel.com@ffmpeg.org To: ffmpeg-devel@ffmpeg.org Date: Mon, 4 Dec 2023 13:36:32 +0800 Message-Id: <20231204053633.1743228-3-wenbin.chen@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231204053633.1743228-1-wenbin.chen@intel.com> References: <20231204053633.1743228-1-wenbin.chen@intel.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 3/4] libavfilter/vf_dnn_detect: Add yolov3 support X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: wCdZW0hy1yBe From: Wenbin Chen Add yolov3 support. The difference of yolov3 is that it has multiple outputs in different scale to perform better on both large and small object. 
For model details, refer to:
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tf

Signed-off-by: Wenbin Chen
---
 libavfilter/vf_dnn_detect.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index 86f61c9907..7a32b191c3 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -35,6 +35,7 @@
 typedef enum {
     DDMT_SSD,
     DDMT_YOLOV1V2,
+    DDMT_YOLOV3
 } DNNDetectionModelType;
 
 typedef struct DnnDetectContext {
@@ -73,6 +74,7 @@ static const AVOption dnn_detect_options[] = {
     { "model_type", "DNN detection model type", OFFSET2(model_type), AV_OPT_TYPE_INT, { .i64 = DDMT_SSD }, INT_MIN, INT_MAX, FLAGS, "model_type" },
     { "ssd", "output shape [1, 1, N, 7]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_SSD }, 0, 0, FLAGS, "model_type" },
     { "yolo", "output shape [1, N*Cx*Cy*DetectionBox]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV1V2 }, 0, 0, FLAGS, "model_type" },
+    { "yolov3", "outputs shape [1, N*D, Cx, Cy]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV3 }, 0, 0, FLAGS, "model_type" },
     { "cell_w", "cell width", OFFSET2(cell_w), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "cell_h", "cell height", OFFSET2(cell_h), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "nb_classes", "The number of class", OFFSET2(nb_classes), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
@@ -146,6 +148,11 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
         cell_h = ctx->cell_h;
         scale_w = cell_w;
         scale_h = cell_h;
+    } else {
+        cell_w = output[output_index].width;
+        cell_h = output[output_index].height;
+        scale_w = ctx->scale_width;
+        scale_h = ctx->scale_height;
     }
     box_size = nb_classes + 5;
 
@@ -173,6 +180,7 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
                       output[output_index].height *
                       output[output_index].width / box_size / cell_w / cell_h;
 
+    anchors = anchors + (detection_boxes * output_index * 2);
     /**
      * find all candidate bbox
      * yolo output can be reshaped to [B, N*D, Cx, Cy]
@@ -284,6 +292,21 @@ static int dnn_detect_post_proc_yolo(AVFrame *frame, DNNData *output, AVFilterCo
     return 0;
 }
 
+static int dnn_detect_post_proc_yolov3(AVFrame *frame, DNNData *output,
+                                       AVFilterContext *filter_ctx, int nb_outputs)
+{
+    int ret = 0;
+    for (int i = 0; i < nb_outputs; i++) {
+        ret = dnn_detect_parse_yolo_output(frame, output, i, filter_ctx);
+        if (ret < 0)
+            return ret;
+    }
+    ret = dnn_detect_fill_side_data(frame, filter_ctx);
+    if (ret < 0)
+        return ret;
+    return 0;
+}
+
 static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
@@ -380,8 +403,11 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, int nb_outpu
         ret = dnn_detect_post_proc_yolo(frame, output, filter_ctx);
         if (ret < 0)
             return ret;
+    case DDMT_YOLOV3:
+        ret = dnn_detect_post_proc_yolov3(frame, output, filter_ctx, nb_outputs);
+        if (ret < 0)
+            return ret;
     }
-
     return 0;
 }
 

From patchwork Mon Dec 4 05:36:33 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Chen, Wenbin"
X-Patchwork-Id: 44897
From: wenbin.chen-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 4 Dec 2023 13:36:33 +0800
Message-Id: <20231204053633.1743228-4-wenbin.chen@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231204053633.1743228-1-wenbin.chen@intel.com>
References: <20231204053633.1743228-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH 4/4] libavfilter/vf_dnn_detect: Add yolov4 support

From: Wenbin Chen

yolov4 differs in that the sigmoid function needs to be applied to the x and
y coordinates. Also make the parser compatible with NHWC output, as the
yolov4 model from the OpenVINO model zoo has an NHWC output layout.
For model details, refer to:
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tf

Signed-off-by: Wenbin Chen
---
 libavfilter/vf_dnn_detect.c | 71 ++++++++++++++++++++++++++++++-------
 1 file changed, 59 insertions(+), 12 deletions(-)

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index 7a32b191c3..1b04a2cb98 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -35,7 +35,8 @@
 typedef enum {
     DDMT_SSD,
     DDMT_YOLOV1V2,
-    DDMT_YOLOV3
+    DDMT_YOLOV3,
+    DDMT_YOLOV4
 } DNNDetectionModelType;
 
 typedef struct DnnDetectContext {
@@ -75,6 +76,7 @@ static const AVOption dnn_detect_options[] = {
     { "ssd", "output shape [1, 1, N, 7]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_SSD }, 0, 0, FLAGS, "model_type" },
     { "yolo", "output shape [1, N*Cx*Cy*DetectionBox]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV1V2 }, 0, 0, FLAGS, "model_type" },
     { "yolov3", "outputs shape [1, N*D, Cx, Cy]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV3 }, 0, 0, FLAGS, "model_type" },
+    { "yolov4", "outputs shape [1, N*D, Cx, Cy]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV4 }, 0, 0, FLAGS, "model_type" },
     { "cell_w", "cell width", OFFSET2(cell_w), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "cell_h", "cell height", OFFSET2(cell_h), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "nb_classes", "The number of class", OFFSET2(nb_classes), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
@@ -84,6 +86,14 @@ static const AVOption dnn_detect_options[] = {
 
 AVFILTER_DEFINE_CLASS(dnn_detect);
 
+static inline float sigmoid(float x) {
+    return 1.f / (1.f + exp(-x));
+}
+
+static inline float linear(float x) {
+    return x;
+}
+
 static int dnn_detect_get_label_id(int nb_classes, int cell_size, float *label_data)
 {
     float max_prob = 0;
@@ -142,6 +152,8 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
     float *output_data = output[output_index].data;
     float *anchors = ctx->anchors;
     AVDetectionBBox *bbox;
+    float (*post_process_raw_data)(float x);
+    int is_NHWC = 0;
 
     if (ctx->model_type == DDMT_YOLOV1V2) {
         cell_w = ctx->cell_w;
@@ -149,13 +161,30 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
         scale_w = cell_w;
         scale_h = cell_h;
     } else {
-        cell_w = output[output_index].width;
-        cell_h = output[output_index].height;
+        if (output[output_index].height != output[output_index].width &&
+            output[output_index].height == output[output_index].channels) {
+            is_NHWC = 1;
+            cell_w = output[output_index].height;
+            cell_h = output[output_index].channels;
+        } else {
+            cell_w = output[output_index].width;
+            cell_h = output[output_index].height;
+        }
         scale_w = ctx->scale_width;
         scale_h = ctx->scale_height;
     }
     box_size = nb_classes + 5;
 
+    switch (ctx->model_type) {
+    case DDMT_YOLOV1V2:
+    case DDMT_YOLOV3:
+        post_process_raw_data = linear;
+        break;
+    case DDMT_YOLOV4:
+        post_process_raw_data = sigmoid;
+        break;
+    }
+
     if (!cell_h || !cell_w) {
         av_log(filter_ctx, AV_LOG_ERROR, "cell_w and cell_h are detected\n");
         return AVERROR(EINVAL);
@@ -193,19 +222,36 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
                 float *detection_boxes_data;
                 int label_id;
 
-                detection_boxes_data = output_data + box_id * box_size * cell_w * cell_h;
-                conf = detection_boxes_data[cy * cell_w + cx + 4 * cell_w * cell_h];
+                if (is_NHWC) {
+                    detection_boxes_data = output_data +
+                        ((cy * cell_w + cx) * detection_boxes + box_id) * box_size;
+                    conf = post_process_raw_data(detection_boxes_data[4]);
+                } else {
+                    detection_boxes_data = output_data + box_id * box_size * cell_w * cell_h;
+                    conf = post_process_raw_data(
+                        detection_boxes_data[cy * cell_w + cx + 4 * cell_w * cell_h]);
+                }
                 if (conf < conf_threshold) {
                     continue;
                 }
 
-                x = detection_boxes_data[cy * cell_w + cx];
-                y = detection_boxes_data[cy * cell_w + cx + cell_w * cell_h];
-                w = detection_boxes_data[cy * cell_w + cx + 2 * cell_w * cell_h];
-                h = detection_boxes_data[cy * cell_w + cx + 3 * cell_w * cell_h];
-                label_id = dnn_detect_get_label_id(ctx->nb_classes, cell_w * cell_h,
-                    detection_boxes_data + cy * cell_w + cx + 5 * cell_w * cell_h);
-                conf = conf * detection_boxes_data[cy * cell_w + cx + (label_id + 5) * cell_w * cell_h];
+                if (is_NHWC) {
+                    x = post_process_raw_data(detection_boxes_data[0]);
+                    y = post_process_raw_data(detection_boxes_data[1]);
+                    w = detection_boxes_data[2];
+                    h = detection_boxes_data[3];
+                    label_id = dnn_detect_get_label_id(ctx->nb_classes, 1, detection_boxes_data + 5);
+                    conf = conf * post_process_raw_data(detection_boxes_data[label_id + 5]);
+                } else {
+                    x = post_process_raw_data(detection_boxes_data[cy * cell_w + cx]);
+                    y = post_process_raw_data(detection_boxes_data[cy * cell_w + cx + cell_w * cell_h]);
+                    w = detection_boxes_data[cy * cell_w + cx + 2 * cell_w * cell_h];
+                    h = detection_boxes_data[cy * cell_w + cx + 3 * cell_w * cell_h];
+                    label_id = dnn_detect_get_label_id(ctx->nb_classes, cell_w * cell_h,
+                        detection_boxes_data + cy * cell_w + cx + 5 * cell_w * cell_h);
+                    conf = conf * post_process_raw_data(
+                        detection_boxes_data[cy * cell_w + cx + (label_id + 5) * cell_w * cell_h]);
+                }
 
                 bbox = av_mallocz(sizeof(*bbox));
                 if (!bbox)
@@ -404,6 +450,7 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, int nb_outpu
         if (ret < 0)
             return ret;
     case DDMT_YOLOV3:
+    case DDMT_YOLOV4:
         ret = dnn_detect_post_proc_yolov3(frame, output, filter_ctx, nb_outputs);
         if (ret < 0)
             return ret;
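
Note for readers of the parsing hunk in this patch: the two branches differ only in how a box field (x, y, w, h, confidence, class scores) is located in the flat output buffer. The standalone sketch below is not part of the patch; it restates the two index computations as I read them from the diff, and the helper names field_offset_nchw and field_offset_nhwc are hypothetical. In the planar case each field of an anchor box occupies its own cell_w * cell_h plane; in the NHWC case all box_size fields of one box at one cell are contiguous.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Planar (NCHW-style) layout: anchor box `box_id` owns `box_size` planes,
 * each plane holding cell_w * cell_h values; `field` selects the plane
 * (0..3 = x, y, w, h; 4 = confidence; 5.. = class scores). */
static size_t field_offset_nchw(int box_id, int field, int cx, int cy,
                                int cell_w, int cell_h, int box_size)
{
    size_t plane = (size_t)cell_w * cell_h;
    return ((size_t)box_id * box_size + field) * plane
           + (size_t)cy * cell_w + cx;
}

/* Hypothetical helper, for illustration only.
 * Interleaved (NHWC-style) layout: for each cell, the `detection_boxes`
 * anchor boxes follow one another, and the `box_size` fields of a box
 * are stored contiguously. */
static size_t field_offset_nhwc(int box_id, int field, int cx, int cy,
                                int cell_w, int detection_boxes, int box_size)
{
    size_t cell = (size_t)cy * cell_w + cx;
    return (cell * detection_boxes + box_id) * box_size + field;
}
```

With a 13x13 grid, 3 anchor boxes and box_size = 85 (80 classes + 5), the confidence of box 1 at cell (cx=2, cy=3) would sit at field_offset_nchw(1, 4, 2, 3, 13, 13, 85) in the planar layout and at field_offset_nhwc(1, 4, 2, 3, 13, 3, 85) in the interleaved one. This also shows why the NHWC branch passes a cell_size of 1 to dnn_detect_get_label_id: consecutive class scores are adjacent in memory there, rather than a full plane apart.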