From patchwork Tue Nov 21 02:20:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Chen, Wenbin"
X-Patchwork-Id: 44735
From: wenbin.chen-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 21 Nov 2023 10:20:17 +0800
Message-Id: <20231121022018.285533-1-wenbin.chen@intel.com>
X-Mailer: git-send-email 2.34.1
Subject: [FFmpeg-devel] [PATCH 1/2] libavfilter/vf_dnn_detect: Add model_type option.

From: Wenbin Chen

There are many kinds of detection DNN models, and they use different
preprocessing and postprocessing methods. To support more models, add a
"model_type" option that selects the preprocessing and postprocessing
functions.

Signed-off-by: Wenbin Chen
---
 libavfilter/vf_dnn_detect.c | 42 ++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index b5dae42c65..9db90ee4cf 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -31,6 +31,10 @@
 #include "libavutil/avstring.h"
 #include "libavutil/detection_bbox.h"

+typedef enum {
+    DDMT_SSD
+} DNNDetectionModelType;
+
 typedef struct DnnDetectContext {
     const AVClass *class;
     DnnContext dnnctx;
@@ -38,6 +42,7 @@ typedef struct DnnDetectContext {
     char *labels_filename;
     char **labels;
     int label_count;
+    DNNDetectionModelType model_type;
 } DnnDetectContext;

 #define OFFSET(x) offsetof(DnnDetectContext, dnnctx.x)
@@ -54,12 +59,14 @@ static const AVOption dnn_detect_options[] = {
     DNN_COMMON_OPTIONS
     { "confidence", "threshold of confidence", OFFSET2(confidence), AV_OPT_TYPE_FLOAT, { .dbl = 0.5 }, 0, 1, FLAGS},
     { "labels", "path to labels file", OFFSET2(labels_filename), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, FLAGS },
+    { "model_type", "DNN detection model type", OFFSET2(model_type), AV_OPT_TYPE_INT, { .i64 = DDMT_SSD }, INT_MIN, INT_MAX, FLAGS, "model_type" },
+    { "ssd", "output shape [1, 1, N, 7]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_SSD }, 0, 0, FLAGS, "model_type" },
     { NULL }
 };

 AVFILTER_DEFINE_CLASS(dnn_detect);

-static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
+static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
     float conf_threshold = ctx->confidence;
@@ -67,14 +74,12 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterCont
     int detect_size = output->width;
     float *detections = output->data;
     int nb_bboxes = 0;
-    AVFrameSideData *sd;
-    AVDetectionBBox *bbox;
     AVDetectionBBoxHeader *header;
+    AVDetectionBBox *bbox;

-    sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
-    if (sd) {
-        av_log(filter_ctx, AV_LOG_ERROR, "already have bounding boxes in side data.\n");
-        return -1;
+    if (output->width != 7) {
+        av_log(filter_ctx, AV_LOG_ERROR, "Model output shape doesn't match ssd requirement.\n");
+        return AVERROR(EINVAL);
     }

     for (int i = 0; i < proposal_count; ++i) {
@@ -135,6 +140,29 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterCont
     return 0;
 }

+static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
+{
+    AVFrameSideData *sd;
+    DnnDetectContext *ctx = filter_ctx->priv;
+    int ret = 0;
+
+    sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+    if (sd) {
+        av_log(filter_ctx, AV_LOG_ERROR, "already have bounding boxes in side data.\n");
+        return -1;
+    }
+
+    switch (ctx->model_type) {
+    case DDMT_SSD:
+        ret = dnn_detect_post_proc_ssd(frame, output, filter_ctx);
+        if (ret < 0)
+            return ret;
+        break;
+    }
+
+    return 0;
+}
+
 static int dnn_detect_post_proc_tf(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;

From patchwork Tue Nov 21 02:20:18 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Chen, Wenbin"
X-Patchwork-Id: 44736
From: wenbin.chen-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 21 Nov 2023 10:20:18 +0800
Message-Id: <20231121022018.285533-2-wenbin.chen@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231121022018.285533-1-wenbin.chen@intel.com>
References: <20231121022018.285533-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH 2/2] libavfilter/vf_dnn_detect: Add yolo support

From: Wenbin Chen

Add yolo support. A yolo model does not output final results; it outputs
candidate boxes, so post-processing is needed to remove overlapping boxes
and obtain the final results. Also, the box coordinates are relative to
the cell and the anchors, so this information is needed to calculate the
boxes as well.

For model details, please refer to:
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v2-tf

Signed-off-by: Wenbin Chen
---
 libavfilter/dnn/dnn_backend_openvino.c |   6 +-
 libavfilter/vf_dnn_detect.c            | 242 ++++++++++++++++++++++++-
 2 files changed, 244 insertions(+), 4 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index d3af8c34ce..6fe8b9c243 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -386,9 +386,9 @@ static void infer_completion_callback(void *args)
         ov_shape_free(&output_shape);
         return;
     }
-    output.channels = dims[1];
-    output.height = dims[2];
-    output.width = dims[3];
+    output.channels = output_shape.rank > 2 ? dims[output_shape.rank - 3] : 1;
+    output.height = output_shape.rank > 1 ? dims[output_shape.rank - 2] : 1;
+    output.width = output_shape.rank > 0 ? dims[output_shape.rank - 1] : 1;
     av_assert0(request->lltask_count <= dims[0]);
     ov_shape_free(&output_shape);
 #else
diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index 9db90ee4cf..7ac3bb0b58 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -30,9 +30,11 @@
 #include "libavutil/time.h"
 #include "libavutil/avstring.h"
 #include "libavutil/detection_bbox.h"
+#include "libavutil/fifo.h"

 typedef enum {
-    DDMT_SSD
+    DDMT_SSD,
+    DDMT_YOLOV1V2,
 } DNNDetectionModelType;

 typedef struct DnnDetectContext {
@@ -43,6 +45,15 @@ typedef struct DnnDetectContext {
     char **labels;
     int label_count;
     DNNDetectionModelType model_type;
+    int cell_w;
+    int cell_h;
+    int nb_classes;
+    AVFifo *bboxes_fifo;
+    int scale_width;
+    int scale_height;
+    char *anchors_str;
+    float *anchors;
+    int nb_anchor;
 } DnnDetectContext;

 #define OFFSET(x) offsetof(DnnDetectContext, dnnctx.x)
@@ -61,11 +72,218 @@ static const AVOption dnn_detect_options[] = {
     { "labels", "path to labels file", OFFSET2(labels_filename), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, FLAGS },
     { "model_type", "DNN detection model type", OFFSET2(model_type), AV_OPT_TYPE_INT, { .i64 = DDMT_SSD }, INT_MIN, INT_MAX, FLAGS, "model_type" },
     { "ssd", "output shape [1, 1, N, 7]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_SSD }, 0, 0, FLAGS, "model_type" },
+    { "yolo", "output shape [1, N*Cx*Cy*DetectionBox]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV1V2 }, 0, 0, FLAGS, "model_type" },
+    { "cell_w", "cell width", OFFSET2(cell_w), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
+    { "cell_h", "cell height", OFFSET2(cell_h), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
+    { "nb_classes", "The number of class", OFFSET2(nb_classes), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
+    { "anchors", "anchors, split by '&'", OFFSET2(anchors_str), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, FLAGS },
     { NULL }
 };

 AVFILTER_DEFINE_CLASS(dnn_detect);

+static int dnn_detect_get_label_id(int nb_classes, int cell_size, float *label_data)
+{
+    float max_prob = 0;
+    int label_id = 0;
+    for (int i = 0; i < nb_classes; i++) {
+        if (label_data[i * cell_size] > max_prob) {
+            max_prob = label_data[i * cell_size];
+            label_id = i;
+        }
+    }
+    return label_id;
+}
+
+static int dnn_detect_parse_anchors(char *anchors_str, float **anchors)
+{
+    char *saveptr = NULL, *token;
+    float *anchors_buf;
+    int nb_anchor = 0, i = 0;
+    while(anchors_str[i] != '\0') {
+        if(anchors_str[i] == '&')
+            nb_anchor++;
+        i++;
+    }
+    nb_anchor++;
+    anchors_buf = av_mallocz(nb_anchor * sizeof(*anchors_buf));
+    if (!anchors_buf) {
+        return 0;
+    }
+    for (int i = 0; i < nb_anchor; i++) {
+        token = av_strtok(anchors_str, "&", &saveptr);
+        anchors_buf[i] = strtof(token, NULL);
+        anchors_str = NULL;
+    }
+    *anchors = anchors_buf;
+    return nb_anchor;
+}
+
+/* Calculate Intersection Over Union */
+static float dnn_detect_IOU(AVDetectionBBox *bbox1, AVDetectionBBox *bbox2)
+{
+    float overlapping_width = FFMIN(bbox1->x + bbox1->w, bbox2->x + bbox2->w) - FFMAX(bbox1->x, bbox2->x);
+    float overlapping_height = FFMIN(bbox1->y + bbox1->h, bbox2->y + bbox2->h) - FFMAX(bbox1->y, bbox2->y);
+    float intersection_area =
+        (overlapping_width < 0 || overlapping_height < 0) ? 0 : overlapping_height * overlapping_width;
+    float union_area = bbox1->w * bbox1->h + bbox2->w * bbox2->h - intersection_area;
+    return intersection_area / union_area;
+}
+
+static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int output_index,
+                                        AVFilterContext *filter_ctx)
+{
+    DnnDetectContext *ctx = filter_ctx->priv;
+    float conf_threshold = ctx->confidence;
+    int detection_boxes, box_size, cell_w, cell_h, scale_w, scale_h;
+    int nb_classes = ctx->nb_classes;
+    float *output_data = output[output_index].data;
+    float *anchors = ctx->anchors;
+    AVDetectionBBox *bbox;
+
+    if (ctx->model_type == DDMT_YOLOV1V2) {
+        cell_w = ctx->cell_w;
+        cell_h = ctx->cell_h;
+        scale_w = cell_w;
+        scale_h = cell_h;
+    }
+    box_size = nb_classes + 5;
+
+    if (!cell_h || !cell_w) {
+        av_log(filter_ctx, AV_LOG_ERROR, "cell_w and cell_h are not set\n");
+        return AVERROR(EINVAL);
+    }
+
+    if (!nb_classes) {
+        av_log(filter_ctx, AV_LOG_ERROR, "nb_classes is not set\n");
+        return AVERROR(EINVAL);
+    }
+
+    if (!anchors) {
+        av_log(filter_ctx, AV_LOG_ERROR, "anchors is not set\n");
+        return AVERROR(EINVAL);
+    }
+
+    if (output[output_index].channels * output[output_index].width *
+        output[output_index].height % (box_size * cell_w * cell_h)) {
+        av_log(filter_ctx, AV_LOG_ERROR, "wrong cell_w, cell_h or nb_classes\n");
+        return AVERROR(EINVAL);
+    }
+    detection_boxes = output[output_index].channels *
+                      output[output_index].height *
+                      output[output_index].width / box_size / cell_w / cell_h;
+
+    /**
+     * find all candidate bbox
+     * yolo output can be reshaped to [B, N*D, Cx, Cy]
+     * Detection box 'D' has format [`x`, `y`, `w`, `h`, `box_score`, `class_no_1`, ...,]
+     **/
+    for (int box_id = 0; box_id < detection_boxes; box_id++) {
+        for (int cx = 0; cx < cell_w; cx++)
+            for (int cy = 0; cy < cell_h; cy++) {
+                float x, y, w, h, conf;
+                float *detection_boxes_data;
+                int label_id;
+
+                detection_boxes_data = output_data + box_id * box_size * cell_w * cell_h;
+                conf = detection_boxes_data[cy * cell_w + cx + 4 * cell_w * cell_h];
+                if (conf < conf_threshold) {
+                    continue;
+                }
+
+                x = detection_boxes_data[cy * cell_w + cx];
+                y = detection_boxes_data[cy * cell_w + cx + cell_w * cell_h];
+                w = detection_boxes_data[cy * cell_w + cx + 2 * cell_w * cell_h];
+                h = detection_boxes_data[cy * cell_w + cx + 3 * cell_w * cell_h];
+                label_id = dnn_detect_get_label_id(ctx->nb_classes, cell_w * cell_h,
+                    detection_boxes_data + cy * cell_w + cx + 5 * cell_w * cell_h);
+                conf = conf * detection_boxes_data[cy * cell_w + cx + (label_id + 5) * cell_w * cell_h];
+
+                bbox = av_mallocz(sizeof(*bbox));
+                if (!bbox)
+                    return AVERROR(ENOMEM);
+
+                bbox->w = exp(w) * anchors[box_id * 2] * frame->width / scale_w;
+                bbox->h = exp(h) * anchors[box_id * 2 + 1] * frame->height / scale_h;
+                bbox->x = (cx + x) / cell_w * frame->width - bbox->w / 2;
+                bbox->y = (cy + y) / cell_h * frame->height - bbox->h / 2;
+                bbox->detect_confidence = av_make_q((int)(conf * 10000), 10000);
+                if (ctx->labels && label_id < ctx->label_count) {
+                    av_strlcpy(bbox->detect_label, ctx->labels[label_id], sizeof(bbox->detect_label));
+                } else {
+                    snprintf(bbox->detect_label, sizeof(bbox->detect_label), "%d", label_id);
+                }
+
+                if (av_fifo_write(ctx->bboxes_fifo, &bbox, 1) < 0) {
+                    av_freep(&bbox);
+                    return AVERROR(ENOMEM);
+                }
+            }
+    }
+    return 0;
+}
+
+static int dnn_detect_fill_side_data(AVFrame *frame, AVFilterContext *filter_ctx)
+{
+    DnnDetectContext *ctx = filter_ctx->priv;
+    float conf_threshold = ctx->confidence;
+    AVDetectionBBox *bbox;
+    int nb_bboxes = 0;
+    AVDetectionBBoxHeader *header;
+    if (av_fifo_can_read(ctx->bboxes_fifo) == 0) {
+        av_log(filter_ctx, AV_LOG_VERBOSE, "nothing detected in this frame.\n");
+        return 0;
+    }
+
+    /* remove overlap bboxes */
+    for (int i = 0; i < av_fifo_can_read(ctx->bboxes_fifo); i++){
+        av_fifo_peek(ctx->bboxes_fifo, &bbox, 1, i);
+        for (int j = 0; j < av_fifo_can_read(ctx->bboxes_fifo); j++) {
+            AVDetectionBBox *overlap_bbox;
+            av_fifo_peek(ctx->bboxes_fifo, &overlap_bbox, 1, j);
+            if (!strcmp(bbox->detect_label, overlap_bbox->detect_label) &&
+                av_cmp_q(bbox->detect_confidence, overlap_bbox->detect_confidence) < 0 &&
+                dnn_detect_IOU(bbox, overlap_bbox) >= conf_threshold) {
+                bbox->classify_count = -1; // bad result
+                nb_bboxes++;
+                break;
+            }
+        }
+    }
+    nb_bboxes = av_fifo_can_read(ctx->bboxes_fifo) - nb_bboxes;
+    header = av_detection_bbox_create_side_data(frame, nb_bboxes);
+    if (!header) {
+        av_log(filter_ctx, AV_LOG_ERROR, "failed to create side data with %d bounding boxes\n", nb_bboxes);
+        return -1;
+    }
+    av_strlcpy(header->source, ctx->dnnctx.model_filename, sizeof(header->source));
+
+    while(av_fifo_can_read(ctx->bboxes_fifo)) {
+        AVDetectionBBox *candidate_bbox;
+        av_fifo_read(ctx->bboxes_fifo, &candidate_bbox, 1);
+
+        if (nb_bboxes > 0 && candidate_bbox->classify_count != -1) {
+            bbox = av_get_detection_bbox(header, header->nb_bboxes - nb_bboxes);
+            memcpy(bbox, candidate_bbox, sizeof(*bbox));
+            nb_bboxes--;
+        }
+        av_freep(&candidate_bbox);
+    }
+    return 0;
+}
+
+static int dnn_detect_post_proc_yolo(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
+{
+    int ret = 0;
+    ret = dnn_detect_parse_yolo_output(frame, output, 0, filter_ctx);
+    if (ret < 0)
+        return ret;
+    ret = dnn_detect_fill_side_data(frame, filter_ctx);
+    if (ret < 0)
+        return ret;
+    return 0;
+}
+
 static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
@@ -158,6 +376,10 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterCont
         if (ret < 0)
             return ret;
         break;
+    case DDMT_YOLOV1V2:
+        ret = dnn_detect_post_proc_yolo(frame, output, filter_ctx);
+        if (ret < 0)
+            return ret;
     }

     return 0;
@@ -356,11 +578,22 @@ static av_cold int dnn_detect_init(AVFilterContext *context)
     ret = check_output_nb(ctx, dnn_ctx->backend_type, dnn_ctx->nb_outputs);
     if (ret < 0)
         return ret;
+    ctx->bboxes_fifo = av_fifo_alloc2(1, sizeof(AVDetectionBBox *), AV_FIFO_FLAG_AUTO_GROW);
+    if (!ctx->bboxes_fifo)
+        return AVERROR(ENOMEM);
     ff_dnn_set_detect_post_proc(&ctx->dnnctx, dnn_detect_post_proc);

     if (ctx->labels_filename) {
         return read_detect_label_file(context);
     }
+    if (ctx->anchors_str) {
+        ret = dnn_detect_parse_anchors(ctx->anchors_str, &ctx->anchors);
+        if (!ctx->anchors) {
+            av_log(context, AV_LOG_ERROR, "failed to parse anchors_str\n");
+            return AVERROR(EINVAL);
+        }
+        ctx->nb_anchor = ret;
+    }

     return 0;
 }
@@ -460,7 +693,14 @@ static int dnn_detect_activate(AVFilterContext *filter_ctx)
 static av_cold void dnn_detect_uninit(AVFilterContext *context)
 {
     DnnDetectContext *ctx = context->priv;
+    AVDetectionBBox *bbox;
     ff_dnn_uninit(&ctx->dnnctx);
+    while(av_fifo_can_read(ctx->bboxes_fifo)) {
+        av_fifo_read(ctx->bboxes_fifo, &bbox, 1);
+        av_freep(&bbox);
+    }
+    av_fifo_freep2(&ctx->bboxes_fifo);
+    av_freep(&ctx->anchors);
     free_detect_labels(ctx);
 }