From patchwork Mon Dec 4 05:36:32 2023
From: wenbin.chen-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 4 Dec 2023 13:36:32 +0800
Message-Id: <20231204053633.1743228-3-wenbin.chen@intel.com>
In-Reply-To: <20231204053633.1743228-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH 3/4] libavfilter/vf_dnn_detect: Add yolov3 support

From: Wenbin Chen

Add yolov3 support. YOLOv3 differs from the earlier YOLO versions in that it produces multiple outputs at different scales, so that it performs better on both large and small objects.
For model details, refer to:
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tf

Signed-off-by: Wenbin Chen
---
 libavfilter/vf_dnn_detect.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index 86f61c9907..7a32b191c3 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -35,6 +35,7 @@
 typedef enum {
     DDMT_SSD,
     DDMT_YOLOV1V2,
+    DDMT_YOLOV3
 } DNNDetectionModelType;
 
 typedef struct DnnDetectContext {
@@ -73,6 +74,7 @@ static const AVOption dnn_detect_options[] = {
     { "model_type", "DNN detection model type", OFFSET2(model_type), AV_OPT_TYPE_INT, { .i64 = DDMT_SSD }, INT_MIN, INT_MAX, FLAGS, "model_type" },
         { "ssd", "output shape [1, 1, N, 7]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_SSD }, 0, 0, FLAGS, "model_type" },
         { "yolo", "output shape [1, N*Cx*Cy*DetectionBox]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV1V2 }, 0, 0, FLAGS, "model_type" },
+        { "yolov3", "outputs shape [1, N*D, Cx, Cy]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV3 }, 0, 0, FLAGS, "model_type" },
     { "cell_w", "cell width", OFFSET2(cell_w), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "cell_h", "cell height", OFFSET2(cell_h), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "nb_classes", "The number of class", OFFSET2(nb_classes), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
@@ -146,6 +148,11 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
         cell_h = ctx->cell_h;
         scale_w = cell_w;
         scale_h = cell_h;
+    } else {
+        cell_w = output[output_index].width;
+        cell_h = output[output_index].height;
+        scale_w = ctx->scale_width;
+        scale_h = ctx->scale_height;
     }
     box_size = nb_classes + 5;
 
@@ -173,6 +180,7 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
                       output[output_index].height *
                       output[output_index].width / box_size / cell_w / cell_h;
 
+    anchors = anchors + (detection_boxes * output_index * 2);
     /**
      * find all candidate bbox
      * yolo output can be reshaped to [B, N*D, Cx, Cy]
@@ -284,6 +292,21 @@ static int dnn_detect_post_proc_yolo(AVFrame *frame, DNNData *output, AVFilterCo
     return 0;
 }
 
+static int dnn_detect_post_proc_yolov3(AVFrame *frame, DNNData *output,
+                                       AVFilterContext *filter_ctx, int nb_outputs)
+{
+    int ret = 0;
+    for (int i = 0; i < nb_outputs; i++) {
+        ret = dnn_detect_parse_yolo_output(frame, output, i, filter_ctx);
+        if (ret < 0)
+            return ret;
+    }
+    ret = dnn_detect_fill_side_data(frame, filter_ctx);
+    if (ret < 0)
+        return ret;
+    return 0;
+}
+
 static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
@@ -380,8 +403,11 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, int nb_outpu
         ret = dnn_detect_post_proc_yolo(frame, output, filter_ctx);
         if (ret < 0)
             return ret;
+    case DDMT_YOLOV3:
+        ret = dnn_detect_post_proc_yolov3(frame, output, filter_ctx, nb_outputs);
+        if (ret < 0)
+            return ret;
     }
-
     return 0;
 }