From patchwork Tue Dec 12 02:33:33 2023
X-Patchwork-Submitter: "Chen, Wenbin"
X-Patchwork-Id: 45084
From: wenbin.chen-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 12 Dec 2023 10:33:33 +0800
Message-Id: <20231212023334.2506376-3-wenbin.chen@intel.com>
In-Reply-To: <20231212023334.2506376-1-wenbin.chen@intel.com>
References: <20231212023334.2506376-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH v2 3/4] libavfilter/vf_dnn_detect: Add yolov3 support

From: Wenbin Chen

Add yolov3 support. yolov3 differs in that it has multiple outputs at
different scales, so it performs better on both large and small objects.
For model details, refer to:
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tf

Signed-off-by: Wenbin Chen
---
 libavfilter/vf_dnn_detect.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index 86f61c9907..7a32b191c3 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -35,6 +35,7 @@
 typedef enum {
     DDMT_SSD,
     DDMT_YOLOV1V2,
+    DDMT_YOLOV3
 } DNNDetectionModelType;
 
 typedef struct DnnDetectContext {
@@ -73,6 +74,7 @@ static const AVOption dnn_detect_options[] = {
     { "model_type", "DNN detection model type", OFFSET2(model_type), AV_OPT_TYPE_INT, { .i64 = DDMT_SSD }, INT_MIN, INT_MAX, FLAGS, "model_type" },
     { "ssd", "output shape [1, 1, N, 7]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_SSD }, 0, 0, FLAGS, "model_type" },
     { "yolo", "output shape [1, N*Cx*Cy*DetectionBox]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV1V2 }, 0, 0, FLAGS, "model_type" },
+    { "yolov3", "outputs shape [1, N*D, Cx, Cy]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV3 }, 0, 0, FLAGS, "model_type" },
     { "cell_w", "cell width", OFFSET2(cell_w), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "cell_h", "cell height", OFFSET2(cell_h), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "nb_classes", "The number of class", OFFSET2(nb_classes), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
@@ -146,6 +148,11 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
         cell_h = ctx->cell_h;
         scale_w = cell_w;
         scale_h = cell_h;
+    } else {
+        cell_w = output[output_index].width;
+        cell_h = output[output_index].height;
+        scale_w = ctx->scale_width;
+        scale_h = ctx->scale_height;
     }
 
     box_size = nb_classes + 5;
@@ -173,6 +180,7 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
                       output[output_index].height *
                       output[output_index].width / box_size / cell_w / cell_h;
 
+    anchors = anchors + (detection_boxes * output_index * 2);
     /**
      * find all candidate bbox
      * yolo output can be reshaped to [B, N*D, Cx, Cy]
@@ -284,6 +292,21 @@ static int dnn_detect_post_proc_yolo(AVFrame *frame, DNNData *output, AVFilterCo
     return 0;
 }
 
+static int dnn_detect_post_proc_yolov3(AVFrame *frame, DNNData *output,
+                                       AVFilterContext *filter_ctx, int nb_outputs)
+{
+    int ret = 0;
+    for (int i = 0; i < nb_outputs; i++) {
+        ret = dnn_detect_parse_yolo_output(frame, output, i, filter_ctx);
+        if (ret < 0)
+            return ret;
+    }
+    ret = dnn_detect_fill_side_data(frame, filter_ctx);
+    if (ret < 0)
+        return ret;
+    return 0;
+}
+
 static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
@@ -380,8 +403,11 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, int nb_outpu
         ret = dnn_detect_post_proc_yolo(frame, output, filter_ctx);
         if (ret < 0)
             return ret;
+    case DDMT_YOLOV3:
+        ret = dnn_detect_post_proc_yolov3(frame, output, filter_ctx, nb_outputs);
+        if (ret < 0)
+            return ret;
     }
-
     return 0;
 }
 
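
Two points in the patch may be easier to follow in isolation. First, the per-output anchor offset: each output consumes detection_boxes (w, h) pairs from one flat anchor array, so output i starts at detection_boxes * i * 2. Second, in the last hunk the existing DDMT_YOLOV1V2 case has no break before the new DDMT_YOLOV3 case, so as posted a yolo (v1/v2) model would fall through into the yolov3 path after its own post-processing. The standalone sketch below illustrates both with explicit breaks. It is not FFmpeg code: ModelType, Trace, anchor_offset(), parse_output() and post_proc() are hypothetical stand-ins, and the 3 boxes per output used in the note afterwards is an assumed example value.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the filter's types; not FFmpeg definitions. */
typedef enum { MT_SSD, MT_YOLOV1V2, MT_YOLOV3 } ModelType;

enum { MAX_OUTPUTS = 8 };

typedef struct Trace {
    int yolo_calls;                        /* times the v1/v2 path ran */
    int v3_calls;                          /* outputs parsed by the v3 path */
    ptrdiff_t anchor_offset[MAX_OUTPUTS];  /* anchor offset used per output */
} Trace;

/* Offset into a flat anchor array stored as (w, h) pairs, with
 * detection_boxes pairs per output, mirroring the patch's
 * "anchors + (detection_boxes * output_index * 2)". */
static ptrdiff_t anchor_offset(int detection_boxes, int output_index)
{
    return (ptrdiff_t)detection_boxes * output_index * 2;
}

/* Stand-in for parsing one output: only records which offset it would use. */
static int parse_output(Trace *t, int detection_boxes, int output_index)
{
    if (output_index < 0 || output_index >= MAX_OUTPUTS)
        return -1;
    t->anchor_offset[output_index] = anchor_offset(detection_boxes, output_index);
    t->v3_calls++;
    return 0;
}

/* Same shape as the loop added by the patch: parse every output in turn
 * and stop at the first error. */
static int post_proc_yolov3(Trace *t, int detection_boxes, int nb_outputs)
{
    for (int i = 0; i < nb_outputs; i++) {
        int ret = parse_output(t, detection_boxes, i);
        if (ret < 0)
            return ret;
    }
    return 0;
}

/* Dispatch with an explicit break after every case. */
static int post_proc(Trace *t, ModelType type, int detection_boxes, int nb_outputs)
{
    int ret = 0;

    switch (type) {
    case MT_SSD:
        break;
    case MT_YOLOV1V2:
        t->yolo_calls++;
        break; /* without this, control would continue into MT_YOLOV3 */
    case MT_YOLOV3:
        ret = post_proc_yolov3(t, detection_boxes, nb_outputs);
        break;
    }
    return ret;
}
```

With 3 boxes per output and 3 outputs, the offsets recorded for outputs 0, 1 and 2 are 0, 6 and 12, and selecting MT_YOLOV1V2 leaves v3_calls at 0.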