From patchwork Tue Nov 21 02:20:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Chen, Wenbin"
X-Patchwork-Id: 44735
From: wenbin.chen-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 21 Nov 2023 10:20:17 +0800
Message-Id: <20231121022018.285533-1-wenbin.chen@intel.com>
X-Mailer: git-send-email 2.34.1
Subject: [FFmpeg-devel] [PATCH 1/2] libavfilter/vf_dnn_detect: Add model_type option.
List-Id: FFmpeg development discussions and patches

From: Wenbin Chen

There are many kinds of detection DNN models, and they use different
preprocessing and postprocessing methods. To support more models, add a
"model_type" option that selects the preprocessing and postprocessing
functions.

Signed-off-by: Wenbin Chen
---
 libavfilter/vf_dnn_detect.c | 42 ++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index b5dae42c65..9db90ee4cf 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -31,6 +31,10 @@
 #include "libavutil/avstring.h"
 #include "libavutil/detection_bbox.h"
 
+typedef enum {
+    DDMT_SSD
+} DNNDetectionModelType;
+
 typedef struct DnnDetectContext {
     const AVClass *class;
     DnnContext dnnctx;
@@ -38,6 +42,7 @@ typedef struct DnnDetectContext {
     char *labels_filename;
     char **labels;
     int label_count;
+    DNNDetectionModelType model_type;
 } DnnDetectContext;
 
 #define OFFSET(x) offsetof(DnnDetectContext, dnnctx.x)
@@ -54,12 +59,14 @@ static const AVOption dnn_detect_options[] = {
     DNN_COMMON_OPTIONS
     { "confidence", "threshold of confidence", OFFSET2(confidence), AV_OPT_TYPE_FLOAT, { .dbl = 0.5 }, 0, 1, FLAGS},
     { "labels", "path to labels file", OFFSET2(labels_filename), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, FLAGS },
+    { "model_type", "DNN detection model type", OFFSET2(model_type), AV_OPT_TYPE_INT, { .i64 = DDMT_SSD }, INT_MIN, INT_MAX, FLAGS, "model_type" },
+    { "ssd", "output shape [1, 1, N, 7]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_SSD }, 0, 0, FLAGS, "model_type" },
     { NULL }
 };
 
 AVFILTER_DEFINE_CLASS(dnn_detect);
 
-static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
+static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
     float conf_threshold = ctx->confidence;
@@ -67,14 +74,12 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterCont
     int detect_size = output->width;
     float *detections = output->data;
     int nb_bboxes = 0;
-    AVFrameSideData *sd;
-    AVDetectionBBox *bbox;
     AVDetectionBBoxHeader *header;
+    AVDetectionBBox *bbox;
 
-    sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
-    if (sd) {
-        av_log(filter_ctx, AV_LOG_ERROR, "already have bounding boxes in side data.\n");
-        return -1;
+    if (output->width != 7) {
+        av_log(filter_ctx, AV_LOG_ERROR, "Model output shape doesn't match ssd requirement.\n");
+        return AVERROR(EINVAL);
     }
 
     for (int i = 0; i < proposal_count; ++i) {
@@ -135,6 +140,29 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterCont
     return 0;
 }
 
+static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
+{
+    AVFrameSideData *sd;
+    DnnDetectContext *ctx = filter_ctx->priv;
+    int ret = 0;
+
+    sd = av_frame_get_side_data(frame, AV_FRAME_DATA_DETECTION_BBOXES);
+    if (sd) {
+        av_log(filter_ctx, AV_LOG_ERROR, "already have bounding boxes in side data.\n");
+        return -1;
+    }
+
+    switch (ctx->model_type) {
+    case DDMT_SSD:
+        ret = dnn_detect_post_proc_ssd(frame, output, filter_ctx);
+        if (ret < 0)
+            return ret;
+        break;
+    }
+
+    return 0;
+}
+
 static int dnn_detect_post_proc_tf(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
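
Note for reviewers: the dispatch added by this patch can be tried outside FFmpeg with the standalone sketch below. It is only an illustration under stated assumptions, not the filter's code. SketchOutput, SKETCH_EINVAL, sketch_post_proc_ssd and sketch_post_proc are hypothetical names that stand in for DNNData, AVERROR(EINVAL) and the patched functions. The sketch also assumes that the row count is carried in the height field and that the confidence sits at index 2 of each 7-value row; neither detail is visible in the hunks above.

```c
#include <stddef.h>

/* Mirrors the enum added by the patch; DDMT_SSD is its only value so far. */
typedef enum {
    DDMT_SSD
} DNNDetectionModelType;

/* Hypothetical, reduced stand-in for DNNData: only the fields used here. */
typedef struct {
    const float *data;   /* flattened [N, 7] detection rows */
    int width;           /* values per row, must be 7 for SSD */
    int height;          /* number of rows N (assumption) */
} SketchOutput;

/* Hypothetical stand-in for AVERROR(EINVAL). */
#define SKETCH_EINVAL (-22)

/*
 * Validates the SSD output shape, then counts rows whose confidence
 * reaches the threshold. Assumes confidence is stored at index 2.
 */
int sketch_post_proc_ssd(const SketchOutput *out, float conf_threshold)
{
    int nb_bboxes = 0;

    if (out->width != 7)
        return SKETCH_EINVAL;

    for (int i = 0; i < out->height; i++) {
        float conf = out->data[(size_t)i * out->width + 2];
        if (conf >= conf_threshold)
            nb_bboxes++;
    }
    return nb_bboxes;
}

/* Selects the postprocess routine from the model type, as the patch does. */
int sketch_post_proc(DNNDetectionModelType type, const SketchOutput *out,
                     float conf_threshold)
{
    switch (type) {
    case DDMT_SSD:
        return sketch_post_proc_ssd(out, conf_threshold);
    }
    return SKETCH_EINVAL; /* unknown model type */
}
```

Supporting another model type in this layout would mean one more enum value, one more AV_OPT_TYPE_CONST entry in the option table, and one more case in the switch.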