From patchwork Tue Apr 2 14:29:35 2019
X-Patchwork-Submitter: "Guo, Yejun"
X-Patchwork-Id: 12571
From: "Guo, Yejun" <yejun.guo@intel.com>
To: ffmpeg-devel@ffmpeg.org
Cc: yejun.guo@intel.com
Date: Tue, 2 Apr 2019 22:29:35 +0800
Message-Id: <1554215375-14103-1-git-send-email-yejun.guo@intel.com>
Subject: [FFmpeg-devel] [PATCH 8/8][RFC] libavfilter/vf_objectdetect: add a filter for object detection

This filter is not finished yet. Currently, for visual effect, it detects
objects and draws a box around each detection with score > 0.8. The purpose
here is to show that the previous DNN changes are necessary; it is also an
RFC (request for comments) patch for this filter.

In my plan, an example under doc/examples will be added to show how to use
this filter and connect the filter result with ROI encoding. Options will
also be added to this filter for score_threshold and for enabling the visual
effect or not. And a new type of side data will be added to hold the filter
result, so the example can get the filter result and set the ROI info for
the encoder.

It seems that it is not easy to transfer data between filters, between a
filter and the application, or between a filter and encoders; the only
feasible method I have found so far is via side data.
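To make the side data plan concrete, below is a rough sketch (not part of
this patch) of how the detections could be handed to an encoder, assuming an
AVRegionOfInterest-style side data type with pixel coordinates and a
quantisation offset; the AV_FRAME_DATA_REGIONS_OF_INTEREST name, the helper
name and the qoffset value are all illustrative only:

#include "libavutil/frame.h"

/* Hypothetical helper (not in this patch): attach the detections that pass
 * the score threshold to the frame as region-of-interest side data, so that
 * a downstream encoder can spend more bits inside the detected boxes.
 * The boxes array uses the TF object detection layout: normalized
 * [y0, x0, y1, x1] per detection. */
static int attach_detections_as_roi(AVFrame *frame, const float *boxes,
                                    const float *scores, uint32_t nb_detections,
                                    float score_threshold)
{
    AVFrameSideData *sd;
    AVRegionOfInterest *roi;
    uint32_t nb_kept = 0;

    for (uint32_t i = 0; i < nb_detections; i++)
        if (scores[i] >= score_threshold)
            nb_kept++;
    if (!nb_kept)
        return 0;

    sd = av_frame_new_side_data(frame, AV_FRAME_DATA_REGIONS_OF_INTEREST,
                                nb_kept * sizeof(*roi));
    if (!sd)
        return AVERROR(ENOMEM);

    roi = (AVRegionOfInterest *)sd->data;
    for (uint32_t i = 0; i < nb_detections; i++) {
        if (scores[i] < score_threshold)
            continue;
        roi->self_size = sizeof(*roi);
        roi->top       = (int)(boxes[i * 4]     * frame->height);
        roi->left      = (int)(boxes[i * 4 + 1] * frame->width);
        roi->bottom    = (int)(boxes[i * 4 + 2] * frame->height);
        roi->right     = (int)(boxes[i * 4 + 3] * frame->width);
        /* negative qoffset => lower quantizer => higher quality in the box */
        roi->qoffset   = (AVRational){ -1, 2 };
        roi++;
    }
    return 0;
}

With something like this, an encoder that honours the side data could raise
the quality inside the detected boxes without any direct filter-to-encoder
coupling.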
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
---
 configure                     |   1 +
 doc/filters.texi              |  20 ++++
 libavfilter/Makefile          |   1 +
 libavfilter/allfilters.c      |   1 +
 libavfilter/vf_objectdetect.c | 222 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 245 insertions(+)
 create mode 100644 libavfilter/vf_objectdetect.c

diff --git a/configure b/configure
index c94f516..f2af07c 100755
--- a/configure
+++ b/configure
@@ -3461,6 +3461,7 @@ minterpolate_filter_select="scene_sad"
 mptestsrc_filter_deps="gpl"
 negate_filter_deps="lut_filter"
 nnedi_filter_deps="gpl"
+objectdetect_filter_deps="libtensorflow"
 ocr_filter_deps="libtesseract"
 ocv_filter_deps="libopencv"
 openclsrc_filter_deps="opencl"
diff --git a/doc/filters.texi b/doc/filters.texi
index cf13a58..fcef30a 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -12631,6 +12631,26 @@ normalize=blackpt=red:whitept=cyan

 Pass the video source unchanged to the output.

+@section objectdetect
+Object detection.
+
+This filter uses TensorFlow for DNN inference. To enable this filter you
+need to install the TensorFlow for C library (see
+@url{https://www.tensorflow.org/install/install_c}) and configure FFmpeg with
+@code{--enable-libtensorflow}.
+
+It accepts the following options:
+
+@table @option
+@item model
+Set the path to the model file specifying the network architecture and its parameters.
+You can download a tar.gz file from
+@url{https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md};
+do not download the quantized models, which are for TensorFlow Lite. Unpack the archive and use frozen_inference_graph.pb,
+which is the frozen graph proto with the weights baked into the graph as constants.
+
+@end table
+
 @section ocr
 Optical Character Recognition
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index fef6ec5..cfd36d9 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -296,6 +296,7 @@ OBJS-$(CONFIG_NOFORMAT_FILTER)               += vf_format.o
 OBJS-$(CONFIG_NOISE_FILTER)                  += vf_noise.o
 OBJS-$(CONFIG_NORMALIZE_FILTER)              += vf_normalize.o
 OBJS-$(CONFIG_NULL_FILTER)                   += vf_null.o
+OBJS-$(CONFIG_OBJECTDETECT_FILTER)           += vf_objectdetect.o
 OBJS-$(CONFIG_OCR_FILTER)                    += vf_ocr.o
 OBJS-$(CONFIG_OCV_FILTER)                    += vf_libopencv.o
 OBJS-$(CONFIG_OSCILLOSCOPE_FILTER)           += vf_datascope.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index c51ae0f..265e05c 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -282,6 +282,7 @@ extern AVFilter ff_vf_noformat;
 extern AVFilter ff_vf_noise;
 extern AVFilter ff_vf_normalize;
 extern AVFilter ff_vf_null;
+extern AVFilter ff_vf_objectdetect;
 extern AVFilter ff_vf_ocr;
 extern AVFilter ff_vf_ocv;
 extern AVFilter ff_vf_oscilloscope;
diff --git a/libavfilter/vf_objectdetect.c b/libavfilter/vf_objectdetect.c
new file mode 100644
index 0000000..6e75884
--- /dev/null
+++ b/libavfilter/vf_objectdetect.c
@@ -0,0 +1,222 @@
+/*
+ * Copyright (c) 2019 Guo Yejun
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Filter implementing an object detection framework using deep convolutional networks.
+ * Models are available at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
+ */
+
+#include "avfilter.h"
+#include "formats.h"
+#include "internal.h"
+#include "libavutil/opt.h"
+#include "libavutil/avassert.h"
+#include "libavformat/avio.h"
+#include "dnn_interface.h"
+
+typedef struct ObjectDetectContext {
+    const AVClass *class;
+
+    char *model_filename;
+    DNNModule *dnn_module;
+    DNNModel *model;
+    DNNInputData input;
+    DNNData output;
+} ObjectDetectContext;
+
+#define OFFSET(x) offsetof(ObjectDetectContext, x)
+#define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
+static const AVOption objectdetect_options[] = {
+    { "model", "path to model file specifying network architecture and its parameters", OFFSET(model_filename), AV_OPT_TYPE_STRING, {.str=NULL}, 0, 0, FLAGS },
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(objectdetect);
+
+#define MODEL_OUTPUT_NB 4
+
+static av_cold int init(AVFilterContext *context)
+{
+    ObjectDetectContext *od_context = context->priv;
+
+    if (!od_context->model_filename) {
+        av_log(context, AV_LOG_ERROR, "model file for network was not specified\n");
+        return AVERROR(EINVAL);
+    }
+
+    od_context->dnn_module = ff_get_dnn_module(DNN_TF);
+    if (!od_context->dnn_module) {
+        av_log(context, AV_LOG_ERROR, "could not create DNN module for tensorflow backend\n");
+        return AVERROR(ENOMEM);
+    }
+
+    od_context->model = (od_context->dnn_module->load_model)(od_context->model_filename);
+    if (!od_context->model) {
+        av_log(context, AV_LOG_ERROR, "could not load DNN model\n");
+        return AVERROR(EIO);
+    }
+
+    return 0;
+}
+
+static int query_formats(AVFilterContext *context)
+{
+    const enum AVPixelFormat pixel_formats[] = {AV_PIX_FMT_RGB24, AV_PIX_FMT_NONE};
+    AVFilterFormats *formats_list;
+
+    formats_list = ff_make_format_list(pixel_formats);
+    if (!formats_list) {
+        av_log(context, AV_LOG_ERROR, "could not create formats list\n");
+        return AVERROR(ENOMEM);
+    }
+
+    return ff_set_common_formats(context, formats_list);
+}
+
+static int config_props(AVFilterLink *inlink)
+{
+    AVFilterContext *context = inlink->dst;
+    ObjectDetectContext *od_context = context->priv;
+    DNNReturnType result;
+    const char *model_output_names[] = {"num_detections",
+                                        "detection_scores",
+                                        "detection_classes",
+                                        "detection_boxes"};
+    av_assert0(MODEL_OUTPUT_NB == FF_ARRAY_ELEMS(model_output_names));
+
+    od_context->input.width = inlink->w;
+    od_context->input.height = inlink->h;
+    od_context->input.channels = 3;
+    od_context->input.dt = DNN_UINT8;
+
+    result = (od_context->model->set_input_output)(od_context->model->model,
+                                                   &od_context->input, "image_tensor",
+                                                   model_output_names, MODEL_OUTPUT_NB);
+    if (result != DNN_SUCCESS) {
+        av_log(context, AV_LOG_ERROR, "could not set input and output for the model\n");
+        return AVERROR(EIO);
+    }
+
+    return 0;
+}
+
+/* fill the rectangle [x0, x1) x [y0, y1) with solid red (RGB24) */
+static int draw_box(AVFrame *in, int x0, int y0, int x1, int y1)
+{
+    x0 = av_clip(x0, 0, in->width);
+    x1 = av_clip(x1, 0, in->width);
+    y0 = av_clip(y0, 0, in->height);
+    y1 = av_clip(y1, 0, in->height);
+
+    for (int j = x0; j < x1; ++j) {
+        for (int i = y0; i < y1; ++i) {
+            in->data[0][i * in->linesize[0] + j * 3]     = 0xFF;
+            in->data[0][i * in->linesize[0] + j * 3 + 1] = 0;
+            in->data[0][i * in->linesize[0] + j * 3 + 2] = 0;
+        }
+    }
+
+    return 0;
+}
+
+static int filter_frame(AVFilterLink *inlink, AVFrame *in)
+{
+    AVFilterContext *context = inlink->dst;
+    AVFilterLink *outlink = context->outputs[0];
+    ObjectDetectContext *ctx = context->priv;
+    DNNReturnType dnn_result;
+    DNNData outputs[MODEL_OUTPUT_NB];
+    uint8_t *dnn_input = ctx->input.data;
+    uint32_t nb_detections;
+    int ret;
+
+    // copy the packed RGB24 frame into the model input buffer, dropping the
+    // per-line padding that may be present in the AVFrame
+    for (int i = 0; i < in->height; ++i) {
+        for (int j = 0; j < in->width; ++j) {
+            dnn_input[i * ctx->input.width * ctx->input.channels + j * 3]     = in->data[0][i * in->linesize[0] + j * 3];
+            dnn_input[i * ctx->input.width * ctx->input.channels + j * 3 + 1] = in->data[0][i * in->linesize[0] + j * 3 + 1];
+            dnn_input[i * ctx->input.width * ctx->input.channels + j * 3 + 2] = in->data[0][i * in->linesize[0] + j * 3 + 2];
+        }
+    }
+
+    dnn_result = (ctx->dnn_module->execute_model)(ctx->model, outputs, MODEL_OUTPUT_NB);
+    if (dnn_result != DNN_SUCCESS) {
+        av_log(context, AV_LOG_ERROR, "failed to execute loaded model\n");
+        av_frame_free(&in);
+        return AVERROR(EIO);
+    }
+
+    // the frame is modified in place below, so make sure it is writable
+    ret = av_frame_make_writable(in);
+    if (ret < 0) {
+        av_frame_free(&in);
+        return ret;
+    }
+
+    // the model outputs are: [0] number of detections, [1] scores,
+    // [2] class ids, [3] normalized boxes as [y0, x0, y1, x1]
+    nb_detections = (uint32_t)outputs[0].data[0];
+    for (uint32_t i = 0; i < nb_detections; ++i) {
+        float score = outputs[1].data[i];
+        int y0 = (int)(outputs[3].data[i*4]     * in->height);
+        int x0 = (int)(outputs[3].data[i*4 + 1] * in->width);
+        int y1 = (int)(outputs[3].data[i*4 + 2] * in->height);
+        int x1 = (int)(outputs[3].data[i*4 + 3] * in->width);
+        // int class_id = (int)outputs[2].data[i];
+        int half_width = 1;
+
+        if (score < 0.8f)
+            continue;
+
+        // Can we transfer data between filters?
+        // For example, I want to invoke draw_text/draw_box here,
+        // but I am unable to pass data to vf_drawtext/vf_drawbox.
+        // Or, could filters export a general interface for their functionality,
+        // so that we can use the filter functions flexibly?
+        // For now, a simple draw_box function has to be written here.
+
+        // draw the four edges of the bounding box
+        draw_box(in, x0 - half_width, y0 - half_width, x1 + half_width, y0 + half_width); // top
+        draw_box(in, x0 - half_width, y0 - half_width, x0 + half_width, y1 + half_width); // left
+        draw_box(in, x1 - half_width, y0 - half_width, x1 + half_width, y1 + half_width); // right
+        draw_box(in, x0 - half_width, y1 - half_width, x1 + half_width, y1 + half_width); // bottom
+    }
+
+    return ff_filter_frame(outlink, in);
+}
+
+static av_cold void uninit(AVFilterContext *context)
+{
+    ObjectDetectContext *od_context = context->priv;
+
+    if (od_context->dnn_module)
+        (od_context->dnn_module->free_model)(&od_context->model);
+    av_freep(&od_context->dnn_module);
+}
+
+static const AVFilterPad objectdetect_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_props,
+        .filter_frame = filter_frame,
+    },
+    { NULL }
+};
+
+static const AVFilterPad objectdetect_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_VIDEO,
+    },
+    { NULL }
+};
+
+AVFilter ff_vf_objectdetect = {
+    .name          = "objectdetect",
+    .description   = NULL_IF_CONFIG_SMALL("Object detection using deep convolutional networks."),
+    .priv_size     = sizeof(ObjectDetectContext),
+    .init          = init,
+    .uninit        = uninit,
+    .query_formats = query_formats,
+    .inputs        = objectdetect_inputs,
+    .outputs       = objectdetect_outputs,
+    .priv_class    = &objectdetect_class,
+    .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC,
+};
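
For reference, once built, the filter can be tried with a command line such
as the following (paths are illustrative; frozen_inference_graph.pb comes
from an extracted, non-quantized model zoo archive, e.g. ssd_mobilenet_v2):

ffmpeg -i input.mp4 -vf objectdetect=model=frozen_inference_graph.pb output.mp4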