From patchwork Sun Apr 28 09:27:08 2019
X-Patchwork-Submitter: xwmeng@pku.edu.cn
X-Patchwork-Id: 12927
Date: Sun, 28 Apr 2019 17:27:08 +0800 (GMT+08:00)
From: xwmeng@pku.edu.cn
To: "ffmpeg development discussions and patches"
Message-ID: <3d9d5b50.458db.16a63450325.Coremail.xwmeng@pku.edu.cn>
Subject: [FFmpeg-devel] [PATCH] libavfilter: Add more operation supports in FFmpeg dnn native mode.

This patch supports the derain filter project in GSoC. It adds support for the following operations to the dnn native mode:

(1) Conv padding methods: "SAME" and "VALID"
(2) Dilation
(3) Activations: "NONE" and "LEAKY_RELU"

The derain filter needs all of these operations. Because the patch changes how the dnn native mode reads a model file, the generation process of the Super Resolution model must change accordingly, e.g. by adding a padding method parameter (= 0) and a dilation parameter (= 1) to each convolutional layer.

In addition, I have a question about the Super Resolution implementation. The SR model training process uses the "VALID" method. As I understand "VALID" mode in TensorFlow, pixels near the boundary are not processed, so the output image should be smaller than what the current SR design produces. In the current dnn native mode, however, these unprocessed pixels are filled from adjacent pixels (CLAMP_TO_EDGE), so the output keeps the input size. Why is it done this way?

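To make the size question above concrete, here is a minimal sketch of the per-axis output size under the two padding methods. It assumes stride 1 and an odd kernel size. The names PaddingSketch, effective_kernel_size and conv_output_size are hypothetical helpers for illustration only and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the padding enum added by the patch. */
typedef enum { PAD_VALID, PAD_SAME } PaddingSketch;

/* Span covered by a dilated kernel: kernel_size taps, dilation pixels apart. */
static int32_t effective_kernel_size(int32_t kernel_size, int32_t dilation)
{
    return (kernel_size - 1) * dilation + 1;
}

/* Output length along one axis, assuming stride 1.
 * SAME keeps the input length; VALID drops every position whose
 * kernel window would reach outside the input. */
static int32_t conv_output_size(int32_t input_size, int32_t kernel_size,
                                int32_t dilation, PaddingSketch padding)
{
    int32_t out;

    if (padding == PAD_SAME)
        return input_size;

    out = input_size - effective_kernel_size(kernel_size, dilation) + 1;
    return out > 0 ? out : 0; /* input smaller than the window: no output */
}
```

For an odd kernel size this agrees with the loop bounds in the patch: skipping pad_size = (kernel_size - 1) / 2 * dilation positions on each side leaves input_size - (kernel_size - 1) * dilation positions.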
From 4d92ef21a5acf064122c51f442d0e2f5437b3343 Mon Sep 17 00:00:00 2001
From: Xuewei Meng
Date: Sun, 28 Apr 2019 17:21:35 +0800
Subject: [PATCH] Add operation supports in dnn_native

Signed-off-by: Xuewei Meng
---
 libavfilter/dnn_backend_native.c | 36 +++++++++++++++++++++-----------
 libavfilter/dnn_backend_native.h |  6 +++++-
 2 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/libavfilter/dnn_backend_native.c b/libavfilter/dnn_backend_native.c
index 70d857f5f2..0e3ef5d64d 100644
--- a/libavfilter/dnn_backend_native.c
+++ b/libavfilter/dnn_backend_native.c
@@ -157,13 +157,15 @@ DNNModel *ff_dnn_load_model_native(const char *model_filename)
                 ff_dnn_free_model_native(&model);
                 return NULL;
             }
+            conv_params->dilation = (int32_t)avio_rl32(model_file_context);
+            conv_params->padding_method = (int32_t)avio_rl32(model_file_context);
             conv_params->activation = (int32_t)avio_rl32(model_file_context);
             conv_params->input_num = (int32_t)avio_rl32(model_file_context);
             conv_params->output_num = (int32_t)avio_rl32(model_file_context);
             conv_params->kernel_size = (int32_t)avio_rl32(model_file_context);
             kernel_size = conv_params->input_num * conv_params->output_num *
                           conv_params->kernel_size * conv_params->kernel_size;
-            dnn_size += 16 + (kernel_size + conv_params->output_num << 2);
+            dnn_size += 24 + (kernel_size + conv_params->output_num << 2);
             if (dnn_size > file_size || conv_params->input_num <= 0 ||
                 conv_params->output_num <= 0 || conv_params->kernel_size <= 0){
                 avio_closep(&model_file_context);
@@ -221,23 +223,28 @@ DNNModel *ff_dnn_load_model_native(const char *model_filename)
 
 static void convolve(const float *input, float *output, const ConvolutionalParams *conv_params, int width, int height)
 {
-    int y, x, n_filter, ch, kernel_y, kernel_x;
     int radius = conv_params->kernel_size >> 1;
     int src_linesize = width * conv_params->input_num;
     int filter_linesize = conv_params->kernel_size * conv_params->input_num;
     int filter_size = conv_params->kernel_size * filter_linesize;
+    int pad_size = (conv_params->padding_method == VALID) ? (conv_params->kernel_size - 1) / 2 * conv_params->dilation : 0;
 
-    for (y = 0; y < height; ++y){
-        for (x = 0; x < width; ++x){
-            for (n_filter = 0; n_filter < conv_params->output_num; ++n_filter){
+    for (int y = pad_size; y < height - pad_size; ++y){
+        for (int x = pad_size; x < width - pad_size; ++x){
+            for (int n_filter = 0; n_filter < conv_params->output_num; ++n_filter){
                 output[n_filter] = conv_params->biases[n_filter];
-                for (ch = 0; ch < conv_params->input_num; ++ch){
-                    for (kernel_y = 0; kernel_y < conv_params->kernel_size; ++kernel_y){
-                        for (kernel_x = 0; kernel_x < conv_params->kernel_size; ++kernel_x){
-                            output[n_filter] += input[CLAMP_TO_EDGE(y + kernel_y - radius, height) * src_linesize +
-                                                      CLAMP_TO_EDGE(x + kernel_x - radius, width) * conv_params->input_num + ch] *
-                                                conv_params->kernel[n_filter * filter_size + kernel_y * filter_linesize +
-                                                                    kernel_x * conv_params->input_num + ch];
+
+                for (int ch = 0; ch < conv_params->input_num; ++ch){
+                    for (int kernel_y = 0; kernel_y < conv_params->kernel_size; ++kernel_y){
+                        for (int kernel_x = 0; kernel_x < conv_params->kernel_size; ++kernel_x){
+                            int y_pos = y + (kernel_y - radius) * conv_params->dilation;
+                            int x_pos = x + (kernel_x - radius) * conv_params->dilation;
+
+                            float input_pel = (x_pos < 0 || x_pos >= width || y_pos < 0 || y_pos >= height) ? 0.0 :
+                                               input[y_pos * src_linesize + x_pos * conv_params->input_num + ch];
+
+                            output[n_filter] += input_pel * conv_params->kernel[n_filter * filter_size + kernel_y * filter_linesize +
+                                                                                kernel_x * conv_params->input_num + ch];
                         }
                     }
                 }
@@ -250,6 +257,11 @@ static void convolve(const float *input, float *output, const ConvolutionalParam
                     break;
                 case SIGMOID:
                     output[n_filter] = 1.0f / (1.0f + exp(-output[n_filter]));
+                    break;
+                case NONE:
+                    break;
+                case LEAKY_RELU:
+                    output[n_filter] = FFMAX(output[n_filter], 0.0) + 0.2 * FFMIN(output[n_filter], 0.0);
                 }
             }
             output += conv_params->output_num;
diff --git a/libavfilter/dnn_backend_native.h b/libavfilter/dnn_backend_native.h
index 51d4cac955..f7d4eb823b 100644
--- a/libavfilter/dnn_backend_native.h
+++ b/libavfilter/dnn_backend_native.h
@@ -32,7 +32,9 @@
 
 typedef enum {INPUT, CONV, DEPTH_TO_SPACE} DNNLayerType;
 
-typedef enum {RELU, TANH, SIGMOID} DNNActivationFunc;
+typedef enum {RELU, TANH, SIGMOID, NONE, LEAKY_RELU} DNNActivationFunc;
+
+typedef enum {VALID, SAME} DNNPaddingFunc;
 
 typedef struct Layer{
     DNNLayerType type;
@@ -43,6 +45,8 @@ typedef struct Layer{
 typedef struct ConvolutionalParams{
     int32_t input_num, output_num, kernel_size;
     DNNActivationFunc activation;
+    DNNPaddingFunc padding_method;
+    int32_t dilation;
     float *kernel;
     float *biases;
 } ConvolutionalParams;
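
For reviewers who want to try the new convolve() behaviour outside FFmpeg, the following is a rough standalone sketch of the same idea: dilated taps, zero fill outside the image, and a skipped border for "VALID". It assumes an odd kernel size and the channel-interleaved layout used in the patch. The names SketchPadding, leaky_relu and convolve_sketch are hypothetical and do not exist in libavfilter, and the activation is kept as a separate helper with an explicit slope (the patch hard-codes 0.2).

```c
#include <stddef.h>

/* Hypothetical stand-in for DNNPaddingFunc from the patch. */
typedef enum { SK_VALID, SK_SAME } SketchPadding;

/* Leaky ReLU: identity for positive values, scaled by slope otherwise. */
static float leaky_relu(float v, float slope)
{
    return v > 0.0f ? v : slope * v;
}

/* Dilated convolution over a channel-interleaved image.
 * Taps that fall outside the image contribute zero.
 * Returns the number of output pixels written. */
static size_t convolve_sketch(const float *input, float *output,
                              const float *kernel, const float *biases,
                              int width, int height,
                              int input_num, int output_num,
                              int kernel_size, int dilation,
                              SketchPadding padding)
{
    int radius = kernel_size >> 1;
    /* VALID skips the border where the dilated window would leave the image. */
    int pad = (padding == SK_VALID) ? radius * dilation : 0;
    int src_linesize = width * input_num;
    int filter_linesize = kernel_size * input_num;
    int filter_size = kernel_size * filter_linesize;
    size_t written = 0;

    for (int y = pad; y < height - pad; ++y) {
        for (int x = pad; x < width - pad; ++x) {
            for (int n = 0; n < output_num; ++n) {
                float acc = biases[n];
                for (int ky = 0; ky < kernel_size; ++ky) {
                    for (int kx = 0; kx < kernel_size; ++kx) {
                        int y_pos = y + (ky - radius) * dilation;
                        int x_pos = x + (kx - radius) * dilation;
                        if (x_pos < 0 || x_pos >= width ||
                            y_pos < 0 || y_pos >= height)
                            continue; /* zero padding: tap adds nothing */
                        for (int ch = 0; ch < input_num; ++ch)
                            acc += input[y_pos * src_linesize + x_pos * input_num + ch] *
                                   kernel[n * filter_size + ky * filter_linesize +
                                          kx * input_num + ch];
                    }
                }
                output[n] = acc;
            }
            output += output_num;
            ++written;
        }
    }
    return written;
}
```

With a 3x3 single-channel input and a 3x3 kernel, SK_SAME writes 9 output pixels while SK_VALID writes only the centre one, which is the size difference raised in the question above.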