From patchwork Tue Jun 12 21:07:30 2018
X-Patchwork-Submitter: Pedro Arthur
X-Patchwork-Id: 9380
From: Pedro Arthur
Date: Tue, 12 Jun 2018 18:07:30 -0300
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] Improve dnn_backend_native convolution speed

The attached patch adds some specialized convolution functions, selected based on the filter size.
Benchmark (1190x670px image):

Filter    |   New    |   Old    | Diff (%)
----------|----------|----------|---------
9x9x1x64  | 3.093662 | 5.135679 | 39.76
1x1x64x32 | 0.912451 | 5.670451 | 83.90
5x5x32x1  | 0.502857 | 0.787371 | 36.13
Total     | 4.51023  | 11.5954  | 61.10

From 3868e5f033c62b84d29a3592bb7997fa348c2e9c Mon Sep 17 00:00:00 2001
From: Pedro Arthur
Date: Tue, 12 Jun 2018 17:47:05 -0300
Subject: [PATCH] Improve dnn_backend_native convolution speed

Changed memory layout from i x j x k (width, height, ch) to k x i x j
Added convolve function for 1x1xn filter case
Added convolve using 32x32 blocks of input
---
 libavfilter/dnn_backend_native.c | 212 ++++++++++++++++++++++++++-----
 1 file changed, 181 insertions(+), 31 deletions(-)

diff --git a/libavfilter/dnn_backend_native.c b/libavfilter/dnn_backend_native.c
index 6e80dd3663..9f6b690a82 100644
--- a/libavfilter/dnn_backend_native.c
+++ b/libavfilter/dnn_backend_native.c
@@ -51,6 +51,187 @@ typedef struct ConvolutionalNetwork{
     int32_t layers_num;
 } ConvolutionalNetwork;
 
+#define VINDEX3(view, i, j, k) (view)->data[((view)->c * ((view)->w * (i) + (j)) + (k))]
+#define VINDEX2(view, i, j) (view)->data[((view)->w * (i) + (j))]
+#define VINDEX3A(view, i, j, k) (view)->data[((view)->w * ((view)->h * (k) + (i)) + (j))]
+#define CLAMP_TO_EDGE(x, w) ((x) < 0 ? 0 : ((x) >= (w) ? (w - 1) : (x)))
+
+typedef struct Tensor_view
+{
+    float *data;
+    int w, h, c;
+} Tensor_view;
+
+static void copy(Tensor_view *in, Tensor_view *buff, int size, int row, int col, int channel, int half)
+{
+    int h = in->h;
+    int w = in->w;
+
+    for (int i = 0; i < size; ++i) {
+        int line = CLAMP_TO_EDGE(row + i - half, h);
+        for (int j = 0; j < size; ++j) {
+            int column = CLAMP_TO_EDGE(col + j - half, w);
+            VINDEX2(buff, i, j) = VINDEX3A(in, line, column, channel);
+        }
+    }
+}
+
+static void copy_relu(Tensor_view *in, Tensor_view *out, int row, int col, int ilen, int jlen, float bias)
+{
+    for (int i = 0; i <= ilen; ++i) {
+        for (int j = 0; j <= jlen; ++j) {
+            VINDEX3A(out, row + i, col + j, 0) = FFMAX(VINDEX2(in, i, j) + bias, 0);
+        }
+    }
+}
+
+static void do_block(Tensor_view *in, Tensor_view *out, Tensor_view *kern, float bias, const int row, const int col, int w, int h, int fw)
+{
+    float tmp[32 * 32];
+    float tmp2[32 * 32];
+
+    int half = fw / 2;
+    int ilen = FFMIN(32 - fw, h - row - 1);
+    int jlen = FFMIN(32 - fw, w - col - 1);
+
+    Tensor_view buf = {tmp, 32, 32, 1};
+    Tensor_view obuf = {tmp2, 32, 32, 1};
+    memset(tmp2, 0, sizeof(float) * 32 * 32);
+
+    for (int k = 0; k < kern->c; ++k) {
+        copy(in, &buf, 32, row, col, k, half);
+        for (int ii = 0; ii <= ilen; ++ii) {
+            for (int jj = 0; jj <= jlen; ++jj) {
+                float acc = 0;
+                for (int i = 0; i < fw; ++i) {
+                    for (int j = 0; j < fw; ++j) {
+                        acc += VINDEX2(&buf, ii + i, jj + j) * VINDEX3(kern, i, j, k);
+                    }
+                }
+                VINDEX2(&obuf, ii, jj) += acc;
+            }
+        }
+    }
+    copy_relu(&obuf, out, row, col, ilen, jlen, bias);
+}
+
+static void convolve_block_32(Tensor_view *in, Tensor_view *kernel, Tensor_view *out, float bias, int w, int h, int c, int fw)
+{
+    int stride = 32 - fw + 1;
+    for (int i = 0; i < h; i += stride) {
+        for (int j = 0; j < w; j += stride) {
+            do_block(in, out, kernel, bias, i, j, w, h, fw);
+        }
+    }
+}
+
+static void convolve_1x1(Tensor_view *in, Tensor_view *kernel, Tensor_view *out, float bias, int w, int h, int c, int fw)
+{
+    if (c > 1) {
+        for (int i = 0; i < h; ++i) {
+            for (int j = 0; j < w; ++j) {
+                VINDEX3A(out, i, j, 0) = VINDEX3A(in, i, j, 0) * kernel->data[0];
+            }
+        }
+    }
+
+    for (int k = 1; k < c - 1; ++k) {
+        for (int i = 0; i < h; ++i) {
+            for (int j = 0; j < w; ++j) {
+                VINDEX3A(out, i, j, 0) += VINDEX3A(in, i, j, k) * kernel->data[k];
+            }
+        }
+    }
+
+    for (int i = 0; i < h; ++i) {
+        for (int j = 0; j < w; ++j) {
+            VINDEX3A(out, i, j, 0) += VINDEX3A(in, i, j, c - 1) * kernel->data[c - 1];
+            VINDEX3A(out, i, j, 0) = FFMAX(VINDEX3A(out, i, j, 0) + bias, 0);
+        }
+    }
+}
+
+static void convolve_generic(Tensor_view *in, Tensor_view *kernel, Tensor_view *out, float bias, int w, int h, int c, int fw)
+{
+    int half = fw / 2;
+
+    if (c > 1) {
+        for (int i = 0; i < h; ++i) {
+            for (int j = 0; j < w; ++j) {
+                float acc = 0;
+                for (int ii = 0; ii < fw; ++ii) {
+                    for (int jj = 0; jj < fw; ++jj) {
+                        int row = CLAMP_TO_EDGE(i + ii - half, h);
+                        int col = CLAMP_TO_EDGE(j + jj - half, w);
+
+                        acc += VINDEX3A(in, row, col, 0) * VINDEX3(kernel, ii, jj, 0);
+                    }
+                }
+                VINDEX3A(out, i, j, 0) = acc;
+            }
+        }
+    }
+
+    for (int k = 1; k < kernel->c - 1; ++k) {
+        for (int i = 0; i < h; ++i) {
+            for (int j = 0; j < w; ++j) {
+                float acc = 0;
+                for (int ii = 0; ii < fw; ++ii) {
+                    for (int jj = 0; jj < fw; ++jj) {
+                        int row = CLAMP_TO_EDGE(i + ii - half, h);
+                        int col = CLAMP_TO_EDGE(j + jj - half, w);
+
+                        acc += VINDEX3A(in, row, col, k) * VINDEX3(kernel, ii, jj, k);
+                    }
+                }
+                VINDEX3A(out, i, j, 0) += acc;
+            }
+        }
+    }
+
+    for (int i = 0; i < h; ++i) {
+        for (int j = 0; j < w; ++j) {
+            float acc = 0;
+            for (int ii = 0; ii < fw; ++ii) {
+                for (int jj = 0; jj < fw; ++jj) {
+                    int row = CLAMP_TO_EDGE(i + ii - half, h);
+                    int col = CLAMP_TO_EDGE(j + jj - half, w);
+
+                    acc += VINDEX3A(in, row, col, c - 1) * VINDEX3(kernel, ii, jj, c - 1);
+                }
+            }
+
+            VINDEX3A(out, i, j, 0) += acc;
+            VINDEX3A(out, i, j, 0) = FFMAX(0, VINDEX3A(out, i, j, 0) + bias);
+        }
+    }
+}
+
+static void convolve(const float* input, float* output, const ConvolutionalParams* conv_params, int32_t width, int32_t height)
+{
+    int out_stride = width * height;
+    int kern_stride = conv_params->kernel_size * conv_params->kernel_size * conv_params->input_num;
+
+    Tensor_view in = {(float*)input, width, height, conv_params->input_num};
+
+    for (int i = 0; i < conv_params->output_num; ++i) {
+        Tensor_view out = {output + i * out_stride, width, height, 1};
+        Tensor_view kern = {conv_params->kernel + i * kern_stride, conv_params->kernel_size, conv_params->kernel_size, conv_params->input_num};
+
+        if (kern.w == 1 && kern.h == 1)
+            convolve_1x1(&in, &kern, &out, conv_params->biases[i], width, height, conv_params->input_num, conv_params->kernel_size);
+        else if (kern.w < 16 && kern.h < 16)
+            convolve_block_32(&in, &kern, &out, conv_params->biases[i], width, height, conv_params->input_num, conv_params->kernel_size);
+        else
+            convolve_generic(&in, &kern, &out, conv_params->biases[i], width, height, conv_params->input_num, conv_params->kernel_size);
+    }
+}
+
 static DNNReturnType set_input_output_native(void* model, const DNNData* input, const DNNData* output)
 {
     ConvolutionalNetwork* network = (ConvolutionalNetwork*)model;
@@ -289,37 +470,6 @@ DNNModel* ff_dnn_load_default_model_native(DNNDefaultModel model_type)
     }
 }
 
-#define CLAMP_TO_EDGE(x, w) ((x) < 0 ? 0 : ((x) >= (w) ? (w - 1) : (x)))
-
-static void convolve(const float* input, float* output, const ConvolutionalParams* conv_params, int32_t width, int32_t height)
-{
-    int y, x, n_filter, ch, kernel_y, kernel_x;
-    int radius = conv_params->kernel_size >> 1;
-    int src_linesize = width * conv_params->input_num;
-    int filter_linesize = conv_params->kernel_size * conv_params->input_num;
-    int filter_size = conv_params->kernel_size * filter_linesize;
-
-    for (y = 0; y < height; ++y){
-        for (x = 0; x < width; ++x){
-            for (n_filter = 0; n_filter < conv_params->output_num; ++n_filter){
-                output[n_filter] = conv_params->biases[n_filter];
-                for (ch = 0; ch < conv_params->input_num; ++ch){
-                    for (kernel_y = 0; kernel_y < conv_params->kernel_size; ++kernel_y){
-                        for (kernel_x = 0; kernel_x < conv_params->kernel_size; ++kernel_x){
-                            output[n_filter] += input[CLAMP_TO_EDGE(y + kernel_y - radius, height) * src_linesize +
-                                                      CLAMP_TO_EDGE(x + kernel_x - radius, width) * conv_params->input_num + ch] *
-                                                conv_params->kernel[n_filter * filter_size + kernel_y * filter_linesize +
-                                                                    kernel_x * conv_params->input_num + ch];
-                        }
-                    }
-                }
-                output[n_filter] = FFMAX(output[n_filter], 0.0);
-            }
-            output += conv_params->output_num;
-        }
-    }
-}
-
 DNNReturnType ff_dnn_execute_model_native(const DNNModel* model)
 {
     ConvolutionalNetwork* network = (ConvolutionalNetwork*)model->model;
-- 
2.17.1