From patchwork Wed Jan 9 19:07:44 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 11687
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 9 Jan 2019 20:07:44 +0100
Message-Id: <20190109190744.20034-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH] avfilter: add anlmdn filter x86 SIMD optimizations

Signed-off-by: Paul B Mahol
---
 libavfilter/af_anlmdn.c          | 38 +++++++++++----
 libavfilter/af_anlmdndsp.h       | 40 ++++++++++++++++
 libavfilter/x86/Makefile         |  2 +
 libavfilter/x86/af_anlmdn.asm    | 80 ++++++++++++++++++++++++++++++++
 libavfilter/x86/af_anlmdn_init.c | 35 ++++++++++++++
 5 files changed, 185 insertions(+), 10 deletions(-)
 create mode 100644 libavfilter/af_anlmdndsp.h
 create mode 100644 libavfilter/x86/af_anlmdn.asm
 create mode 100644 libavfilter/x86/af_anlmdn_init.c

diff --git a/libavfilter/af_anlmdn.c b/libavfilter/af_anlmdn.c
index 62931e37cc..9ef0bf7239 100644
--- a/libavfilter/af_anlmdn.c
+++ b/libavfilter/af_anlmdn.c
@@ -27,6 +27,8 @@
 #include "audio.h"
 #include "formats.h"
 
+#include "af_anlmdndsp.h"
+
 #define SQR(x) ((x) * (x))
 
 typedef struct AudioNLMeansContext {
@@ -49,7 +51,7 @@ typedef struct AudioNLMeansContext {
 
     AVAudioFifo *fifo;
 
-    float (*compute_distance)(const float *f1, const float *f2, int K);
+    AudioNLMDNDSPContext dsp;
 } AudioNLMeansContext;
 
 #define OFFSET(x) offsetof(AudioNLMeansContext, x)
@@ -93,7 +95,7 @@ static int query_formats(AVFilterContext *ctx)
     return ff_set_common_samplerates(ctx, formats);
 }
 
-static float compute_distance_ssd(const float *f1, const float *f2, int K)
+static float compute_distance_ssd_c(const float *f1, const float *f2, ptrdiff_t K)
 {
     float distance = 0.;
 
@@ -103,6 +105,25 @@ static float compute_distance_ssd(const float *f1, const float *f2, int K)
     return distance;
 }
 
+static void compute_cache_c(float *cache, const float *f,
+                            ptrdiff_t S, ptrdiff_t K,
+                            ptrdiff_t i, ptrdiff_t jj)
+{
+    int v = 0;
+
+    for (int j = jj; j < jj + S; j++, v++)
+        cache[v] += -SQR(f[i - K - 1] - f[j - K - 1]) + SQR(f[i + K] - f[j + K]);
+}
+
+void ff_anlmdn_init(AudioNLMDNDSPContext *dsp)
+{
+    dsp->compute_distance_ssd = compute_distance_ssd_c;
+    dsp->compute_cache = compute_cache_c;
+
+    if (ARCH_X86)
+        ff_anlmdn_init_x86(dsp);
+}
+
 static int config_output(AVFilterLink *outlink)
 {
     AVFilterContext *ctx = outlink->src;
@@ -129,7 +150,7 @@ static int config_output(AVFilterLink *outlink)
     if (!s->fifo)
         return AVERROR(ENOMEM);
 
-    s->compute_distance = compute_distance_ssd;
+    ff_anlmdn_init(&s->dsp);
 
     return 0;
 }
@@ -153,17 +174,14 @@ static int filter_channel(AVFilterContext *ctx, void *arg, int ch, int nb_jobs)
             for (int j = i - S; j <= i + S; j++) {
                 if (i == j)
                     continue;
-                cache[v++] = s->compute_distance(f + i, f + j, K);
+                cache[v++] = s->dsp.compute_distance_ssd(f + i, f + j, K);
             }
         } else {
-            for (int j = i - S; j < i; j++, v++)
-                cache[v] = cache[v] - SQR(f[i - K - 1] - f[j - K - 1]) + SQR(f[i + K] - f[j + K]);
-
-            for (int j = i + 1; j <= i + S; j++, v++)
-                cache[v] = cache[v] - SQR(f[i - K - 1] - f[j - K - 1]) + SQR(f[i + K] - f[j + K]);
+            s->dsp.compute_cache(cache, f, S, K, i, i - S);
+            s->dsp.compute_cache(cache + S, f, S, K, i, i + 1);
         }
 
-        for (int j = 0; j < v; j++) {
+        for (int j = 0; j < 2 * S; j++) {
             const float distance = cache[j];
             float w;
 
diff --git a/libavfilter/af_anlmdndsp.h b/libavfilter/af_anlmdndsp.h
new file mode 100644
index 0000000000..d8f5136cd8
--- /dev/null
+++ b/libavfilter/af_anlmdndsp.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright (c) 2019 Paul B Mahol
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVFILTER_ANLMDNDSP_H
+#define AVFILTER_ANLMDNDSP_H
+
+#include "libavutil/common.h"
+
+#include "audio.h"
+#include "avfilter.h"
+#include "formats.h"
+#include "internal.h"
+
+typedef struct AudioNLMDNDSPContext {
+    float (*compute_distance_ssd)(const float *f1, const float *f2, ptrdiff_t K);
+    void (*compute_cache)(float *cache, const float *f, ptrdiff_t S, ptrdiff_t K,
+                          ptrdiff_t i, ptrdiff_t jj);
+} AudioNLMDNDSPContext;
+
+void ff_anlmdn_init(AudioNLMDNDSPContext *s);
+void ff_anlmdn_init_x86(AudioNLMDNDSPContext *s);
+
+#endif /* AVFILTER_ANLMDNDSP_H */
diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
index 6eecb94359..17499f14da 100644
--- a/libavfilter/x86/Makefile
+++ b/libavfilter/x86/Makefile
@@ -1,6 +1,7 @@
 OBJS-$(CONFIG_SCENE_SAD)                     += x86/scene_sad_init.o
 
 OBJS-$(CONFIG_AFIR_FILTER)                   += x86/af_afir_init.o
+OBJS-$(CONFIG_ANLMDN_FILTER)                 += x86/af_anlmdn_init.o
 OBJS-$(CONFIG_BLEND_FILTER)                  += x86/vf_blend_init.o
 OBJS-$(CONFIG_BWDIF_FILTER)                  += x86/vf_bwdif_init.o
 OBJS-$(CONFIG_COLORSPACE_FILTER)             += x86/colorspacedsp_init.o
@@ -34,6 +35,7 @@ OBJS-$(CONFIG_YADIF_FILTER)                  += x86/vf_yadif_init.o
 X86ASM-OBJS-$(CONFIG_SCENE_SAD)              += x86/scene_sad.o
 
 X86ASM-OBJS-$(CONFIG_AFIR_FILTER)            += x86/af_afir.o
+X86ASM-OBJS-$(CONFIG_ANLMDN_FILTER)          += x86/af_anlmdn.o
 X86ASM-OBJS-$(CONFIG_BLEND_FILTER)           += x86/vf_blend.o
 X86ASM-OBJS-$(CONFIG_BWDIF_FILTER)           += x86/vf_bwdif.o
 X86ASM-OBJS-$(CONFIG_COLORSPACE_FILTER)      += x86/colorspacedsp.o
diff --git a/libavfilter/x86/af_anlmdn.asm b/libavfilter/x86/af_anlmdn.asm
new file mode 100644
index 0000000000..9630f4771c
--- /dev/null
+++ b/libavfilter/x86/af_anlmdn.asm
@@ -0,0 +1,80 @@
+;*****************************************************************************
+;* x86-optimized functions for anlmdn filter
+;* Copyright (c) 2017 Paul B Mahol
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+%include "libavutil/x86/x86util.asm"
+
+SECTION .text
+
+;------------------------------------------------------------------------------
+; float ff_compute_distance_ssd(const float *f1, const float *f2, ptrdiff_t len)
+;------------------------------------------------------------------------------
+
+INIT_XMM sse
+cglobal compute_distance_ssd, 3,5,3, f1, f2, len, r, x
+    mov      xq, lenq
+    shl      xq, 2
+    neg      xq
+    add     f1q, xq
+    add     f2q, xq
+    xor      xq, xq
+    shl    lend, 1
+    add    lend, 1
+    shl    lend, 2
+    mov      rq, lenq
+    and      rq, mmsize - 1
+    xorps    m0, m0
+    cmp    lenq, mmsize
+    jl .loop1
+    sub    lenq, rq
+ALIGN 16
+    .loop0:
+        movups   m1, [f1q + xq]
+        movups   m2, [f2q + xq]
+        subps    m1, m2
+        mulps    m1, m1
+        addps    m0, m1
+        add      xq, mmsize
+        cmp      xq, lenq
+        jl .loop0
+
+    movhlps xmm1, xmm0
+    addps   xmm0, xmm1
+    movss   xmm1, xmm0
+    shufps  xmm0, xmm0, 1
+    addss   xmm0, xmm1
+
+    cmp      rq, 0
+    je .end
+    add    lenq, rq
+    .loop1:
+        movss   xm1, [f1q + xq]
+        subss   xm1, [f2q + xq]
+        mulss   xm1, xm1
+        addss   xm0, xm1
+        add      xq, 4
+        cmp      xq, lenq
+        jl .loop1
+    .end:
+%if ARCH_X86_64 == 0
+    movss     r0m, xm0
+    fld dword r0m
+%endif
+    RET
diff --git a/libavfilter/x86/af_anlmdn_init.c b/libavfilter/x86/af_anlmdn_init.c
new file mode 100644
index 0000000000..30eff6f644
--- /dev/null
+++ b/libavfilter/x86/af_anlmdn_init.c
@@ -0,0 +1,35 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/x86/cpu.h"
+#include "libavfilter/af_anlmdndsp.h"
+
+float ff_compute_distance_ssd_sse(const float *f1, const float *f2,
+                                  ptrdiff_t len);
+
+av_cold void ff_anlmdn_init_x86(AudioNLMDNDSPContext *s)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (EXTERNAL_SSE(cpu_flags)) {
+        s->compute_distance_ssd = ff_compute_distance_ssd_sse;
+    }
+}
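The incremental update in compute_cache_c works because shifting both patch centres, i and j, one sample to the right removes exactly one squared difference, for the pair f[i - K - 1], f[j - K - 1], and adds exactly one, for the pair f[i + K], f[j + K]. Each cached SSD can therefore be corrected with two terms instead of being recomputed over all 2K + 1 samples. The plain-C sketch below checks that equivalence. It assumes the elided body of compute_distance_ssd_c sums SQR(f1[k] - f2[k]) for k = -K..K, which is what the assembly's (2K + 1)-sample loop implies. ssd_full and max_cache_drift are hypothetical test helpers, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SQR(x) ((x) * (x))

/* Full SSD between the (2K+1)-sample patches centred on f1 and f2.
 * Assumption: this mirrors the elided body of compute_distance_ssd_c. */
static float ssd_full(const float *f1, const float *f2, ptrdiff_t K)
{
    float d = 0.f;
    for (ptrdiff_t k = -K; k <= K; k++)
        d += SQR(f1[k] - f2[k]);
    return d;
}

/* compute_cache_c as in the patch (loop index widened to ptrdiff_t). */
static void compute_cache_c(float *cache, const float *f,
                            ptrdiff_t S, ptrdiff_t K,
                            ptrdiff_t i, ptrdiff_t jj)
{
    int v = 0;
    for (ptrdiff_t j = jj; j < jj + S; j++, v++)
        cache[v] += -SQR(f[i - K - 1] - f[j - K - 1]) + SQR(f[i + K] - f[j + K]);
}

/* Hypothetical helper: seed the cache with full SSDs at the first valid i,
 * then slide i forward using the same two compute_cache_c calls that
 * filter_channel makes. Returns the largest absolute deviation from a
 * freshly computed SSD. f must hold n samples; S is limited to 64 here. */
static float max_cache_drift(const float *f, ptrdiff_t n, ptrdiff_t S, ptrdiff_t K)
{
    float cache[2 * 64];
    const ptrdiff_t first = S + K + 1, last = n - S - K - 1;
    float worst = 0.f;

    assert(S >= 1 && S <= 64 && K >= 0 && first <= last);
    for (ptrdiff_t i = first; i <= last; i++) {
        if (i == first) {
            int v = 0;
            for (ptrdiff_t j = i - S; j <= i + S; j++)
                if (j != i)
                    cache[v++] = ssd_full(f + i, f + j, K);
        } else {
            compute_cache_c(cache, f, S, K, i, i - S);     /* j = i-S .. i-1 */
            compute_cache_c(cache + S, f, S, K, i, i + 1); /* j = i+1 .. i+S */
        }
        for (ptrdiff_t v = 0; v < 2 * S; v++) {
            const ptrdiff_t j = v < S ? i - S + v : i + 1 + (v - S);
            float d = cache[v] - ssd_full(f + i, f + j, K);
            if (d < 0.f)
                d = -d;
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

If the input samples are small multiples of 1/8 in [-1, 1], every intermediate sum is exactly representable, so max_cache_drift should return 0. With real audio, the incremental and full results can differ by rounding that accumulates as i advances. This is the usual trade-off of a running-sum update.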
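ff_compute_distance_ssd_sse has three stages:

- It keeps four partial sums in one XMM register over the largest multiple of four samples.
- It reduces them horizontally, first with movhlps/addps and then with shufps/addss.
- It finishes the leftover samples in the scalar .loop1, or handles every sample there when 2K + 1 < 4.

Because 2K + 1 is odd, the vector path always leaves a one- or three-sample tail, so the "je .end" branch is never taken. The plain-C model below mirrors this control flow so it can be compared with a straightforward loop. It is an illustrative sketch that counts in floats rather than bytes. ssd_scalar, ssd_4lane and fill_signal are hypothetical names. The summation order differs from the scalar loop, so on general input the two agree only to within rounding.

```c
#include <assert.h>
#include <stddef.h>

#define SQR(x) ((x) * (x))

/* Scalar reference over f[-K..K]; assumed to match compute_distance_ssd_c. */
static float ssd_scalar(const float *f1, const float *f2, ptrdiff_t K)
{
    float d = 0.f;
    for (ptrdiff_t k = -K; k <= K; k++)
        d += SQR(f1[k] - f2[k]);
    return d;
}

/* Plain-C model of the control flow in ff_compute_distance_ssd_sse.
 * The asm works in bytes (len = (2K+1)*4, mmsize = 16); this counts floats. */
static float ssd_4lane(const float *f1, const float *f2, ptrdiff_t K)
{
    const float *a = f1 - K, *b = f2 - K;   /* start of both patches */
    const ptrdiff_t len = 2 * K + 1;        /* samples per patch (always odd) */
    const ptrdiff_t rem = len & 3;          /* "and rq, mmsize - 1" */
    float acc[4] = { 0.f, 0.f, 0.f, 0.f };  /* the four lanes of m0 */
    float sum = 0.f;
    ptrdiff_t x = 0;

    assert(K >= 0);
    if (len >= 4) {                         /* otherwise "jl .loop1" */
        for (; x < len - rem; x += 4)       /* .loop0 */
            for (int l = 0; l < 4; l++)
                acc[l] += SQR(a[x + l] - b[x + l]);
        /* movhlps/addps folds lanes 2,3 onto 0,1; shufps/addss adds lane 1 to lane 0 */
        sum = (acc[1] + acc[3]) + (acc[0] + acc[2]);
    }
    for (; x < len; x++)                    /* .loop1: scalar tail */
        sum += SQR(a[x] - b[x]);
    return sum;
}

/* Hypothetical test input: small multiples of 1/8 in [-1, 1]. */
static void fill_signal(float *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] = (float)((i * 37) % 17) / 8.0f - 1.0f;
}
```

Comparing ssd_4lane with ssd_scalar for K = 0..20 covers two paths. For K <= 1 only the scalar path runs. For larger K the vector path runs, followed by a one- or three-sample tail.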