From patchwork Sat Nov 6 20:56:06 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 31309
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 6 Nov 2021 21:56:06 +0100
Message-Id: <20211106205606.184012-1-onemda@gmail.com>
X-Mailer: git-send-email 2.33.0
Subject: [FFmpeg-devel] [PATCH] avfilter/vf_nlmeans: add x86 SIMD
List-Id: FFmpeg development discussions and patches

Signed-off-by: Paul B Mahol
---
 libavfilter/vf_nlmeans.c          |  9 ++-
 libavfilter/vf_nlmeans.h          |  1 +
 libavfilter/x86/Makefile          |  2 +
 libavfilter/x86/vf_nlmeans.asm    | 97 +++++++++++++++++++++++++++++++
 libavfilter/x86/vf_nlmeans_init.c | 40 +++++++++++++
 5 files changed, 146 insertions(+), 3 deletions(-)
 create mode 100644 libavfilter/x86/vf_nlmeans.asm
 create mode 100644 libavfilter/x86/vf_nlmeans_init.c

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index dee1f68101..8a05965c9b 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -308,9 +308,9 @@ static int config_input(AVFilterLink *inlink)
     s->ii = s->ii_orig + s->ii_lz_32 + 1;
 
     // allocate weighted average for every pixel
-    s->linesize = inlink->w;
-    s->total_weight = av_malloc_array(inlink->w, inlink->h * sizeof(*s->total_weight));
-    s->sum = av_malloc_array(inlink->w, inlink->h * sizeof(*s->sum));
+    s->linesize = inlink->w + 100;
+    s->total_weight = av_malloc_array(s->linesize, inlink->h * sizeof(*s->total_weight));
+    s->sum = av_malloc_array(s->linesize, inlink->h * sizeof(*s->sum));
     if (!s->total_weight || !s->sum)
         return AVERROR(ENOMEM);
 
@@ -519,6 +519,9 @@ void ff_nlmeans_init(NLMeansDSPContext *dsp)
 
     if (ARCH_AARCH64)
         ff_nlmeans_init_aarch64(dsp);
+
+    if (ARCH_X86)
+        ff_nlmeans_init_x86(dsp);
 }
 
 static av_cold int init(AVFilterContext *ctx)
diff --git a/libavfilter/vf_nlmeans.h b/libavfilter/vf_nlmeans.h
index cd1ee7c0bf..43611a03bd 100644
--- a/libavfilter/vf_nlmeans.h
+++ b/libavfilter/vf_nlmeans.h
@@ -41,5 +41,6 @@ typedef struct NLMeansDSPContext {
 
 void ff_nlmeans_init(NLMeansDSPContext *dsp);
 void ff_nlmeans_init_aarch64(NLMeansDSPContext *dsp);
+void ff_nlmeans_init_x86(NLMeansDSPContext *dsp);
 
 #endif /* AVFILTER_NLMEANS_H */
diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
index a29941eaeb..e87481bd7a 100644
--- a/libavfilter/x86/Makefile
+++ b/libavfilter/x86/Makefile
@@ -20,6 +20,7 @@ OBJS-$(CONFIG_LIMITER_FILTER) += x86/vf_limiter_init.o
 OBJS-$(CONFIG_LUT3D_FILTER) += x86/vf_lut3d_init.o
 OBJS-$(CONFIG_MASKEDCLAMP_FILTER) += x86/vf_maskedclamp_init.o
 OBJS-$(CONFIG_MASKEDMERGE_FILTER) += x86/vf_maskedmerge_init.o
+OBJS-$(CONFIG_NLMEANS_FILTER) += x86/vf_nlmeans_init.o
 OBJS-$(CONFIG_NOISE_FILTER) += x86/vf_noise.o
 OBJS-$(CONFIG_OVERLAY_FILTER) += x86/vf_overlay_init.o
 OBJS-$(CONFIG_PP7_FILTER) += x86/vf_pp7_init.o
@@ -61,6 +62,7 @@ X86ASM-OBJS-$(CONFIG_LIMITER_FILTER) += x86/vf_limiter.o
 X86ASM-OBJS-$(CONFIG_LUT3D_FILTER) += x86/vf_lut3d.o
 X86ASM-OBJS-$(CONFIG_MASKEDCLAMP_FILTER) += x86/vf_maskedclamp.o
 X86ASM-OBJS-$(CONFIG_MASKEDMERGE_FILTER) += x86/vf_maskedmerge.o
+X86ASM-OBJS-$(CONFIG_NLMEANS_FILTER) += x86/vf_nlmeans.o
 X86ASM-OBJS-$(CONFIG_OVERLAY_FILTER) += x86/vf_overlay.o
 X86ASM-OBJS-$(CONFIG_PP7_FILTER) += x86/vf_pp7.o
 X86ASM-OBJS-$(CONFIG_PSNR_FILTER) += x86/vf_psnr.o
diff --git a/libavfilter/x86/vf_nlmeans.asm b/libavfilter/x86/vf_nlmeans.asm
new file mode 100644
index 0000000000..8f57801035
--- /dev/null
+++ b/libavfilter/x86/vf_nlmeans.asm
@@ -0,0 +1,97 @@
+;*****************************************************************************
+;* x86-optimized functions for nlmeans filter
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+
+%include "libavutil/x86/x86util.asm"
+
+%if HAVE_AVX2_EXTERNAL && ARCH_X86_64
+
+SECTION_RODATA 32
+
+ending_lut: dd -1, -1, -1, -1, -1, -1, -1, -1,\
+               0, -1, -1, -1, -1, -1, -1, -1,\
+               0,  0, -1, -1, -1, -1, -1, -1,\
+               0,  0,  0, -1, -1, -1, -1, -1,\
+               0,  0,  0,  0, -1, -1, -1, -1,\
+               0,  0,  0,  0,  0, -1, -1, -1,\
+               0,  0,  0,  0,  0,  0, -1, -1,\
+               0,  0,  0,  0,  0,  0,  0, -1,\
+               0,  0,  0,  0,  0,  0,  0,  0
+
+SECTION .text
+
+; void ff_compute_weights_line(const uint32_t *const iia,
+;                              const uint32_t *const iib,
+;                              const uint32_t *const iid,
+;                              const uint32_t *const iie,
+;                              const uint8_t *const src,
+;                              float *total,
+;                              float *sum,
+;                              const float *const lut,
+;                              int max,
+;                              int startx, int endx);
+
+INIT_YMM avx2
+cglobal compute_weights_line, 8, 13, 5, 0, iia, iib, iid, iie, src, total, sum, lut, x, startx, endx, mod, elut
+    movsxd  startxq, dword startxm
+    movsxd    endxq, dword endxm
+    VPBROADCASTD m2, r8m
+
+    mov      xq, startxq
+    mov    modq, mmsize / 4
+    lea   elutq, [ending_lut]
+
+    vpcmpeqd m4, m4
+
+    .loop:
+        mov    startxq, endxq
+        sub    startxq, xq
+        cmp    startxq, modq
+        cmovge startxq, modq
+        sal    startxq, 5
+
+        movu        m0, [iieq + xq * 4]
+
+        psubd       m0, [iidq + xq * 4]
+        psubd       m0, [iibq + xq * 4]
+        paddd       m0, [iiaq + xq * 4]
+        por         m0, [elutq + startxq]
+        pminud      m0, m2
+        pslld       m0, 2
+        mova        m3, m4
+        vgatherdps  m1, [lutq + m0], m3
+
+        pmovzxbd    m0, [srcq + xq]
+        cvtdq2ps    m0, m0
+
+        mulps       m0, m1
+
+        addps       m1, [totalq + xq * 4]
+        addps       m0, [sumq + xq * 4]
+
+        movups [totalq + xq * 4], m1
+        movups [sumq + xq * 4], m0
+
+        add xq, mmsize / 4
+        cmp xq, endxq
+        jl .loop
+    RET
+
+%endif
diff --git a/libavfilter/x86/vf_nlmeans_init.c b/libavfilter/x86/vf_nlmeans_init.c
new file mode 100644
index 0000000000..37764d30ab
--- /dev/null
+++ b/libavfilter/x86/vf_nlmeans_init.c
@@ -0,0 +1,40 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/x86/cpu.h"
+#include "libavfilter/vf_nlmeans.h"
+
+void ff_compute_weights_line_avx2(const uint32_t *const iia,
+                                  const uint32_t *const iib,
+                                  const uint32_t *const iid,
+                                  const uint32_t *const iie,
+                                  const uint8_t *const src,
+                                  float *total_weight,
+                                  float *sum,
+                                  const float *const weight_lut,
+                                  int max_meaningful_diff,
+                                  int startx, int endx);
+
+av_cold void ff_nlmeans_init_x86(NLMeansDSPContext *dsp)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (ARCH_X86_64 && EXTERNAL_AVX2_FAST(cpu_flags))
+        dsp->compute_weights_line = ff_compute_weights_line_avx2;
+}
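
Reviewer note: as an aid to reading the AVX2 loop in vf_nlmeans.asm, here is a minimal scalar sketch in C of what one pass over a line appears to compute, including the tail handling through `ending_lut`. It is not part of the patch. The names `weights_block_model`, `compute_weights_line_model` and `LANES` are hypothetical. The sketch assumes `startx < endx`, that `weight_lut` holds at least `max_meaningful_diff + 1` entries, and that `src`, `total_weight` and `sum` are accessible for a full eight lanes starting at every block position.

```c
#include <stdint.h>

#define LANES 8 /* hypothetical name: 32-bit lanes in one YMM register */

/* Scalar model of one loop iteration, covering pixels x .. x + LANES - 1. */
static void weights_block_model(const uint32_t *iia, const uint32_t *iib,
                                const uint32_t *iid, const uint32_t *iie,
                                const uint8_t *src,
                                float *total_weight, float *sum,
                                const float *weight_lut,
                                uint32_t max_meaningful_diff,
                                int x, int endx)
{
    /* min(endx - x, 8) selects the ending_lut row in the assembly. */
    int remaining = endx - x;
    if (remaining > LANES)
        remaining = LANES;

    for (int lane = 0; lane < LANES; lane++) {
        const int i = x + lane;
        /* Lanes at or past endx are OR-ed with -1, i.e. all bits set. */
        uint32_t diff = UINT32_MAX;
        if (lane < remaining)
            diff = iie[i] - iid[i] - iib[i] + iia[i];
        /* pminud: unsigned clamp, so masked lanes index the last entry. */
        if (diff > max_meaningful_diff)
            diff = max_meaningful_diff;
        const float weight = weight_lut[diff];
        /* All eight lanes are stored, including the masked ones. */
        total_weight[i] += weight;
        sum[i]          += weight * (float)src[i];
    }
}

/* Scalar model of the whole call; assumes startx < endx. */
static void compute_weights_line_model(const uint32_t *iia, const uint32_t *iib,
                                       const uint32_t *iid, const uint32_t *iie,
                                       const uint8_t *src,
                                       float *total_weight, float *sum,
                                       const float *weight_lut,
                                       uint32_t max_meaningful_diff,
                                       int startx, int endx)
{
    for (int x = startx; x < endx; x += LANES)
        weights_block_model(iia, iib, iid, iie, src, total_weight, sum,
                            weight_lut, max_meaningful_diff, x, endx);
}
```

If this reading is right, the last block of a line also updates up to seven `total_weight` and `sum` entries past `endx`, using the weight stored at index `max_meaningful_diff`. That is presumably the reason `linesize` is padded in `config_input()`, although the patch does not say so.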