From patchwork Sat Nov 6 14:41:49 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 31307
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 6 Nov 2021 15:41:49 +0100
Message-Id: <20211106144150.175429-1-onemda@gmail.com>
X-Mailer: git-send-email 2.33.0
Subject: [FFmpeg-devel] [PATCH] avfilter/vf_nlmeans: add x86 SIMD
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Signed-off-by: Paul B Mahol
---
 libavfilter/vf_nlmeans.c          |  9 ++-
 libavfilter/vf_nlmeans.h          |  1 +
 libavfilter/x86/Makefile          |  2 +
 libavfilter/x86/vf_nlmeans.asm    | 97 +++++++++++++++++++++++++++++++
 libavfilter/x86/vf_nlmeans_init.c | 40 +++++++++++++
 5 files changed, 146 insertions(+), 3 deletions(-)
 create mode 100644 libavfilter/x86/vf_nlmeans.asm
 create mode 100644 libavfilter/x86/vf_nlmeans_init.c

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index dee1f68101..8a05965c9b 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -308,9 +308,9 @@ static int config_input(AVFilterLink *inlink)
     s->ii = s->ii_orig + s->ii_lz_32 + 1;
 
     // allocate weighted average for every pixel
-    s->linesize = inlink->w;
-    s->total_weight = av_malloc_array(inlink->w, inlink->h * sizeof(*s->total_weight));
-    s->sum = av_malloc_array(inlink->w, inlink->h * sizeof(*s->sum));
+    s->linesize = inlink->w + 100;
+    s->total_weight = av_malloc_array(s->linesize, inlink->h * sizeof(*s->total_weight));
+    s->sum = av_malloc_array(s->linesize, inlink->h * sizeof(*s->sum));
     if (!s->total_weight || !s->sum)
         return AVERROR(ENOMEM);
 
@@ -519,6 +519,9 @@ void ff_nlmeans_init(NLMeansDSPContext *dsp)
 
     if (ARCH_AARCH64)
         ff_nlmeans_init_aarch64(dsp);
+
+    if (ARCH_X86)
+        ff_nlmeans_init_x86(dsp);
 }
 
 static av_cold int init(AVFilterContext *ctx)
diff --git a/libavfilter/vf_nlmeans.h b/libavfilter/vf_nlmeans.h
index cd1ee7c0bf..43611a03bd 100644
--- a/libavfilter/vf_nlmeans.h
+++ b/libavfilter/vf_nlmeans.h
@@ -41,5 +41,6 @@ typedef struct NLMeansDSPContext {
 
 void ff_nlmeans_init(NLMeansDSPContext *dsp);
 void ff_nlmeans_init_aarch64(NLMeansDSPContext *dsp);
+void ff_nlmeans_init_x86(NLMeansDSPContext *dsp);
 
 #endif /* AVFILTER_NLMEANS_H */
diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
index a29941eaeb..e87481bd7a 100644
--- a/libavfilter/x86/Makefile
+++ b/libavfilter/x86/Makefile
@@ -20,6 +20,7 @@ OBJS-$(CONFIG_LIMITER_FILTER) += x86/vf_limiter_init.o
 OBJS-$(CONFIG_LUT3D_FILTER)                  += x86/vf_lut3d_init.o
 OBJS-$(CONFIG_MASKEDCLAMP_FILTER)            += x86/vf_maskedclamp_init.o
 OBJS-$(CONFIG_MASKEDMERGE_FILTER)            += x86/vf_maskedmerge_init.o
+OBJS-$(CONFIG_NLMEANS_FILTER)                += x86/vf_nlmeans_init.o
 OBJS-$(CONFIG_NOISE_FILTER)                  += x86/vf_noise.o
 OBJS-$(CONFIG_OVERLAY_FILTER)                += x86/vf_overlay_init.o
 OBJS-$(CONFIG_PP7_FILTER)                    += x86/vf_pp7_init.o
@@ -61,6 +62,7 @@ X86ASM-OBJS-$(CONFIG_LIMITER_FILTER) += x86/vf_limiter.o
 X86ASM-OBJS-$(CONFIG_LUT3D_FILTER)           += x86/vf_lut3d.o
 X86ASM-OBJS-$(CONFIG_MASKEDCLAMP_FILTER)     += x86/vf_maskedclamp.o
 X86ASM-OBJS-$(CONFIG_MASKEDMERGE_FILTER)     += x86/vf_maskedmerge.o
+X86ASM-OBJS-$(CONFIG_NLMEANS_FILTER)         += x86/vf_nlmeans.o
 X86ASM-OBJS-$(CONFIG_OVERLAY_FILTER)         += x86/vf_overlay.o
 X86ASM-OBJS-$(CONFIG_PP7_FILTER)             += x86/vf_pp7.o
 X86ASM-OBJS-$(CONFIG_PSNR_FILTER)            += x86/vf_psnr.o
diff --git a/libavfilter/x86/vf_nlmeans.asm b/libavfilter/x86/vf_nlmeans.asm
new file mode 100644
index 0000000000..8731dc5c45
--- /dev/null
+++ b/libavfilter/x86/vf_nlmeans.asm
@@ -0,0 +1,97 @@
+;*****************************************************************************
+;* x86-optimized functions for nlmeans filter
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+
+%include "libavutil/x86/x86util.asm"
+
+%if HAVE_AVX2_EXTERNAL && ARCH_X86_64
+
+SECTION_RODATA 32
+
+ending_lut: dd -1, -1, -1, -1, -1, -1, -1, -1,\
+                0, -1, -1, -1, -1, -1, -1, -1,\
+                0,  0, -1, -1, -1, -1, -1, -1,\
+                0,  0,  0, -1, -1, -1, -1, -1,\
+                0,  0,  0,  0, -1, -1, -1, -1,\
+                0,  0,  0,  0,  0, -1, -1, -1,\
+                0,  0,  0,  0,  0,  0, -1, -1,\
+                0,  0,  0,  0,  0,  0,  0, -1,\
+                0,  0,  0,  0,  0,  0,  0,  0
+
+SECTION .text
+
+; void ff_compute_weights_line(const uint32_t *const iia,
+;                              const uint32_t *const iib,
+;                              const uint32_t *const iid,
+;                              const uint32_t *const iie,
+;                              const uint8_t *const src,
+;                              float *total,
+;                              float *sum,
+;                              const float *const lut,
+;                              int max,
+;                              int startx, int endx);
+
+INIT_YMM avx2
+cglobal compute_weights_line, 8, 13, 5, 0, iia, iib, iid, iie, src, total, sum, lut, x, startx, endx, mod, elut
+    movsxd startxq, dword startxm
+    movsxd   endxq, dword endxm
+    VPBROADCASTD m2, r8m
+
+    mov       xq, startxq
+    mov     modq, mmsize / 4
+    lea    elutq, [ending_lut]
+
+    vpcmpeqd  m4, m4
+
+    .loop:
+        mov    startxq, endxq
+        sub    startxq, xq
+        cmp    startxq, modq
+        cmovge startxq, modq
+        sal    startxq, 5
+
+        movu   m0, [iieq + xq * 4]
+
+        psubd  m0, [iidq + xq * 4]
+        psubd  m0, [iibq + xq * 4]
+        paddd  m0, [iiaq + xq * 4]
+        por    m0, [elutq + startxq]
+        pminud m0, m2
+        pslld  m0, 2
+        mova   m3, m4
+        vgatherdps m1, [lutq + m0], m3
+
+        pmovzxbd m3, [srcq + xq]
+        cvtdq2ps m3, m3
+
+        mulps m0, m1, m3
+
+        addps m1, [totalq + xq * 4]
+        addps m0, [sumq + xq * 4]
+
+        movups [totalq + xq * 4], m1
+        movups [sumq + xq * 4], m0
+
+        add xq, mmsize / 4
+        cmp xq, endxq
+        jl .loop
+    RET
+
+%endif
diff --git a/libavfilter/x86/vf_nlmeans_init.c b/libavfilter/x86/vf_nlmeans_init.c
new file mode 100644
index 0000000000..37764d30ab
--- /dev/null
+++ b/libavfilter/x86/vf_nlmeans_init.c
@@ -0,0 +1,40 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/x86/cpu.h"
+#include "libavfilter/vf_nlmeans.h"
+
+void ff_compute_weights_line_avx2(const uint32_t *const iia,
+                                  const uint32_t *const iib,
+                                  const uint32_t *const iid,
+                                  const uint32_t *const iie,
+                                  const uint8_t *const src,
+                                  float *total_weight,
+                                  float *sum,
+                                  const float *const weight_lut,
+                                  int max_meaningful_diff,
+                                  int startx, int endx);
+
+av_cold void ff_nlmeans_init_x86(NLMeansDSPContext *dsp)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (ARCH_X86_64 && EXTERNAL_AVX2_FAST(cpu_flags))
+        dsp->compute_weights_line = ff_compute_weights_line_avx2;
+}
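
Review note: for anyone checking the AVX2 loop in vf_nlmeans.asm against scalar behaviour, the sketch below is one reading of what each 8-lane iteration does. It is not part of the patch. The function name compute_weights_line_model and the LANES constant are made up for illustration, and the tail handling (lanes at or past endx forced to the clamped maximum index by the ending_lut mask, yet still read and stored) is an interpretation of the assembly, not something the patch states. If that reading is right, it would explain why the patch pads s->linesize: every row's buffers need room for a full 8-lane access past endx.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* assumption: one ymm register holds 8 x 32-bit lanes */

/* Hypothetical scalar model of the AVX2 loop; not FFmpeg API. */
static void compute_weights_line_model(const uint32_t *iia, const uint32_t *iib,
                                       const uint32_t *iid, const uint32_t *iie,
                                       const uint8_t *src,
                                       float *total, float *sum,
                                       const float *lut, uint32_t max_diff,
                                       int startx, int endx)
{
    for (int x = startx; x < endx; x += LANES) {
        /* Number of valid lanes in this iteration (selects the ending_lut row). */
        int remaining = endx - x < LANES ? endx - x : LANES;

        for (int i = 0; i < LANES; i++) {
            /* Patch difference from four integral-image corners (wraps mod 2^32). */
            uint32_t diff = iie[x + i] - iid[x + i] - iib[x + i] + iia[x + i];

            /* por with the mask row: lanes past endx become all ones. */
            if (i >= remaining)
                diff = UINT32_MAX;

            /* pminud: unsigned clamp to the largest LUT index. */
            if (diff > max_diff)
                diff = max_diff;

            /* vgatherdps + mulps/addps: all LANES entries are written,
             * including the ones past endx. */
            float weight = lut[diff];
            total[x + i] += weight;
            sum[x + i]   += weight * (float)src[x + i];
        }
    }
}
```

Callers of this model must pad every array so that indices up to the next multiple of LANES past endx are valid, which mirrors the over-read and over-write the vector code performs.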