From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 24 Oct 2021 22:25:02 +0200
Message-Id: <20211024202502.945133-6-onemda@gmail.com>
In-Reply-To: <20211024202502.945133-1-onemda@gmail.com>
References: <20211024202502.945133-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 6/6][BROKEN] avfilter/vf_nlmeans: add x86 SIMD

Signed-off-by: Paul B Mahol
---
 libavfilter/vf_nlmeans.c       |  3 ++
 libavfilter/vf_nlmeans.h       |  1 +
 libavfilter/x86/Makefile       |  2 +
 libavfilter/x86/vf_nlmeans.asm | 89 ++++++++++++++++++++++++++++++++++
 4 files changed, 95 insertions(+)
 create mode 100644 libavfilter/x86/vf_nlmeans.asm

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index 93a14bcf19..16171d830a 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -513,6 +513,9 @@ void ff_nlmeans_init(NLMeansDSPContext *dsp)
 
     if (ARCH_AARCH64)
         ff_nlmeans_init_aarch64(dsp);
+
+    if (ARCH_X86)
+        ff_nlmeans_init_x86(dsp);
 }
 
 static av_cold int init(AVFilterContext *ctx)
diff --git a/libavfilter/vf_nlmeans.h b/libavfilter/vf_nlmeans.h
index d0d0056163..ae9f450dbf 100644
--- a/libavfilter/vf_nlmeans.h
+++ b/libavfilter/vf_nlmeans.h
@@ -45,5 +45,6 @@ typedef struct NLMeansDSPContext {
 
 void ff_nlmeans_init(NLMeansDSPContext *dsp);
 void ff_nlmeans_init_aarch64(NLMeansDSPContext *dsp);
+void ff_nlmeans_init_x86(NLMeansDSPContext *dsp);
 
 #endif /* AVFILTER_NLMEANS_H */
diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
index a29941eaeb..e87481bd7a 100644
--- a/libavfilter/x86/Makefile
+++ b/libavfilter/x86/Makefile
@@ -20,6 +20,7 @@ OBJS-$(CONFIG_LIMITER_FILTER)               += x86/vf_limiter_init.o
 OBJS-$(CONFIG_LUT3D_FILTER)                 += x86/vf_lut3d_init.o
 OBJS-$(CONFIG_MASKEDCLAMP_FILTER)           += x86/vf_maskedclamp_init.o
 OBJS-$(CONFIG_MASKEDMERGE_FILTER)           += x86/vf_maskedmerge_init.o
+OBJS-$(CONFIG_NLMEANS_FILTER)               += x86/vf_nlmeans_init.o
 OBJS-$(CONFIG_NOISE_FILTER)                 += x86/vf_noise.o
 OBJS-$(CONFIG_OVERLAY_FILTER)               += x86/vf_overlay_init.o
 OBJS-$(CONFIG_PP7_FILTER)                   += x86/vf_pp7_init.o
@@ -61,6 +62,7 @@ X86ASM-OBJS-$(CONFIG_LIMITER_FILTER)        += x86/vf_limiter.o
 X86ASM-OBJS-$(CONFIG_LUT3D_FILTER)          += x86/vf_lut3d.o
 X86ASM-OBJS-$(CONFIG_MASKEDCLAMP_FILTER)    += x86/vf_maskedclamp.o
 X86ASM-OBJS-$(CONFIG_MASKEDMERGE_FILTER)    += x86/vf_maskedmerge.o
+X86ASM-OBJS-$(CONFIG_NLMEANS_FILTER)        += x86/vf_nlmeans.o
 X86ASM-OBJS-$(CONFIG_OVERLAY_FILTER)        += x86/vf_overlay.o
 X86ASM-OBJS-$(CONFIG_PP7_FILTER)            += x86/vf_pp7.o
 X86ASM-OBJS-$(CONFIG_PSNR_FILTER)           += x86/vf_psnr.o
diff --git a/libavfilter/x86/vf_nlmeans.asm b/libavfilter/x86/vf_nlmeans.asm
new file mode 100644
index 0000000000..aebcc59b54
--- /dev/null
+++ b/libavfilter/x86/vf_nlmeans.asm
@@ -0,0 +1,89 @@
+;*****************************************************************************
+;* x86-optimized functions for nlmeans filter
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+
+%include "libavutil/x86/x86util.asm"
+
+%if HAVE_AVX2_EXTERNAL
+
+SECTION_RODATA
+
+SECTION .text
+
+; void ff_compute_weights_line(const uint32_t *const iia,
+;                              const uint32_t *const iib,
+;                              const uint32_t *const iid,
+;                              const uint32_t *const iie,
+;                              const uint8_t *const src,
+;                              struct weighted_avg *wa,
+;                              const float *const lut,
+;                              int max,
+;                              int startx, int endx);
+
+INIT_YMM avx2
+cglobal compute_weights_line, 11, 11, 7, iia, iib, iid, iie, src, wa, lut, max, startx, endx, x
+    movsxdifnidn startxq, startxd
+    movsxdifnidn endxq, endxd
+    movsxdifnidn maxq, maxd
+
+    sal startxq, 2
+    sal endxq, 2
+
+    mov xq, startxq
+    sar startxq, 2
+    VBROADCASTI128 m4, maxm
+    pcmpeqd m5, m5
+
+    .loop:
+    movu m0, [iieq + xq]
+    movu m1, [iidq + xq]
+    movu m2, [iibq + xq]
+    movu m3, [iiaq + xq]
+    vpmovzxbd m6, [srcq + startxq]
+    vcvtdq2ps m6, m6
+
+    psubd m0, m1
+    psubd m0, m2
+    paddd m0, m3
+    pminud m0, m4
+    pslld m0, 2
+    mova m3, m5
+    vpgatherdd m1, [lutq + m0], m3
+
+    vmulps m2, m1, m6
+    vunpcklps m0, m1, m2
+    vunpckhps m1, m1, m2
+
+    movu m2, [waq + xq * 2]
+    movu m3, [waq + xq * 2 + 4 * 4]
+
+    vaddps m0, m2
+    vaddps m1, m3
+
+    movu [waq + xq * 2], m0
+    movu [waq + xq * 2 + 4 * 4], m1
+
+    add startxq, 1 * 4
+    add xq, 4 * 4
+    cmp xq, endxq
+    jl .loop
+    RET
+
+%endif
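
For reference while reviewing the loop above, here is a scalar C sketch of what the instruction sequence appears to compute for each pixel: clamp the integral-image difference e - d - b + a to max, look up a weight in lut, then accumulate the weight and the weighted pixel into wa. The function name compute_weights_line_ref and the two-float layout of struct weighted_avg are assumptions inferred from the interleaved stores in the asm; neither is part of this patch.

```c
#include <stdint.h>

/* Assumed layout: the asm interleaves {weight, weight * pixel} pairs,
 * which suggests two consecutive floats per pixel. */
struct weighted_avg {
    float total_weight;
    float sum;
};

/* Hypothetical scalar reference; the name is illustrative only. */
static void compute_weights_line_ref(const uint32_t *iia,
                                     const uint32_t *iib,
                                     const uint32_t *iid,
                                     const uint32_t *iie,
                                     const uint8_t *src,
                                     struct weighted_avg *wa,
                                     const float *lut,
                                     int max,
                                     int startx, int endx)
{
    for (int x = startx; x < endx; x++) {
        /* psubd / psubd / paddd: e - d - b + a in 32-bit unsigned arithmetic */
        uint32_t diff = iie[x] - iid[x] - iib[x] + iia[x];

        /* pminud: clamp so the lookup stays inside the table */
        if (diff > (uint32_t)max)
            diff = (uint32_t)max;

        /* vpgatherdd: fetch the weight for this patch difference */
        const float weight = lut[diff];

        /* vmulps + unpack + vaddps: accumulate weight and weighted pixel */
        wa[x].total_weight += weight;
        wa[x].sum          += weight * src[x];
    }
}
```

Comparing the output of a sketch like this against the asm on a small line may help narrow down the [BROKEN] tag. One thing to check: under INIT_YMM each load covers eight 32-bit values, while xq advances by 4 * 4 bytes (four values) per iteration.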