From patchwork Fri Aug 27 04:51:43 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wu Jianhua
X-Patchwork-Id: 29801
From: Wu Jianhua
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 27 Aug 2021 12:51:43 +0800
Message-Id: <20210827045144.73794-3-jianhua.wu@intel.com>
In-Reply-To: <20210827045144.73794-1-jianhua.wu@intel.com>
References: <20210827045144.73794-1-jianhua.wu@intel.com>
Subject: [FFmpeg-devel] [PATCH 3/4] libavfilter/x86/vf_threshold: add ff_threshold8/16_avx512
Cc: Wu Jianhua

Performance(Less is better)
8bit:
    ff_threshold8_sse4      32.7555351
    ff_threshold8_avx2      32.1713562
    ff_threshold8_avx512    32.0103531

16bit:
    ff_threshold16_sse4     37.7713432
    ff_threshold16_avx2     35.3348312
    ff_threshold16_avx512   32.6976166

Signed-off-by: Wu Jianhua
---
 libavfilter/x86/vf_threshold.asm    | 44 +++++++++++++++++++++--------
 libavfilter/x86/vf_threshold_init.c |  8 ++++++
 2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/libavfilter/x86/vf_threshold.asm b/libavfilter/x86/vf_threshold.asm
index 098069b083..dc4126c7af 100644
--- a/libavfilter/x86/vf_threshold.asm
+++ b/libavfilter/x86/vf_threshold.asm
@@ -29,6 +29,15 @@ pb_128_0 : times 8 db 0, 128
 
 SECTION .text
 
+%macro DECL_MASK 2
+%if mmsize < 64
+    %xdefine %1 m%2
+%else
+    %assign %%i %2 + 1
+    %xdefine %1 k %+ %%i
+%endif
+%endmacro
+
 ;%1 depth (8 or 16) ; %2 b or w ; %3 constant
 %macro THRESHOLD 3
 %if ARCH_X86_64
@@ -58,17 +67,24 @@ cglobal threshold%1, 5, 7, 5, in, threshold, min, max, out, w, x
 .nextrow:
     mov         xq, wq
 
-    .loop:
-        movu            m1, [inq + xq]
-        movu            m0, [thresholdq + xq]
-        movu            m2, [minq + xq]
-        movu            m3, [maxq + xq]
-        pxor            m0, m4
-        pxor            m1, m4
-        pcmpgt%2        m0, m1
-        PBLENDVB        m3, m2, m0
-        movu   [outq + xq], m3
-        add             xq, mmsize
+.loop:
+    movu                m1, [inq + xq]
+    movu                m0, [thresholdq + xq]
+    movu                m2, [minq + xq]
+    movu                m3, [maxq + xq]
+    pxor                m0, m4
+    pxor                m1, m4
+    DECL_MASK         mask, 0
+    pcmpgt%2          mask, m0, m1
+
+%if mmsize == 64
+    vpblendm%2    m3{mask}, m3, m2
+%else
+    PBLENDVB            m3, m2, mask
+%endif
+
+    movu       [outq + xq], m3
+    add                 xq, mmsize
     jl .loop
 
     add          inq, ilinesizeq
@@ -90,3 +106,9 @@ INIT_YMM avx2
 THRESHOLD 8, b, pb_128
 THRESHOLD 16, w, pb_128_0
 %endif
+
+%if HAVE_AVX512_EXTERNAL
+INIT_ZMM avx512
+THRESHOLD 8, b, pb_128
+THRESHOLD 16, w, pb_128_0
+%endif
diff --git a/libavfilter/x86/vf_threshold_init.c b/libavfilter/x86/vf_threshold_init.c
index 71bde15097..23500ea1bf 100644
--- a/libavfilter/x86/vf_threshold_init.c
+++ b/libavfilter/x86/vf_threshold_init.c
@@ -34,8 +34,10 @@ void ff_threshold##depth##_##opt(const uint8_t *in, const uint8_t *threshold,\
 
 THRESHOLD_FUNC(8, sse4)
 THRESHOLD_FUNC(8, avx2)
+THRESHOLD_FUNC(8, avx512)
 THRESHOLD_FUNC(16, sse4)
 THRESHOLD_FUNC(16, avx2)
+THRESHOLD_FUNC(16, avx512)
 
 av_cold void ff_threshold_init_x86(ThresholdContext *s)
 {
@@ -48,6 +50,9 @@ av_cold void ff_threshold_init_x86(ThresholdContext *s)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             s->threshold = ff_threshold8_avx2;
         }
+        if (EXTERNAL_AVX512(cpu_flags)) {
+            s->threshold = ff_threshold8_avx512;
+        }
     } else {
         if (EXTERNAL_SSE4(cpu_flags)) {
             s->threshold = ff_threshold16_sse4;
@@ -55,5 +60,8 @@ av_cold void ff_threshold_init_x86(ThresholdContext *s)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             s->threshold = ff_threshold16_avx2;
         }
+        if (EXTERNAL_AVX512(cpu_flags)) {
+            s->threshold = ff_threshold16_avx512;
+        }
     }
 }