From patchwork Fri Oct 8 02:31:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wu Jianhua
X-Patchwork-Id: 30983
From: Wu Jianhua
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 8 Oct 2021 10:31:00 +0800
Message-Id: <20211008023101.4100-2-jianhua.wu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211008023101.4100-1-jianhua.wu@intel.com>
References: <20211008023101.4100-1-jianhua.wu@intel.com>
Subject: [FFmpeg-devel] [PATCH v2 2/3] libavfilter/x86/vf_threshold: add ff_threshold8/16_avx512
List-Id: FFmpeg development discussions and patches
Cc: Wu Jianhua

Performance (lower is better):
8bit:
    ff_threshold8_sse4       32.7555351
    ff_threshold8_avx2       32.1713562
    ff_threshold8_avx512     32.0103531

16bit:
    ff_threshold16_sse4      37.7713432
    ff_threshold16_avx2      35.3348312
    ff_threshold16_avx512    32.6976166

Signed-off-by: Wu Jianhua
---
 libavfilter/x86/vf_threshold.asm    | 44 +++++++++++++++++++++--------
 libavfilter/x86/vf_threshold_init.c |  8 ++++++
 2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/libavfilter/x86/vf_threshold.asm b/libavfilter/x86/vf_threshold.asm
index 098069b083..dc4126c7af 100644
--- a/libavfilter/x86/vf_threshold.asm
+++ b/libavfilter/x86/vf_threshold.asm
@@ -29,6 +29,15 @@ pb_128_0 : times 8 db 0, 128
 
 SECTION .text
 
+%macro DECL_MASK 2
+%if mmsize < 64
+    %xdefine %1 m%2
+%else
+    %assign %%i %2 + 1
+    %xdefine %1 k %+ %%i
+%endif
+%endmacro
+
 ;%1 depth (8 or 16) ; %2 b or w ; %3 constant
 %macro THRESHOLD 3
 %if ARCH_X86_64
@@ -58,17 +67,24 @@ cglobal threshold%1, 5, 7, 5, in, threshold, min, max, out, w, x
 .nextrow:
     mov       xq, wq
 
-    .loop:
-        movu            m1, [inq + xq]
-        movu            m0, [thresholdq + xq]
-        movu            m2, [minq + xq]
-        movu            m3, [maxq + xq]
-        pxor            m0, m4
-        pxor            m1, m4
-        pcmpgt%2        m0, m1
-        PBLENDVB        m3, m2, m0
-        movu            [outq + xq], m3
-        add             xq, mmsize
+.loop:
+    movu            m1, [inq + xq]
+    movu            m0, [thresholdq + xq]
+    movu            m2, [minq + xq]
+    movu            m3, [maxq + xq]
+    pxor            m0, m4
+    pxor            m1, m4
+    DECL_MASK       mask, 0
+    pcmpgt%2        mask, m0, m1
+
+%if mmsize == 64
+    vpblendm%2      m3{mask}, m3, m2
+%else
+    PBLENDVB        m3, m2, mask
+%endif
+
+    movu            [outq + xq], m3
+    add             xq, mmsize
     jl .loop
 
     add          inq, ilinesizeq
@@ -90,3 +106,9 @@ INIT_YMM avx2
 THRESHOLD 8, b, pb_128
 THRESHOLD 16, w, pb_128_0
 %endif
+
+%if HAVE_AVX512_EXTERNAL
+INIT_ZMM avx512
+THRESHOLD 8, b, pb_128
+THRESHOLD 16, w, pb_128_0
+%endif
diff --git a/libavfilter/x86/vf_threshold_init.c b/libavfilter/x86/vf_threshold_init.c
index 8e42296791..0c75ea2870 100644
--- a/libavfilter/x86/vf_threshold_init.c
+++ b/libavfilter/x86/vf_threshold_init.c
@@ -34,8 +34,10 @@ void ff_threshold##depth##_##opt(const uint8_t *in, const uint8_t *threshold,\
 
 THRESHOLD_FUNC(8, sse4)
 THRESHOLD_FUNC(8, avx2)
+THRESHOLD_FUNC(8, avx512)
 THRESHOLD_FUNC(16, sse4)
 THRESHOLD_FUNC(16, avx2)
+THRESHOLD_FUNC(16, avx512)
 
 av_cold void ff_threshold_init_x86(ThresholdContext *s)
 {
@@ -48,6 +50,9 @@ av_cold void ff_threshold_init_x86(ThresholdContext *s)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             s->threshold = ff_threshold8_avx2;
         }
+        if (EXTERNAL_AVX512(cpu_flags)) {
+            s->threshold = ff_threshold8_avx512;
+        }
     } else if (s->depth == 16) {
         if (EXTERNAL_SSE4(cpu_flags)) {
             s->threshold = ff_threshold16_sse4;
@@ -55,5 +60,8 @@ av_cold void ff_threshold_init_x86(ThresholdContext *s)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             s->threshold = ff_threshold16_avx2;
         }
+        if (EXTERNAL_AVX512(cpu_flags)) {
+            s->threshold = ff_threshold16_avx512;
+        }
     }
 }