From patchwork Fri Aug 27 04:51:41 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wu Jianhua
X-Patchwork-Id: 29800
From: Wu Jianhua
To: ffmpeg-devel@ffmpeg.org
Cc: Wu Jianhua
Date: Fri, 27 Aug 2021 12:51:41 +0800
Message-Id: <20210827045144.73794-1-jianhua.wu@intel.com>
X-Mailer: git-send-email 2.17.1
Subject: [FFmpeg-devel] [PATCH 1/4] libavfilter/x86/vf_hflip: add ff_hflip_byte/short_avx512()

Performance (lower is better):
8bit:
    ff_hflip_byte_ssse3    0.61
    ff_hflip_byte_avx2     0.37
    ff_hflip_byte_avx512   0.19
16bit:
    ff_hflip_short_ssse3   1.27
    ff_hflip_short_avx2    0.76
    ff_hflip_short_avx512  0.40

Signed-off-by: Wu Jianhua
---
 libavfilter/x86/vf_hflip.asm    | 23 ++++++++++++++++++-----
 libavfilter/x86/vf_hflip_init.c |  8 ++++++++
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/libavfilter/x86/vf_hflip.asm b/libavfilter/x86/vf_hflip.asm
index 285618954f..c2237217f7 100644
--- a/libavfilter/x86/vf_hflip.asm
+++ b/libavfilter/x86/vf_hflip.asm
@@ -26,12 +26,16 @@ SECTION_RODATA
 
 pb_flip_byte:  db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
 pb_flip_short: db 14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1
+pd_flip_indicies: dd 12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3
 
 SECTION .text
 
 ;%1 byte or short, %2 b or w, %3 size in byte (1 for byte, 2 for short)
 %macro HFLIP 3
 cglobal hflip_%1, 3, 5, 3, src, dst, w, r, x
+%if mmsize == 64
+    movu              m3, [pd_flip_indicies]
+%endif
     VBROADCASTI128    m0, [pb_flip_%1]
     xor               xq, xq
 %if %3 == 1
@@ -47,12 +51,15 @@ cglobal hflip_%1, 3, 5, 3, src, dst, w, r, x
 
 .loop0:
     neg     xq
-%if mmsize == 32
-    vpermq  m1, [srcq + xq -     mmsize + %3], 0x4e; flip each lane at load
-    vpermq  m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at load
+%if mmsize == 64
+    vpermd          m1, m3, [srcq + xq -     mmsize + %3]
+    vpermd          m2, m3, [srcq + xq - 2 * mmsize + %3]
+%elif mmsize == 32
+    vpermq          m1, [srcq + xq -     mmsize + %3], 0x4e; flip each lane at load
+    vpermq          m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at load
 %else
-    movu    m1, [srcq + xq -     mmsize + %3]
-    movu    m2, [srcq + xq - 2 * mmsize + %3]
+    movu            m1, [srcq + xq -     mmsize + %3]
+    movu            m2, [srcq + xq - 2 * mmsize + %3]
 %endif
     pshufb  m1, m0
     pshufb  m2, m0
@@ -88,3 +95,9 @@ INIT_YMM avx2
 HFLIP byte, b, 1
 HFLIP short, w, 2
 %endif
+
+%if HAVE_AVX512_EXTERNAL
+INIT_ZMM avx512
+HFLIP byte, b, 1
+HFLIP short, w, 2
+%endif
diff --git a/libavfilter/x86/vf_hflip_init.c b/libavfilter/x86/vf_hflip_init.c
index 0ac399b0d4..25fc40f7b0 100644
--- a/libavfilter/x86/vf_hflip_init.c
+++ b/libavfilter/x86/vf_hflip_init.c
@@ -25,8 +25,10 @@
 
 void ff_hflip_byte_ssse3(const uint8_t *src, uint8_t *dst, int w);
 void ff_hflip_byte_avx2(const uint8_t *src, uint8_t *dst, int w);
+void ff_hflip_byte_avx512(const uint8_t *src, uint8_t *dst, int w);
 void ff_hflip_short_ssse3(const uint8_t *src, uint8_t *dst, int w);
 void ff_hflip_short_avx2(const uint8_t *src, uint8_t *dst, int w);
+void ff_hflip_short_avx512(const uint8_t *src, uint8_t *dst, int w);
 
 av_cold void ff_hflip_init_x86(FlipContext *s, int step[4], int nb_planes)
 {
@@ -41,6 +43,9 @@ av_cold void ff_hflip_init_x86(FlipContext *s, int step[4], int nb_planes)
             if (EXTERNAL_AVX2_FAST(cpu_flags)) {
                 s->flip_line[i] = ff_hflip_byte_avx2;
             }
+            if (EXTERNAL_AVX512(cpu_flags)) {
+                s->flip_line[i] = ff_hflip_byte_avx512;
+            }
         } else if (step[i] == 2) {
             if (EXTERNAL_SSSE3(cpu_flags)) {
                 s->flip_line[i] = ff_hflip_short_ssse3;
@@ -48,6 +53,9 @@ av_cold void ff_hflip_init_x86(FlipContext *s, int step[4], int nb_planes)
             if (EXTERNAL_AVX2_FAST(cpu_flags)) {
                 s->flip_line[i] = ff_hflip_short_avx2;
             }
+            if (EXTERNAL_AVX512(cpu_flags)) {
+                s->flip_line[i] = ff_hflip_short_avx512;
+            }
         }
     }
 }

From patchwork Fri Aug 27 04:51:42 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wu Jianhua
X-Patchwork-Id: 29798
From: Wu Jianhua
To: ffmpeg-devel@ffmpeg.org
Cc: Wu Jianhua
Date: Fri, 27 Aug 2021 12:51:42 +0800
Message-Id: <20210827045144.73794-2-jianhua.wu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20210827045144.73794-1-jianhua.wu@intel.com>
References: <20210827045144.73794-1-jianhua.wu@intel.com>
Subject: [FFmpeg-devel] [PATCH 2/4] libavfilter/x86/vf_threshold_init: remove condition s->depth == 16

Samples with a bit depth of 10 are also stored in 16-bit words, so the
s->depth == 16 check leaves the SIMD functions uninitialized when the
depth is 10. Use the 16-bit functions for every depth other than 8.

Signed-off-by: Wu Jianhua
---
 libavfilter/x86/vf_threshold_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavfilter/x86/vf_threshold_init.c b/libavfilter/x86/vf_threshold_init.c
index 8e42296791..71bde15097 100644
--- a/libavfilter/x86/vf_threshold_init.c
+++ b/libavfilter/x86/vf_threshold_init.c
@@ -48,7 +48,7 @@ av_cold void ff_threshold_init_x86(ThresholdContext *s)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             s->threshold = ff_threshold8_avx2;
         }
-    } else if (s->depth == 16) {
+    } else {
         if (EXTERNAL_SSE4(cpu_flags)) {
             s->threshold = ff_threshold16_sse4;
         }

From patchwork Fri Aug 27 04:51:43 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wu Jianhua
X-Patchwork-Id: 29801
From: Wu Jianhua
To: ffmpeg-devel@ffmpeg.org
Cc: Wu Jianhua
Date: Fri, 27 Aug 2021 12:51:43 +0800
Message-Id: <20210827045144.73794-3-jianhua.wu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20210827045144.73794-1-jianhua.wu@intel.com>
References: <20210827045144.73794-1-jianhua.wu@intel.com>
Subject: [FFmpeg-devel] [PATCH 3/4] libavfilter/x86/vf_threshold: add ff_threshold8/16_avx512

Performance (lower is better):
8bit:
    ff_threshold8_sse4     32.7555351
    ff_threshold8_avx2     32.1713562
    ff_threshold8_avx512   32.0103531
16bit:
    ff_threshold16_sse4    37.7713432
    ff_threshold16_avx2    35.3348312
    ff_threshold16_avx512  32.6976166

Signed-off-by: Wu Jianhua
---
 libavfilter/x86/vf_threshold.asm    | 44 +++++++++++++++++++++--------
 libavfilter/x86/vf_threshold_init.c |  8 ++++++
 2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/libavfilter/x86/vf_threshold.asm b/libavfilter/x86/vf_threshold.asm
index 098069b083..dc4126c7af 100644
--- a/libavfilter/x86/vf_threshold.asm
+++ b/libavfilter/x86/vf_threshold.asm
@@ -29,6 +29,15 @@ pb_128_0 : times 8 db 0, 128
 
 SECTION .text
 
+%macro DECL_MASK 2
+%if mmsize < 64
+    %xdefine %1 m%2
+%else
+    %assign %%i %2 + 1
+    %xdefine %1 k %+ %%i
+%endif
+%endmacro
+
 ;%1 depth (8 or 16) ; %2 b or w ; %3 constant
 %macro THRESHOLD 3
 %if ARCH_X86_64
@@ -58,17 +67,24 @@ cglobal threshold%1, 5, 7, 5, in, threshold, min, max, out, w, x
 .nextrow:
     mov       xq, wq
 
-    .loop:
-        movu            m1, [inq + xq]
-        movu            m0, [thresholdq + xq]
-        movu            m2, [minq + xq]
-        movu            m3, [maxq + xq]
-        pxor            m0, m4
-        pxor            m1, m4
-        pcmpgt%2        m0, m1
-        PBLENDVB        m3, m2, m0
-        movu   [outq + xq], m3
-        add             xq, mmsize
+.loop:
+    movu            m1, [inq + xq]
+    movu            m0, [thresholdq + xq]
+    movu            m2, [minq + xq]
+    movu            m3, [maxq + xq]
+    pxor            m0, m4
+    pxor            m1, m4
+    DECL_MASK       mask, 0
+    pcmpgt%2        mask, m0, m1
+
+%if mmsize == 64
+    vpblendm%2      m3{mask}, m3, m2
+%else
+    PBLENDVB        m3, m2, mask
+%endif
+
+    movu   [outq + xq], m3
+    add             xq, mmsize
     jl .loop
 
     add          inq, ilinesizeq
@@ -90,3 +106,9 @@ INIT_YMM avx2
 THRESHOLD 8, b, pb_128
 THRESHOLD 16, w, pb_128_0
 %endif
+
+%if HAVE_AVX512_EXTERNAL
+INIT_ZMM avx512
+THRESHOLD 8, b, pb_128
+THRESHOLD 16, w, pb_128_0
+%endif
diff --git a/libavfilter/x86/vf_threshold_init.c b/libavfilter/x86/vf_threshold_init.c
index 71bde15097..23500ea1bf 100644
--- a/libavfilter/x86/vf_threshold_init.c
+++ b/libavfilter/x86/vf_threshold_init.c
@@ -34,8 +34,10 @@ void ff_threshold##depth##_##opt(const uint8_t *in, const uint8_t *threshold,\
 
 THRESHOLD_FUNC(8, sse4)
 THRESHOLD_FUNC(8, avx2)
+THRESHOLD_FUNC(8, avx512)
 THRESHOLD_FUNC(16, sse4)
 THRESHOLD_FUNC(16, avx2)
+THRESHOLD_FUNC(16, avx512)
 
 av_cold void ff_threshold_init_x86(ThresholdContext *s)
 {
@@ -48,6 +50,9 @@ av_cold void ff_threshold_init_x86(ThresholdContext *s)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             s->threshold = ff_threshold8_avx2;
         }
+        if (EXTERNAL_AVX512(cpu_flags)) {
+            s->threshold = ff_threshold8_avx512;
+        }
     } else {
         if (EXTERNAL_SSE4(cpu_flags)) {
             s->threshold = ff_threshold16_sse4;
@@ -55,5 +60,8 @@ av_cold void ff_threshold_init_x86(ThresholdContext *s)
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             s->threshold = ff_threshold16_avx2;
         }
+        if (EXTERNAL_AVX512(cpu_flags)) {
+            s->threshold = ff_threshold16_avx512;
+        }
     }
 }

From patchwork Fri Aug 27 04:51:44 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wu Jianhua
X-Patchwork-Id: 29799
From: Wu Jianhua
To: ffmpeg-devel@ffmpeg.org
Cc: Wu Jianhua
Date: Fri, 27 Aug 2021 12:51:44 +0800
Message-Id: <20210827045144.73794-4-jianhua.wu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20210827045144.73794-1-jianhua.wu@intel.com>
References: <20210827045144.73794-1-jianhua.wu@intel.com>
Subject: [FFmpeg-devel] [PATCH 4/4] libavfilter/vf_avgblur_vulkan: fix incorrect conditional judgement

Signed-off-by: Wu Jianhua
---
 libavfilter/vf_avgblur_vulkan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavfilter/vf_avgblur_vulkan.c b/libavfilter/vf_avgblur_vulkan.c
index 1e485061cd..2dbfc947a8 100644
--- a/libavfilter/vf_avgblur_vulkan.c
+++ b/libavfilter/vf_avgblur_vulkan.c
@@ -333,7 +333,7 @@ static int avgblur_vulkan_filter_frame(AVFilterLink *link, AVFrame *in)
     }
 
     tmp = ff_get_video_buffer(outlink, outlink->w, outlink->h);
-    if (!out) {
+    if (!tmp) {
         err = AVERROR(ENOMEM);
         goto fail;
     }