From patchwork Mon Feb 20 19:57:03 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 40459
From: James Darnley
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 20 Feb 2023 20:57:03 +0100
Message-Id: <20230220195703.1297421-3-jdarnley@obe.tv>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230220195703.1297421-1-jdarnley@obe.tv>
References: <20230220195703.1297421-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 3/3] avfilter: add avx2 filter_line function for bwdif

2.24x faster (1925±1.3 vs. 859±2.2 decicycles) compared with ssse3
---
 libavfilter/x86/vf_bwdif.asm    | 29 ++++++++++++++++++++++++-----
 libavfilter/x86/vf_bwdif_init.c | 12 ++++++++++++
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/libavfilter/x86/vf_bwdif.asm b/libavfilter/x86/vf_bwdif.asm
index 0b453da53b..5cc61435fd 100644
--- a/libavfilter/x86/vf_bwdif.asm
+++ b/libavfilter/x86/vf_bwdif.asm
@@ -26,18 +26,22 @@
 
 %include "libavutil/x86/x86util.asm"
 
-SECTION_RODATA
+SECTION_RODATA 32
 
-pw_coefhf: times 4 dw 1016, 5570
-pw_coefhf1: times 8 dw -3801
-pw_coefsp: times 4 dw 5077, -981
-pw_splfdif: times 4 dw -768, 768
+pw_coefhf: times 8 dw 1016, 5570
+pw_coefhf1: times 16 dw -3801
+pw_coefsp: times 8 dw 5077, -981
+pw_splfdif: times 8 dw -768, 768
 
 SECTION .text
 
 %macro LOAD8 2
+    %if mmsize == 32
+    pmovzxbw %1, %2
+    %else
     movh %1, %2
     punpcklbw %1, m7
+    %endif
 %endmacro
 
 %macro LOAD12 2
@@ -45,8 +49,14 @@ SECTION .text
 %endmacro
 
 %macro DISP8 0
+    %if mmsize == 32
+    vextracti128 xm1, m2, 1
+    packuswb xm2, xm1
+    movu [dstq], xm2
+    %else
     packuswb m2, m2
     movh [dstq], m2
+    %endif
 %endmacro
 
 %macro DISP12 0
@@ -244,8 +254,12 @@ cglobal bwdif_filter_line_12bit, 4, 9, 13, 0, dst, prev, cur, next, w, \
                                               prefs, mrefs, prefs2, mrefs2, \
                                               prefs3, mrefs3, prefs4, \
                                               mrefs4, parity, clip_max
+    %if mmsize == 32
+    vpbroadcastd m12, DWORD clip_maxm
+    %else
     movd m12, DWORD clip_maxm
     SPLATW m12, m12, 0
+    %endif
 %else
 cglobal bwdif_filter_line_12bit, 4, 6, 8, 80, dst, prev, cur, next, w, \
                                               prefs, mrefs, prefs2, mrefs2, \
@@ -264,3 +278,8 @@ INIT_XMM ssse3
 BWDIF
 INIT_XMM sse2
 BWDIF
+
+%if HAVE_AVX2_EXTERNAL && ARCH_X86_64
+INIT_YMM avx2
+BWDIF
+%endif
diff --git a/libavfilter/x86/vf_bwdif_init.c b/libavfilter/x86/vf_bwdif_init.c
index ba7bc40c3d..f833318c10 100644
--- a/libavfilter/x86/vf_bwdif_init.c
+++ b/libavfilter/x86/vf_bwdif_init.c
@@ -32,6 +32,10 @@ void ff_bwdif_filter_line_ssse3(void *dst, void *prev, void *cur, void *next,
                                 int w, int prefs, int mrefs, int prefs2,
                                 int mrefs2, int prefs3, int mrefs3, int prefs4,
                                 int mrefs4, int parity, int clip_max);
+void ff_bwdif_filter_line_avx2(void *dst, void *prev, void *cur, void *next,
+                               int w, int prefs, int mrefs, int prefs2,
+                               int mrefs2, int prefs3, int mrefs3, int prefs4,
+                               int mrefs4, int parity, int clip_max);
 
 void ff_bwdif_filter_line_12bit_sse2(void *dst, void *prev, void *cur, void *next,
                                      int w, int prefs, int mrefs, int prefs2,
@@ -41,6 +45,10 @@ void ff_bwdif_filter_line_12bit_ssse3(void *dst, void *prev, void *cur, void *ne
                                       int w, int prefs, int mrefs, int prefs2,
                                       int mrefs2, int prefs3, int mrefs3, int prefs4,
                                       int mrefs4, int parity, int clip_max);
+void ff_bwdif_filter_line_12bit_avx2(void *dst, void *prev, void *cur, void *next,
+                                     int w, int prefs, int mrefs, int prefs2,
+                                     int mrefs2, int prefs3, int mrefs3, int prefs4,
+                                     int mrefs4, int parity, int clip_max);
 
 av_cold void ff_bwdif_init_x86(BWDIFContext *bwdif, int bit_depth)
 {
@@ -51,10 +59,14 @@ av_cold void ff_bwdif_init_x86(BWDIFContext *bwdif, int bit_depth)
             bwdif->filter_line = ff_bwdif_filter_line_sse2;
         if (EXTERNAL_SSSE3(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_ssse3;
+        if (ARCH_X86_64 && EXTERNAL_AVX2(cpu_flags))
+            bwdif->filter_line = ff_bwdif_filter_line_avx2;
     } else if (bit_depth <= 12) {
         if (EXTERNAL_SSE2(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_12bit_sse2;
         if (EXTERNAL_SSSE3(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_12bit_ssse3;
+        if (ARCH_X86_64 && EXTERNAL_AVX2(cpu_flags))
+            bwdif->filter_line = ff_bwdif_filter_line_12bit_avx2;
     }
 }