From patchwork Mon Mar 20 16:49:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 40733
From: James Darnley
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 20 Mar 2023 17:49:25 +0100
Message-Id: <20230320164925.299207-5-jdarnley@obe.tv>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230320164925.299207-1-jdarnley@obe.tv>
References: <20230320164925.299207-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH v2 5/5] avfilter/bwdif: add avx2 filter_line function

8-bit: 2.24x faster (1925±1.3 vs. 859±2.2 decicycles) compared with ssse3
10-bit: 2.00x faster (1703±1.7 vs.
853±2.0 decicycles) compared with ssse3
---
Fixed the word broadcast

 libavfilter/x86/vf_bwdif.asm    | 29 ++++++++++++++++++++++++-----
 libavfilter/x86/vf_bwdif_init.c | 12 ++++++++++++
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/libavfilter/x86/vf_bwdif.asm b/libavfilter/x86/vf_bwdif.asm
index 0b453da53b..c93b41ec48 100644
--- a/libavfilter/x86/vf_bwdif.asm
+++ b/libavfilter/x86/vf_bwdif.asm
@@ -26,18 +26,22 @@
 
 %include "libavutil/x86/x86util.asm"
 
-SECTION_RODATA
+SECTION_RODATA 32
 
-pw_coefhf:  times 4 dw  1016, 5570
-pw_coefhf1: times 8 dw -3801
-pw_coefsp:  times 4 dw  5077, -981
-pw_splfdif: times 4 dw  -768,  768
+pw_coefhf:  times 8 dw  1016, 5570
+pw_coefhf1: times 16 dw -3801
+pw_coefsp:  times 8 dw  5077, -981
+pw_splfdif: times 8 dw  -768,  768
 
 SECTION .text
 
 %macro LOAD8 2
+    %if mmsize == 32
+        pmovzxbw %1, %2
+    %else
     movh         %1, %2
     punpcklbw    %1, m7
+    %endif
 %endmacro
 
 %macro LOAD12 2
@@ -45,8 +49,14 @@ SECTION .text
 %endmacro
 
 %macro DISP8 0
+    %if mmsize == 32
+        vextracti128 xm1, m2, 1
+        packuswb     xm2, xm1
+        movu      [dstq], xm2
+    %else
     packuswb     m2, m2
     movh     [dstq], m2
+    %endif
 %endmacro
 
 %macro DISP12 0
@@ -244,8 +254,12 @@ cglobal bwdif_filter_line_12bit, 4, 9, 13, 0, dst, prev, cur, next, w, \
                                               prefs, mrefs, prefs2, mrefs2, \
                                               prefs3, mrefs3, prefs4, \
                                               mrefs4, parity, clip_max
+    %if mmsize == 32
+        vpbroadcastw m12, WORD clip_maxm
+    %else
     movd        m12, DWORD clip_maxm
     SPLATW      m12, m12, 0
+    %endif
 %else
 cglobal bwdif_filter_line_12bit, 4, 6, 8, 80, dst, prev, cur, next, w, \
                                               prefs, mrefs, prefs2, mrefs2, \
@@ -264,3 +278,8 @@ INIT_XMM ssse3
 BWDIF
 INIT_XMM sse2
 BWDIF
+
+%if HAVE_AVX2_EXTERNAL && ARCH_X86_64
+INIT_YMM avx2
+BWDIF
+%endif
diff --git a/libavfilter/x86/vf_bwdif_init.c b/libavfilter/x86/vf_bwdif_init.c
index ba7bc40c3d..f833318c10 100644
--- a/libavfilter/x86/vf_bwdif_init.c
+++ b/libavfilter/x86/vf_bwdif_init.c
@@ -32,6 +32,10 @@ void ff_bwdif_filter_line_ssse3(void *dst, void *prev, void *cur, void *next,
                                 int w, int prefs, int mrefs, int prefs2,
                                 int mrefs2, int prefs3, int mrefs3, int prefs4,
                                 int mrefs4, int parity, int clip_max);
+void ff_bwdif_filter_line_avx2(void *dst, void *prev, void *cur, void *next,
+                               int w, int prefs, int mrefs, int prefs2,
+                               int mrefs2, int prefs3, int mrefs3, int prefs4,
+                               int mrefs4, int parity, int clip_max);
 
 void ff_bwdif_filter_line_12bit_sse2(void *dst, void *prev, void *cur, void *next,
                                      int w, int prefs, int mrefs, int prefs2,
@@ -41,6 +45,10 @@ void ff_bwdif_filter_line_12bit_ssse3(void *dst, void *prev, void *cur, void *ne
                                       int w, int prefs, int mrefs, int prefs2,
                                       int mrefs2, int prefs3, int mrefs3, int prefs4,
                                       int mrefs4, int parity, int clip_max);
+void ff_bwdif_filter_line_12bit_avx2(void *dst, void *prev, void *cur, void *next,
+                                     int w, int prefs, int mrefs, int prefs2,
+                                     int mrefs2, int prefs3, int mrefs3, int prefs4,
+                                     int mrefs4, int parity, int clip_max);
 
 av_cold void ff_bwdif_init_x86(BWDIFContext *bwdif, int bit_depth)
 {
@@ -51,10 +59,14 @@ av_cold void ff_bwdif_init_x86(BWDIFContext *bwdif, int bit_depth)
             bwdif->filter_line = ff_bwdif_filter_line_sse2;
         if (EXTERNAL_SSSE3(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_ssse3;
+        if (ARCH_X86_64 && EXTERNAL_AVX2(cpu_flags))
+            bwdif->filter_line = ff_bwdif_filter_line_avx2;
     } else if (bit_depth <= 12) {
         if (EXTERNAL_SSE2(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_12bit_sse2;
         if (EXTERNAL_SSSE3(cpu_flags))
             bwdif->filter_line = ff_bwdif_filter_line_12bit_ssse3;
+        if (ARCH_X86_64 && EXTERNAL_AVX2(cpu_flags))
+            bwdif->filter_line = ff_bwdif_filter_line_12bit_avx2;
     }
 }
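
Note on the dispatch order in ff_bwdif_init_x86(): each check overwrites the
function pointer set by the previous one, so the last matching test wins and the
widest supported implementation is selected. The following is only a minimal
standalone sketch of that pattern, not FFmpeg code. The flag bits, the impl_id
enum, and select_filter_line() are hypothetical names made up for illustration;
the real code uses the EXTERNAL_*() macros on the flags it obtains at init time.

```c
#include <stddef.h>

/* Hypothetical feature bits, standing in for FFmpeg's real CPU flags. */
enum { CPU_SSE2 = 1 << 0, CPU_SSSE3 = 1 << 1, CPU_AVX2 = 1 << 2 };

/* Hypothetical identifiers so the chosen implementation can be inspected. */
enum impl_id { IMPL_C, IMPL_SSE2, IMPL_SSSE3, IMPL_AVX2 };

typedef enum impl_id (*filter_line_fn)(void);

/* Stand-ins for the real filter_line functions; each reports which one it is. */
static enum impl_id filter_line_c(void)     { return IMPL_C; }
static enum impl_id filter_line_sse2(void)  { return IMPL_SSE2; }
static enum impl_id filter_line_ssse3(void) { return IMPL_SSSE3; }
static enum impl_id filter_line_avx2(void)  { return IMPL_AVX2; }

/*
 * Checks run from the oldest to the newest instruction set, and every
 * matching check overwrites the pointer, so the last match is returned.
 * As in the patch, AVX2 is only considered on 64-bit x86.
 */
static filter_line_fn select_filter_line(int cpu_flags, int is_x86_64)
{
    filter_line_fn fn = filter_line_c;

    if (cpu_flags & CPU_SSE2)
        fn = filter_line_sse2;
    if (cpu_flags & CPU_SSSE3)
        fn = filter_line_ssse3;
    if (is_x86_64 && (cpu_flags & CPU_AVX2))
        fn = filter_line_avx2;

    return fn;
}
```

With all three bits set and is_x86_64 nonzero, the sketch returns the AVX2
stand-in. With the same bits on a 32-bit build it falls back to the SSSE3 one,
which mirrors the ARCH_X86_64 guard in the patch.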