From patchwork Fri Dec 22 23:52:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 45304
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 22 Dec 2023 20:52:58 -0300
Message-ID: <20231222235259.2328-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20231222230834.GH6420@pb2>
References: <20231222230834.GH6420@pb2>
Subject: [FFmpeg-devel] [PATCH 2/2 v2] x86/takdsp: add avx2 versions of all functions

On an Intel Core i7 12700k:

decorrelate_ls_c: 814.3
decorrelate_ls_sse2: 165.8
decorrelate_ls_avx2: 101.3
decorrelate_sf_c: 1602.6
decorrelate_sf_sse4: 640.1
decorrelate_sf_avx2: 324.6
decorrelate_sm_c: 1564.8
decorrelate_sm_sse2: 379.3
decorrelate_sm_avx2: 203.3
decorrelate_sr_c: 785.3
decorrelate_sr_sse2: 176.3
decorrelate_sr_avx2: 99.8

Signed-off-by: James Almer
---
 libavcodec/x86/takdsp.asm    | 41 ++++++++++++++++++++++++------------
 libavcodec/x86/takdsp_init.c | 11 ++++++++++
 2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/libavcodec/x86/takdsp.asm b/libavcodec/x86/takdsp.asm
index be8e1ab553..d55c5f39aa 100644
--- a/libavcodec/x86/takdsp.asm
+++ b/libavcodec/x86/takdsp.asm
@@ -28,7 +28,7 @@ pd_128: times 4 dd 128
 
 SECTION .text
 
-INIT_XMM sse2
+%macro TAK_DECORRELATE 0
 cglobal tak_decorrelate_ls, 3, 3, 2, p1, p2, length
     shl             lengthd, 2
     add             p1q, lengthq
@@ -73,10 +73,8 @@ cglobal tak_decorrelate_sm, 3, 3, 6, p1, p2, length
     mova            m1, [p2q+lengthq]
     mova            m3, [p1q+lengthq+mmsize]
     mova            m4, [p2q+lengthq+mmsize]
-    mova            m2, m1
-    mova            m5, m4
-    psrad           m2, 1
-    psrad           m5, 1
+    psrad           m2, m1, 1
+    psrad           m5, m4, 1
     psubd           m0, m2
     psubd           m3, m5
     paddd           m1, m0
@@ -88,29 +86,44 @@ cglobal tak_decorrelate_sm, 3, 3, 6, p1, p2, length
     add             lengthq, mmsize*2
     jl .loop
     RET
+%endmacro
 
-INIT_XMM sse4
+INIT_XMM sse2
+TAK_DECORRELATE
+INIT_YMM avx2
+TAK_DECORRELATE
+
+%macro TAK_DECORRELATE_SF 0
 cglobal tak_decorrelate_sf, 3, 3, 5, p1, p2, length, dshift, dfactor
     shl             lengthd, 2
     add             p1q, lengthq
     add             p2q, lengthq
     neg             lengthq
 
-    movd            m2, dshiftm
-    movd            m3, dfactorm
-    pshufd          m3, m3, 0
-    mova            m4, [pd_128]
+    movd            xm2, dshiftm
+%if UNIX64
+    movd            xm3, dfactorm
+    VPBROADCASTD    m3, xm3
+%else
+    VPBROADCASTD    m3, dfactorm
+%endif
+    VBROADCASTI128  m4, [pd_128]
 
 .loop:
-    mova            m0, [p1q+lengthq]
     mova            m1, [p2q+lengthq]
-    psrad           m1, m2
+    psrad           m1, xm2
     pmulld          m1, m3
     paddd           m1, m4
     psrad           m1, 8
-    pslld           m1, m2
-    psubd           m1, m0
+    pslld           m1, xm2
+    psubd           m1, [p1q+lengthq]
     mova            [p1q+lengthq], m1
     add             lengthq, mmsize
     jl .loop
     RET
+%endmacro
+
+INIT_XMM sse4
+TAK_DECORRELATE_SF
+INIT_YMM avx2
+TAK_DECORRELATE_SF
diff --git a/libavcodec/x86/takdsp_init.c b/libavcodec/x86/takdsp_init.c
index 12b62b8247..9553f8442c 100644
--- a/libavcodec/x86/takdsp_init.c
+++ b/libavcodec/x86/takdsp_init.c
@@ -24,9 +24,13 @@
 #include "config.h"
 
 void ff_tak_decorrelate_ls_sse2(const int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_ls_avx2(const int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sr_sse2(int32_t *p1, const int32_t *p2, int length);
+void ff_tak_decorrelate_sr_avx2(int32_t *p1, const int32_t *p2, int length);
 void ff_tak_decorrelate_sm_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_sm_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sf_sse4(int32_t *p1, const int32_t *p2, int length, int dshift, int dfactor);
+void ff_tak_decorrelate_sf_avx2(int32_t *p1, const int32_t *p2, int length, int dshift, int dfactor);
 
 av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
 {
@@ -42,5 +46,12 @@ av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
     if (EXTERNAL_SSE4(cpu_flags)) {
         c->decorrelate_sf = ff_tak_decorrelate_sf_sse4;
     }
+
+    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
+        c->decorrelate_ls = ff_tak_decorrelate_ls_avx2;
+        c->decorrelate_sr = ff_tak_decorrelate_sr_avx2;
+        c->decorrelate_sm = ff_tak_decorrelate_sm_avx2;
+        c->decorrelate_sf = ff_tak_decorrelate_sf_avx2;
+    }
 #endif
 }
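
For anyone reviewing the tak_decorrelate_sf loop above, here is a rough scalar sketch of what each SIMD lane appears to compute, read directly off the instruction sequence in the diff (psrad, pmulld, paddd with pd_128, psrad by 8, pslld, psubd). The function name tak_decorrelate_sf_ref is hypothetical and not part of the patch or of FFmpeg, and this is not claimed to match the project's C reference line for line. It assumes arithmetic right shift on signed integers and modular unsigned-to-signed conversion, which holds on the usual compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical scalar model of one tak_decorrelate_sf pass.
 * Each comment names the asm instruction the step mirrors. */
static void tak_decorrelate_sf_ref(int32_t *p1, const int32_t *p2, int length,
                                   int dshift, int dfactor)
{
    for (int i = 0; i < length; i++) {
        int32_t  a = p1[i];
        int32_t  b = p2[i] >> dshift;                  /* psrad m1, xm2 (assumes arithmetic shift) */
        uint32_t t = (uint32_t)b * (uint32_t)dfactor;  /* pmulld m1, m3: low 32 bits of the product */

        t += 128;                                      /* paddd m1, m4 (pd_128) */
        b  = (int32_t)t >> 8;                          /* psrad m1, 8 */
        t  = (uint32_t)b << dshift;                    /* pslld m1, xm2 */
        p1[i] = (int32_t)(t - (uint32_t)a);            /* psubd m1, [p1q+lengthq], then store to p1 */
    }
}
```

The intermediate values are kept unsigned so that wraparound matches the 32-bit lane behaviour of the vector instructions instead of relying on signed overflow.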