From patchwork Fri Dec 22 01:15:48 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 45278
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 21 Dec 2023 22:15:48 -0300
Message-ID: <20231222011549.16057-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.43.0
Subject: [FFmpeg-devel] [PATCH 1/2] checkasm/takdsp: add decorrelate_sf test
List-Id: FFmpeg development discussions and patches

Signed-off-by: James Almer
---
 tests/checkasm/takdsp.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/tests/checkasm/takdsp.c b/tests/checkasm/takdsp.c
index 495b7242c5..8df93cfd52 100644
--- a/tests/checkasm/takdsp.c
+++ b/tests/checkasm/takdsp.c
@@ -24,6 +24,7 @@
 #include "libavutil/mem_internal.h"
 
 #include "libavcodec/takdsp.h"
+#include "libavcodec/mathops.h"
 
 #include "checkasm.h"
 
@@ -33,8 +34,9 @@
             buf[i] = rnd(); \
     } while (0)
 
-static void test_decorrelate_ls(TAKDSPContext *s) {
 #define BUF_SIZE 1024
+
+static void test_decorrelate_ls(TAKDSPContext *s) {
     declare_func(void, int32_t *, int32_t *, int);
 
     if (check_func(s->decorrelate_ls, "decorrelate_ls")) {
@@ -60,7 +62,6 @@ static void test_decorrelate_ls(TAKDSPContext *s) {
 }
 
 static void test_decorrelate_sr(TAKDSPContext *s) {
-#define BUF_SIZE 1024
     declare_func(void, int32_t *, int32_t *, int);
 
     if (check_func(s->decorrelate_sr, "decorrelate_sr")) {
@@ -86,7 +87,6 @@ static void test_decorrelate_sr(TAKDSPContext *s) {
 }
 
 static void test_decorrelate_sm(TAKDSPContext *s) {
-#define BUF_SIZE 1024
     declare_func(void, int32_t *, int32_t *, int);
 
     if (check_func(s->decorrelate_sm, "decorrelate_sm")) {
@@ -113,6 +113,35 @@ static void test_decorrelate_sm(TAKDSPContext *s) {
     report("decorrelate_sm");
 }
 
+static void test_decorrelate_sf(TAKDSPContext *s) {
+    declare_func(void, int32_t *, int32_t *, int, int, int);
+
+    if (check_func(s->decorrelate_sf, "decorrelate_sf")) {
+        LOCAL_ALIGNED_32(int32_t, p1, [BUF_SIZE]);
+        LOCAL_ALIGNED_32(int32_t, p1_2, [BUF_SIZE]);
+        LOCAL_ALIGNED_32(int32_t, p2, [BUF_SIZE]);
+        LOCAL_ALIGNED_32(int32_t, p2_2, [BUF_SIZE]);
+        int dshift, dfactor;
+
+        randomize(p1, BUF_SIZE);
+        memcpy(p1_2, p1, BUF_SIZE * sizeof(*p1));
+        randomize(p2, BUF_SIZE);
+        memcpy(p2_2, p2, BUF_SIZE * sizeof(*p2));
+        dshift = (rnd() & 0xF) + 1;
+        dfactor = sign_extend(rnd(), 10);
+        call_ref(p1, p2, BUF_SIZE, dshift, dfactor);
+        call_new(p1_2, p2_2, BUF_SIZE, dshift, dfactor);
+
+        if (memcmp(p1, p1_2, BUF_SIZE * sizeof(*p1)) != 0) {
+            fail();
+        }
+
+        bench_new(p1, p2, BUF_SIZE, dshift, dfactor);
+    }
+
+    report("decorrelate_sf");
+}
+
 void checkasm_check_takdsp(void)
 {
     TAKDSPContext s = { 0 };
@@ -121,4 +150,5 @@ void checkasm_check_takdsp(void)
     test_decorrelate_ls(&s);
     test_decorrelate_sr(&s);
     test_decorrelate_sm(&s);
+    test_decorrelate_sf(&s);
 }

From patchwork Fri Dec 22 01:15:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 45279
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 21 Dec 2023 22:15:49 -0300
Message-ID: <20231222011549.16057-2-jamrial@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20231222011549.16057-1-jamrial@gmail.com>
References: <20231222011549.16057-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] x86/takdsp: add avx2 versions of all functions
List-Id: FFmpeg development discussions and patches

On an Intel Core i7 12700k:

decorrelate_ls_c: 814.3
decorrelate_ls_sse2: 165.8
decorrelate_ls_avx2: 101.3
decorrelate_sf_c: 1602.6
decorrelate_sf_sse4: 640.1
decorrelate_sf_avx2: 324.6
decorrelate_sm_c: 1564.8
decorrelate_sm_sse2: 379.3
decorrelate_sm_avx2: 203.3
decorrelate_sr_c: 785.3
decorrelate_sr_sse2: 176.3
decorrelate_sr_avx2: 99.8

Signed-off-by: James Almer
---
 libavcodec/x86/takdsp.asm    | 36 ++++++++++++++++++++++--------------
 libavcodec/x86/takdsp_init.c | 11 +++++++++++
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/libavcodec/x86/takdsp.asm b/libavcodec/x86/takdsp.asm
index be8e1ab553..a5501cc285 100644
--- a/libavcodec/x86/takdsp.asm
+++ b/libavcodec/x86/takdsp.asm
@@ -28,7 +28,7 @@ pd_128: times 4 dd 128
 
 SECTION .text
 
-INIT_XMM sse2
+%macro TAK_DECORRELATE 0
 cglobal tak_decorrelate_ls, 3, 3, 2, p1, p2, length
     shl                     lengthd, 2
     add                         p1q, lengthq
@@ -73,10 +73,8 @@ cglobal tak_decorrelate_sm, 3, 3, 6, p1, p2, length
     mova                     m1, [p2q+lengthq]
     mova                     m3, [p1q+lengthq+mmsize]
     mova                     m4, [p2q+lengthq+mmsize]
-    mova                     m2, m1
-    mova                     m5, m4
-    psrad                    m2, 1
-    psrad                    m5, 1
+    psrad                    m2, m1, 1
+    psrad                    m5, m4, 1
     psubd                    m0, m2
     psubd                    m3, m5
     paddd                    m1, m0
@@ -88,29 +86,39 @@ cglobal tak_decorrelate_sm, 3, 3, 6, p1, p2, length
     add                 lengthq, mmsize*2
     jl .loop
     RET
+%endmacro
 
-INIT_XMM sse4
+INIT_XMM sse2
+TAK_DECORRELATE
+INIT_YMM avx2
+TAK_DECORRELATE
+
+%macro TAK_DECORRELATE_SF 0
 cglobal tak_decorrelate_sf, 3, 3, 5, p1, p2, length, dshift, dfactor
     shl             lengthd, 2
     add                 p1q, lengthq
     add                 p2q, lengthq
     neg             lengthq
 
-    movd                 m2, dshiftm
-    movd                 m3, dfactorm
-    pshufd               m3, m3, 0
-    mova                 m4, [pd_128]
+    movd                xm2, dshiftm
+    VPBROADCASTD         m3, dfactorm
+    VBROADCASTI128       m4, [pd_128]
 
 .loop:
-    mova                 m0, [p1q+lengthq]
     mova                 m1, [p2q+lengthq]
-    psrad                m1, m2
+    psrad                m1, xm2
     pmulld               m1, m3
     paddd                m1, m4
     psrad                m1, 8
-    pslld                m1, m2
-    psubd                m1, m0
+    pslld                m1, xm2
+    psubd                m1, [p1q+lengthq]
     mova      [p1q+lengthq], m1
     add             lengthq, mmsize
     jl .loop
     RET
+%endmacro
+
+INIT_XMM sse4
+TAK_DECORRELATE_SF
+INIT_YMM avx2
+TAK_DECORRELATE_SF
diff --git a/libavcodec/x86/takdsp_init.c b/libavcodec/x86/takdsp_init.c
index b2e6e639ee..c99a057b24 100644
--- a/libavcodec/x86/takdsp_init.c
+++ b/libavcodec/x86/takdsp_init.c
@@ -24,9 +24,13 @@
 #include "config.h"
 
 void ff_tak_decorrelate_ls_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_ls_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sr_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_sr_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sm_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_sm_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sf_sse4(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor);
+void ff_tak_decorrelate_sf_avx2(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor);
 
 av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
 {
@@ -42,5 +46,12 @@ av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
     if (EXTERNAL_SSE4(cpu_flags)) {
         c->decorrelate_sf = ff_tak_decorrelate_sf_sse4;
     }
+
+    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
+        c->decorrelate_ls = ff_tak_decorrelate_ls_avx2;
+        c->decorrelate_sr = ff_tak_decorrelate_sr_avx2;
+        c->decorrelate_sm = ff_tak_decorrelate_sm_avx2;
+        c->decorrelate_sf = ff_tak_decorrelate_sf_avx2;
+    }
 #endif
 }
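
Reviewer note: to make it easier to check the tak_decorrelate_sf loop against a scalar model, here is a minimal C sketch of what each 32-bit lane appears to compute, read directly off the instruction sequence in the patch (psrad, pmulld, paddd pd_128, psrad 8, pslld, psubd). The function name decorrelate_sf_scalar is hypothetical and this is not the C reference from libavcodec/takdsp.c; it also assumes arithmetic right shift of negative values and two's-complement conversion, as the SIMD instructions do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one pass of the tak_decorrelate_sf loop.
 * Assumes >> on negative int32_t is an arithmetic shift (true on the
 * compilers FFmpeg targets, but implementation-defined in ISO C). */
static void decorrelate_sf_scalar(int32_t *p1, const int32_t *p2, int length,
                                  int dshift, int dfactor)
{
    assert(dshift >= 0 && dshift < 32);

    for (int i = 0; i < length; i++) {
        int32_t  b = p2[i] >> dshift;                  /* psrad  m1, xm2      */
        uint32_t t = (uint32_t)b * (uint32_t)dfactor;  /* pmulld m1, m3 (low 32 bits) */
        t += 128;                                      /* paddd  m1, [pd_128] */
        b  = (int32_t)t >> 8;                          /* psrad  m1, 8        */
        t  = (uint32_t)b << dshift;                    /* pslld  m1, xm2      */
        p1[i] = (int32_t)(t - (uint32_t)p1[i]);        /* psubd  m1, [p1]     */
    }
}
```

For example, with p1 = {10, -3}, p2 = {1024, -2048}, dshift = 2 and dfactor = 64, the call leaves p1 = {246, -509}. Only p1 is written, which is why the checkasm test needs to compare the p1 buffers rather than the p2 buffers.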