From patchwork Fri Dec 22 12:12:30 2023
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 45292
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 22 Dec 2023 09:12:30 -0300
Message-ID: <20231222121232.324-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.43.0
List-Id: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 1/3 v2] checkasm/takdsp: add decorrelate_sf test

Signed-off-by: James Almer
---
Fixes broken logic as reported by Martin.

 tests/checkasm/takdsp.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/tests/checkasm/takdsp.c b/tests/checkasm/takdsp.c
index 3aecee1f72..78528b1c5d 100644
--- a/tests/checkasm/takdsp.c
+++ b/tests/checkasm/takdsp.c
@@ -24,6 +24,7 @@
 #include "libavutil/mem_internal.h"
 #include "libavcodec/takdsp.h"
+#include "libavcodec/mathops.h"
 #include "checkasm.h"
@@ -33,8 +34,9 @@
             buf[i] = rnd(); \
     } while (0)
-static void test_decorrelate_ls(TAKDSPContext *s) {
 #define BUF_SIZE 1024
+
+static void test_decorrelate_ls(TAKDSPContext *s) {
     declare_func(void, int32_t *, int32_t *, int);
     if (check_func(s->decorrelate_ls, "decorrelate_ls")) {
@@ -60,7 +62,6 @@ static void test_decorrelate_ls(TAKDSPContext *s) {
 }
 static void test_decorrelate_sr(TAKDSPContext *s) {
-#define BUF_SIZE 1024
     declare_func(void, int32_t *, int32_t *, int);
     if (check_func(s->decorrelate_sr, "decorrelate_sr")) {
@@ -86,7 +87,6 @@ static void test_decorrelate_sr(TAKDSPContext *s) {
 }
 static void test_decorrelate_sm(TAKDSPContext *s) {
-#define BUF_SIZE 1024
     declare_func(void, int32_t *, int32_t *, int);
     if (check_func(s->decorrelate_sm, "decorrelate_sm")) {
@@ -114,6 +114,34 @@ static void test_decorrelate_sm(TAKDSPContext *s) {
     report("decorrelate_sm");
 }
+static void test_decorrelate_sf(TAKDSPContext *s) {
+    declare_func(void, int32_t *, int32_t *, int, int, int);
+
+    if (check_func(s->decorrelate_sf, "decorrelate_sf")) {
+        LOCAL_ALIGNED_32(int32_t, p1, [BUF_SIZE]);
+        LOCAL_ALIGNED_32(int32_t, p1_2, [BUF_SIZE]);
+        LOCAL_ALIGNED_32(int32_t, p2, [BUF_SIZE]);
+        int dshift, dfactor;
+
+        randomize(p1, BUF_SIZE);
+        memcpy(p1_2, p1, BUF_SIZE * sizeof(*p1));
+        randomize(p2, BUF_SIZE);
+        dshift = (rnd() & 0xF) + 1;
+        dfactor = sign_extend(rnd(), 10);
+
+        call_ref(p1, p2, BUF_SIZE, dshift, dfactor);
+        call_new(p1_2, p2, BUF_SIZE, dshift, dfactor);
+
+        if (memcmp(p1, p1_2, BUF_SIZE * sizeof(*p1)) != 0) {
+            fail();
+        }
+
+        bench_new(p1, p2, BUF_SIZE, dshift, dfactor);
+    }
+
+    report("decorrelate_sf");
+}
+
 void checkasm_check_takdsp(void)
 {
     TAKDSPContext s = { 0 };
@@ -122,4 +150,5 @@ void checkasm_check_takdsp(void)
     test_decorrelate_ls(&s);
     test_decorrelate_sr(&s);
     test_decorrelate_sm(&s);
+    test_decorrelate_sf(&s);
 }

From patchwork Fri Dec 22 12:12:31 2023
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 45293
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 22 Dec 2023 09:12:31 -0300
Message-ID: <20231222121232.324-2-jamrial@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20231222121232.324-1-jamrial@gmail.com>
References: <20231222121232.324-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/3 v2] x86/takdsp: add avx2 versions of all functions

On an Intel Core i7 12700k:

decorrelate_ls_c: 814.3
decorrelate_ls_sse2: 165.8
decorrelate_ls_avx2: 101.3
decorrelate_sf_c: 1602.6
decorrelate_sf_sse4: 640.1
decorrelate_sf_avx2: 324.6
decorrelate_sm_c: 1564.8
decorrelate_sm_sse2: 379.3
decorrelate_sm_avx2: 203.3
decorrelate_sr_c: 785.3
decorrelate_sr_sse2: 176.3
decorrelate_sr_avx2: 99.8

Signed-off-by: James Almer
---
No changes since last version

 libavcodec/x86/takdsp.asm    | 36 ++++++++++++++++++++++--------------
 libavcodec/x86/takdsp_init.c | 11 +++++++++++
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/libavcodec/x86/takdsp.asm b/libavcodec/x86/takdsp.asm
index be8e1ab553..a5501cc285 100644
--- a/libavcodec/x86/takdsp.asm
+++ b/libavcodec/x86/takdsp.asm
@@ -28,7 +28,7 @@ pd_128: times 4 dd 128
 SECTION .text
-INIT_XMM sse2
+%macro TAK_DECORRELATE 0
 cglobal tak_decorrelate_ls, 3, 3, 2, p1, p2, length
     shl lengthd, 2
     add p1q, lengthq
@@ -73,10 +73,8 @@ cglobal tak_decorrelate_sm, 3, 3, 6, p1, p2, length
     mova m1, [p2q+lengthq]
     mova m3, [p1q+lengthq+mmsize]
     mova m4, [p2q+lengthq+mmsize]
-    mova m2, m1
-    mova m5, m4
-    psrad m2, 1
-    psrad m5, 1
+    psrad m2, m1, 1
+    psrad m5, m4, 1
     psubd m0, m2
     psubd m3, m5
     paddd m1, m0
@@ -88,29 +86,39 @@ cglobal tak_decorrelate_sm, 3, 3, 6, p1, p2, length
     add lengthq, mmsize*2
     jl .loop
     RET
+%endmacro
-INIT_XMM sse4
+INIT_XMM sse2
+TAK_DECORRELATE
+INIT_YMM avx2
+TAK_DECORRELATE
+
+%macro TAK_DECORRELATE_SF 0
 cglobal tak_decorrelate_sf, 3, 3, 5, p1, p2, length, dshift, dfactor
     shl lengthd, 2
     add p1q, lengthq
     add p2q, lengthq
     neg lengthq
-    movd m2, dshiftm
-    movd m3, dfactorm
-    pshufd m3, m3, 0
-    mova m4, [pd_128]
+    movd xm2, dshiftm
+    VPBROADCASTD m3, dfactorm
+    VBROADCASTI128 m4, [pd_128]
 .loop:
-    mova m0, [p1q+lengthq]
     mova m1, [p2q+lengthq]
-    psrad m1, m2
+    psrad m1, xm2
     pmulld m1, m3
     paddd m1, m4
     psrad m1, 8
-    pslld m1, m2
-    psubd m1, m0
+    pslld m1, xm2
+    psubd m1, [p1q+lengthq]
     mova [p1q+lengthq], m1
     add lengthq, mmsize
     jl .loop
     RET
+%endmacro
+
+INIT_XMM sse4
+TAK_DECORRELATE_SF
+INIT_YMM avx2
+TAK_DECORRELATE_SF
diff --git a/libavcodec/x86/takdsp_init.c b/libavcodec/x86/takdsp_init.c
index b2e6e639ee..c99a057b24 100644
--- a/libavcodec/x86/takdsp_init.c
+++ b/libavcodec/x86/takdsp_init.c
@@ -24,9 +24,13 @@
 #include "config.h"
 void ff_tak_decorrelate_ls_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_ls_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sr_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_sr_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sm_sse2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_sm_avx2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sf_sse4(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor);
+void ff_tak_decorrelate_sf_avx2(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor);
 av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
 {
@@ -42,5 +46,12 @@ av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
     if (EXTERNAL_SSE4(cpu_flags)) {
         c->decorrelate_sf = ff_tak_decorrelate_sf_sse4;
     }
+
+    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
+        c->decorrelate_ls = ff_tak_decorrelate_ls_avx2;
+        c->decorrelate_sr = ff_tak_decorrelate_sr_avx2;
+        c->decorrelate_sm = ff_tak_decorrelate_sm_avx2;
+        c->decorrelate_sf = ff_tak_decorrelate_sf_avx2;
+    }
 #endif
 }

From patchwork Fri Dec 22 12:12:32 2023
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 45294
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 22 Dec 2023 09:12:32 -0300
Message-ID: <20231222121232.324-3-jamrial@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20231222121232.324-1-jamrial@gmail.com>
References: <20231222121232.324-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/3] avcodec/takdsp: fix const correctness

Signed-off-by: James Almer
---
 libavcodec/riscv/takdsp_init.c |  4 ++--
 libavcodec/takdsp.c            |  6 +++---
 libavcodec/takdsp.h            |  6 +++---
 libavcodec/x86/takdsp_init.c   | 12 ++++++------
 tests/checkasm/takdsp.c        |  6 +++---
 5 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/libavcodec/riscv/takdsp_init.c b/libavcodec/riscv/takdsp_init.c
index 0b4ec18086..2d5c974459 100644
--- a/libavcodec/riscv/takdsp_init.c
+++ b/libavcodec/riscv/takdsp_init.c
@@ -25,8 +25,8 @@
 #include "libavutil/riscv/cpu.h"
 #include "libavcodec/takdsp.h"
-void ff_decorrelate_ls_rvv(int32_t *p1, int32_t *p2, int length);
-void ff_decorrelate_sr_rvv(int32_t *p1, int32_t *p2, int length);
+void ff_decorrelate_ls_rvv(const int32_t *p1, int32_t *p2, int length);
+void ff_decorrelate_sr_rvv(int32_t *p1, const int32_t *p2, int length);
 av_cold void ff_takdsp_init_riscv(TAKDSPContext *dsp)
 {
diff --git a/libavcodec/takdsp.c b/libavcodec/takdsp.c
index 25cac558ce..51b6658de4 100644
--- a/libavcodec/takdsp.c
+++ b/libavcodec/takdsp.c
@@ -23,7 +23,7 @@
 #include "takdsp.h"
 #include "config.h"
-static void decorrelate_ls(int32_t *p1, int32_t *p2, int length)
+static void decorrelate_ls(const int32_t *p1, int32_t *p2, int length)
 {
     int i;
@@ -34,7 +34,7 @@ static void decorrelate_ls(int32_t *p1, int32_t *p2, int length)
     }
 }
-static void decorrelate_sr(int32_t *p1, int32_t *p2, int length)
+static void decorrelate_sr(int32_t *p1, const int32_t *p2, int length)
 {
     int i;
@@ -58,7 +58,7 @@ static void decorrelate_sm(int32_t *p1, int32_t *p2, int length)
     }
 }
-static void decorrelate_sf(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor)
+static void decorrelate_sf(int32_t *p1, const int32_t *p2, int length, int dshift, int dfactor)
 {
     int i;
diff --git a/libavcodec/takdsp.h b/libavcodec/takdsp.h
index 55f1a10cd3..13b5e530b2 100644
--- a/libavcodec/takdsp.h
+++ b/libavcodec/takdsp.h
@@ -22,10 +22,10 @@
 #include
 typedef struct TAKDSPContext {
-    void (*decorrelate_ls)(int32_t *p1, int32_t *p2, int length);
-    void (*decorrelate_sr)(int32_t *p1, int32_t *p2, int length);
+    void (*decorrelate_ls)(const int32_t *p1, int32_t *p2, int length);
+    void (*decorrelate_sr)(int32_t *p1, const int32_t *p2, int length);
     void (*decorrelate_sm)(int32_t *p1, int32_t *p2, int length);
-    void (*decorrelate_sf)(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor);
+    void (*decorrelate_sf)(int32_t *p1, const int32_t *p2, int length, int dshift, int dfactor);
 } TAKDSPContext;
 void ff_takdsp_init(TAKDSPContext *c);
diff --git a/libavcodec/x86/takdsp_init.c b/libavcodec/x86/takdsp_init.c
index c99a057b24..9553f8442c 100644
--- a/libavcodec/x86/takdsp_init.c
+++ b/libavcodec/x86/takdsp_init.c
@@ -23,14 +23,14 @@
 #include "libavutil/x86/cpu.h"
 #include "config.h"
-void ff_tak_decorrelate_ls_sse2(int32_t *p1, int32_t *p2, int length);
-void ff_tak_decorrelate_ls_avx2(int32_t *p1, int32_t *p2, int length);
-void ff_tak_decorrelate_sr_sse2(int32_t *p1, int32_t *p2, int length);
-void ff_tak_decorrelate_sr_avx2(int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_ls_sse2(const int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_ls_avx2(const int32_t *p1, int32_t *p2, int length);
+void ff_tak_decorrelate_sr_sse2(int32_t *p1, const int32_t *p2, int length);
+void ff_tak_decorrelate_sr_avx2(int32_t *p1, const int32_t *p2, int length);
 void ff_tak_decorrelate_sm_sse2(int32_t *p1, int32_t *p2, int length);
 void ff_tak_decorrelate_sm_avx2(int32_t *p1, int32_t *p2, int length);
-void ff_tak_decorrelate_sf_sse4(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor);
-void ff_tak_decorrelate_sf_avx2(int32_t *p1, int32_t *p2, int length, int dshift, int dfactor);
+void ff_tak_decorrelate_sf_sse4(int32_t *p1, const int32_t *p2, int length, int dshift, int dfactor);
+void ff_tak_decorrelate_sf_avx2(int32_t *p1, const int32_t *p2, int length, int dshift, int dfactor);
 av_cold void ff_takdsp_init_x86(TAKDSPContext *c)
 {
diff --git a/tests/checkasm/takdsp.c b/tests/checkasm/takdsp.c
index 78528b1c5d..fd4122f34b 100644
--- a/tests/checkasm/takdsp.c
+++ b/tests/checkasm/takdsp.c
@@ -37,7 +37,7 @@
 #define BUF_SIZE 1024
 static void test_decorrelate_ls(TAKDSPContext *s) {
-    declare_func(void, int32_t *, int32_t *, int);
+    declare_func(void, const int32_t *, int32_t *, int);
     if (check_func(s->decorrelate_ls, "decorrelate_ls")) {
         LOCAL_ALIGNED_32(int32_t, p1, [BUF_SIZE]);
@@ -62,7 +62,7 @@ static void test_decorrelate_ls(TAKDSPContext *s) {
 }
 static void test_decorrelate_sr(TAKDSPContext *s) {
-    declare_func(void, int32_t *, int32_t *, int);
+    declare_func(void, int32_t *, const int32_t *, int);
     if (check_func(s->decorrelate_sr, "decorrelate_sr")) {
         LOCAL_ALIGNED_32(int32_t, p1, [BUF_SIZE]);
@@ -115,7 +115,7 @@ static void test_decorrelate_sm(TAKDSPContext *s) {
 }
 static void test_decorrelate_sf(TAKDSPContext *s) {
-    declare_func(void, int32_t *, int32_t *, int, int, int);
+    declare_func(void, int32_t *, const int32_t *, int, int, int);
     if (check_func(s->decorrelate_sf, "decorrelate_sf")) {
         LOCAL_ALIGNED_32(int32_t, p1, [BUF_SIZE]);
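
For readers following the series, here is a minimal scalar sketch in C of what the tak_decorrelate_sf loop in patch 2/3 appears to compute per element, read off its instruction sequence (psrad, pmulld, paddd with pd_128, psrad by 8, pslld, psubd). It is not the FFmpeg C reference: the names decorrelate_sf_sketch and buffers_match are hypothetical, and the sketch assumes arithmetic right shift of negative values and two's-complement narrowing, both of which the C standard leaves implementation-defined. The second helper illustrates the byte count a memcmp over int32_t buffers needs, since a bare element count covers only the first quarter of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the SIMD loop; not taken from FFmpeg. */
static void decorrelate_sf_sketch(int32_t *p1, const int32_t *p2, int length,
                                  int dshift, int dfactor)
{
    assert(dshift >= 0 && dshift < 32);        /* shift count must be in range */
    for (int i = 0; i < length; i++) {
        int32_t  b = p2[i] >> dshift;          /* psrad m1, xm2 (assumed arithmetic) */
        uint32_t t = (uint32_t)b * (uint32_t)dfactor; /* pmulld keeps the low 32 bits */
        t += 128;                              /* paddd m1, m4 (pd_128) */
        b  = (int32_t)t >> 8;                  /* psrad m1, 8 */
        t  = (uint32_t)b << dshift;            /* pslld m1, xm2 */
        p1[i] = (int32_t)(t - (uint32_t)p1[i]); /* psubd m1, [p1q+lengthq] */
    }
}

/* Hypothetical helper: compare n int32_t elements, scaling the count to bytes. */
static int buffers_match(const int32_t *a, const int32_t *b, size_t n)
{
    return memcmp(a, b, n * sizeof(*a)) == 0;
}
```

As a worked case, with p1[0] = 10, p2[0] = 1024, dshift = 2 and dfactor = 3 the sketch computes ((256 * 3 + 128) >> 8) << 2 = 12 and stores 12 - 10 = 2 in p1[0].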