From patchwork Wed Nov 22 19:49:11 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 44747
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 22 Nov 2023 16:49:11 -0300
Message-ID: <20231122194913.9856-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.42.1
Subject: [FFmpeg-devel] [PATCH 1/3] x86/ac3dsp: reduce instruction count
 inside the float_to_fixed24 loop
List-Id: FFmpeg development discussions and patches

Signed-off-by: James Almer
---
 libavcodec/x86/ac3dsp.asm | 46 +++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/libavcodec/x86/ac3dsp.asm b/libavcodec/x86/ac3dsp.asm
index a95d359d95..42c8310462 100644
--- a/libavcodec/x86/ac3dsp.asm
+++ b/libavcodec/x86/ac3dsp.asm
@@ -77,16 +77,20 @@ AC3_EXPONENT_MIN
 INIT_XMM sse2
 cglobal float_to_fixed24, 3, 3, 9, dst, src, len
     movaps     m0, [pf_1_24]
+    shl      lenq, 2
+    add      srcq, lenq
+    add      dstq, lenq
+    neg      lenq
 .loop:
-    movaps     m1, [srcq    ]
-    movaps     m2, [srcq+16 ]
-    movaps     m3, [srcq+32 ]
-    movaps     m4, [srcq+48 ]
+    movaps     m1, [srcq+lenq    ]
+    movaps     m2, [srcq+lenq+16 ]
+    movaps     m3, [srcq+lenq+32 ]
+    movaps     m4, [srcq+lenq+48 ]
 %ifdef m8
-    movaps     m5, [srcq+64 ]
-    movaps     m6, [srcq+80 ]
-    movaps     m7, [srcq+96 ]
-    movaps     m8, [srcq+112]
+    movaps     m5, [srcq+lenq+64 ]
+    movaps     m6, [srcq+lenq+80 ]
+    movaps     m7, [srcq+lenq+96 ]
+    movaps     m8, [srcq+lenq+112]
 %endif
     mulps      m1, m0
     mulps      m2, m0
@@ -108,24 +112,20 @@ cglobal float_to_fixed24, 3, 3, 9, dst, src, len
     cvtps2dq   m7, m7
     cvtps2dq   m8, m8
 %endif
-    movdqa  [dstq    ], m1
-    movdqa  [dstq+16 ], m2
-    movdqa  [dstq+32 ], m3
-    movdqa  [dstq+48 ], m4
+    movdqa  [dstq+lenq    ], m1
+    movdqa  [dstq+lenq+16 ], m2
+    movdqa  [dstq+lenq+32 ], m3
+    movdqa  [dstq+lenq+48 ], m4
 %ifdef m8
-    movdqa  [dstq+64 ], m5
-    movdqa  [dstq+80 ], m6
-    movdqa  [dstq+96 ], m7
-    movdqa  [dstq+112], m8
-    add      srcq, 128
-    add      dstq, 128
-    sub      lenq, 32
+    movdqa  [dstq+lenq+64 ], m5
+    movdqa  [dstq+lenq+80 ], m6
+    movdqa  [dstq+lenq+96 ], m7
+    movdqa  [dstq+lenq+112], m8
+    add      lenq, 128
 %else
-    add      srcq, 64
-    add      dstq, 64
-    sub      lenq, 16
+    add      lenq, 64
 %endif
-    ja .loop
+    jl .loop
     RET
 
 ;------------------------------------------------------------------------------

From patchwork Wed Nov 22 19:49:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 44748
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 22 Nov 2023 16:49:12 -0300
Message-ID: <20231122194913.9856-2-jamrial@gmail.com>
In-Reply-To: <20231122194913.9856-1-jamrial@gmail.com>
References: <20231122194913.9856-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/3] x86/ac3dsp: add ff_float_to_fixed24_avx2()

Signed-off-by: James Almer
---
 libavcodec/ac3dsp.h          |  4 ++--
 libavcodec/ac3enc_template.c |  2 +-
 libavcodec/x86/ac3dsp.asm    | 28 ++++++++++++++++++++++++++--
 libavcodec/x86/ac3dsp_init.c |  4 ++++
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/libavcodec/ac3dsp.h b/libavcodec/ac3dsp.h
index a01bff3d11..25341f3396 100644
--- a/libavcodec/ac3dsp.h
+++ b/libavcodec/ac3dsp.h
@@ -47,9 +47,9 @@ typedef struct AC3DSPContext {
      * [-(1<<24),(1<<24)]
      *
      * @param dst destination array of int32_t.
-     *            constraints: 16-byte aligned
+     *            constraints: 32-byte aligned
      * @param src source array of float.
-     *            constraints: 16-byte aligned
+     *            constraints: 32-byte aligned
      * @param len number of elements to convert.
      *            constraints: multiple of 32 greater than zero
      */
diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index be4ecebc9c..a16faea681 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -112,7 +112,7 @@ static void apply_channel_coupling(AC3EncodeContext *s)
 {
     LOCAL_ALIGNED_16(CoefType, cpl_coords,      [AC3_MAX_BLOCKS], [AC3_MAX_CHANNELS][16]);
 #if AC3ENC_FLOAT
-    LOCAL_ALIGNED_16(int32_t, fixed_cpl_coords, [AC3_MAX_BLOCKS], [AC3_MAX_CHANNELS][16]);
+    LOCAL_ALIGNED_32(int32_t, fixed_cpl_coords, [AC3_MAX_BLOCKS], [AC3_MAX_CHANNELS][16]);
 #else
     int32_t (*fixed_cpl_coords)[AC3_MAX_CHANNELS][16] = cpl_coords;
 #endif
diff --git a/libavcodec/x86/ac3dsp.asm b/libavcodec/x86/ac3dsp.asm
index 42c8310462..e31c58e1c1 100644
--- a/libavcodec/x86/ac3dsp.asm
+++ b/libavcodec/x86/ac3dsp.asm
@@ -21,10 +21,10 @@
 
 %include "libavutil/x86/x86util.asm"
 
-SECTION_RODATA
+SECTION_RODATA 32
 
 ; 16777216.0f - used in ff_float_to_fixed24()
-pf_1_24: times 4 dd 0x4B800000
+pf_1_24: times 8 dd 0x4B800000
 
 ; used in ff_ac3_compute_mantissa_size()
 cextern ac3_bap_bits
@@ -128,6 +128,30 @@ cglobal float_to_fixed24, 3, 3, 9, dst, src, len
     jl .loop
     RET
 
+INIT_YMM avx2
+cglobal float_to_fixed24, 3, 3, 5, dst, src, len
+    movaps     m0, [pf_1_24]
+    shl      lenq, 2
+    add      srcq, lenq
+    add      dstq, lenq
+    neg      lenq
+.loop:
+    mulps      m1, m0, [srcq+lenq+mmsize*0]
+    mulps      m2, m0, [srcq+lenq+mmsize*1]
+    mulps      m3, m0, [srcq+lenq+mmsize*2]
+    mulps      m4, m0, [srcq+lenq+mmsize*3]
+    cvtps2dq   m1, m1
+    cvtps2dq   m2, m2
+    cvtps2dq   m3, m3
+    cvtps2dq   m4, m4
+    movdqa  [dstq+lenq+mmsize*0], m1
+    movdqa  [dstq+lenq+mmsize*1], m2
+    movdqa  [dstq+lenq+mmsize*2], m3
+    movdqa  [dstq+lenq+mmsize*3], m4
+    add      lenq, mmsize*4
+    jl .loop
+    RET
+
 ;------------------------------------------------------------------------------
 ; int ff_ac3_compute_mantissa_size(uint16_t mant_cnt[6][16])
 ;------------------------------------------------------------------------------
diff --git a/libavcodec/x86/ac3dsp_init.c b/libavcodec/x86/ac3dsp_init.c
index 43b3b4ac85..106121b5b9 100644
--- a/libavcodec/x86/ac3dsp_init.c
+++ b/libavcodec/x86/ac3dsp_init.c
@@ -27,6 +27,7 @@
 void ff_ac3_exponent_min_sse2  (uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 
 void ff_float_to_fixed24_sse2 (int32_t *dst, const float *src, unsigned int len);
+void ff_float_to_fixed24_avx2 (int32_t *dst, const float *src, unsigned int len);
 
 int ff_ac3_compute_mantissa_size_sse2(uint16_t mant_cnt[6][16]);
 
@@ -48,6 +49,9 @@ av_cold void ff_ac3dsp_init_x86(AC3DSPContext *c)
         if (!(cpu_flags & AV_CPU_FLAG_ATOM))
             c->extract_exponents = ff_ac3_extract_exponents_ssse3;
     }
+    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
+        c->float_to_fixed24 = ff_float_to_fixed24_avx2;
+    }
 }
 
 #define DOWNMIX_FUNC_OPT(ch, opt)                                       \

From patchwork Wed Nov 22 19:49:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 44749
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 22 Nov 2023 16:49:13 -0300
Message-ID: <20231122194913.9856-3-jamrial@gmail.com>
In-Reply-To: <20231122194913.9856-1-jamrial@gmail.com>
References: <20231122194913.9856-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/3] avcodec/ac3dsp: make len a size_t in
 float_to_fixed24

Should simplify asm implementations, and prevent UB on at least win64.

Signed-off-by: James Almer
---
 libavcodec/ac3dsp.c              | 2 +-
 libavcodec/ac3dsp.h              | 2 +-
 libavcodec/arm/ac3dsp_init_arm.c | 2 +-
 libavcodec/mips/ac3dsp_mips.c    | 2 +-
 libavcodec/x86/ac3dsp_init.c     | 4 ++--
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavcodec/ac3dsp.c b/libavcodec/ac3dsp.c
index 302b786b15..8397e03d32 100644
--- a/libavcodec/ac3dsp.c
+++ b/libavcodec/ac3dsp.c
@@ -54,7 +54,7 @@ static void ac3_exponent_min_c(uint8_t *exp, int num_reuse_blocks, int nb_coefs)
     }
 }
 
-static void float_to_fixed24_c(int32_t *dst, const float *src, unsigned int len)
+static void float_to_fixed24_c(int32_t *dst, const float *src, size_t len)
 {
     const float scale = 1 << 24;
     do {
diff --git a/libavcodec/ac3dsp.h b/libavcodec/ac3dsp.h
index 25341f3396..ec2f598451 100644
--- a/libavcodec/ac3dsp.h
+++ b/libavcodec/ac3dsp.h
@@ -53,7 +53,7 @@ typedef struct AC3DSPContext {
      * @param len number of elements to convert.
      *            constraints: multiple of 32 greater than zero
      */
-    void (*float_to_fixed24)(int32_t *dst, const float *src, unsigned int len);
+    void (*float_to_fixed24)(int32_t *dst, const float *src, size_t len);
 
     /**
      * Calculate bit allocation pointers.
diff --git a/libavcodec/arm/ac3dsp_init_arm.c b/libavcodec/arm/ac3dsp_init_arm.c
index a64aa6ae82..ae989069c9 100644
--- a/libavcodec/arm/ac3dsp_init_arm.c
+++ b/libavcodec/arm/ac3dsp_init_arm.c
@@ -26,7 +26,7 @@
 #include "config.h"
 
 void ff_ac3_exponent_min_neon(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
-void ff_float_to_fixed24_neon(int32_t *dst, const float *src, unsigned int len);
+void ff_float_to_fixed24_neon(int32_t *dst, const float *src, size_t len);
 void ff_ac3_extract_exponents_neon(uint8_t *exp, int32_t *coef, int nb_coefs);
 void ff_ac3_sum_square_butterfly_int32_neon(int64_t sum[4],
                                             const int32_t *coef0,
diff --git a/libavcodec/mips/ac3dsp_mips.c b/libavcodec/mips/ac3dsp_mips.c
index a5eaaf8eb2..3ea3acc185 100644
--- a/libavcodec/mips/ac3dsp_mips.c
+++ b/libavcodec/mips/ac3dsp_mips.c
@@ -203,7 +203,7 @@ static void ac3_update_bap_counts_mips(uint16_t mant_cnt[16], uint8_t *bap,
 
 #if HAVE_MIPSFPU
 #if !HAVE_MIPS32R6 && !HAVE_MIPS64R6
-static void float_to_fixed24_mips(int32_t *dst, const float *src, unsigned int len)
+static void float_to_fixed24_mips(int32_t *dst, const float *src, size_t len)
 {
     const float scale = 1 << 24;
     float src0, src1, src2, src3, src4, src5, src6, src7;
diff --git a/libavcodec/x86/ac3dsp_init.c b/libavcodec/x86/ac3dsp_init.c
index 106121b5b9..baa2bea3a4 100644
--- a/libavcodec/x86/ac3dsp_init.c
+++ b/libavcodec/x86/ac3dsp_init.c
@@ -26,8 +26,8 @@
 
 void ff_ac3_exponent_min_sse2  (uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 
-void ff_float_to_fixed24_sse2 (int32_t *dst, const float *src, unsigned int len);
-void ff_float_to_fixed24_avx2 (int32_t *dst, const float *src, unsigned int len);
+void ff_float_to_fixed24_sse2 (int32_t *dst, const float *src, size_t len);
+void ff_float_to_fixed24_avx2 (int32_t *dst, const float *src, size_t len);
 
 int ff_ac3_compute_mantissa_size_sse2(uint16_t mant_cnt[6][16]);
 
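
Reviewer note on the series (not part of the patches): for readers less used to
the asm idiom in 1/3 and 2/3, here is a rough C sketch of the same loop shape.
Both pointers are advanced to the end of their arrays and a single negative
index counts up towards zero, so one add and one branch replace the two
pointer increments plus the length decrement. The names
float_to_fixed24_sketch and round_to_int32 are made up for this note. The
sketch indexes by element, whereas the asm scales len to a byte offset first
(shl lenq, 2), and the rounding helper is only an approximation of cvtps2dq.
The remark about unsigned int in the comment is my reading of the 3/3 commit
message, not something the patch states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round to nearest without libm. Ties round away from
 * zero here, whereas cvtps2dq follows the MXCSR rounding mode. */
static int32_t round_to_int32(float x)
{
    return (int32_t)(x < 0 ? (double)x - 0.5 : (double)x + 0.5);
}

/* Hypothetical name; mirrors the loop structure of the asm only.
 * As in ac3dsp.h, len is assumed to be greater than zero. Taking len as
 * size_t means the full register width is defined on entry; with
 * unsigned int the asm presumably could not rely on the upper 32 bits
 * of lenq on every ABI. */
static void float_to_fixed24_sketch(int32_t *dst, const float *src, size_t len)
{
    const float scale = 16777216.0f;   /* 1 << 24, same value as pf_1_24 */
    ptrdiff_t i = -(ptrdiff_t)len;     /* "neg lenq" */

    src += len;                        /* "add srcq, lenq" */
    dst += len;                        /* "add dstq, lenq" */
    do {
        dst[i] = round_to_int32(src[i] * scale);
    } while (++i < 0);                 /* "add lenq, N" / "jl .loop" */
}
```

The loop exits when the index reaches zero, which is why the branch in the
patch changes from ja (after sub) to jl (after add on a negative offset).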