From patchwork Sun Sep 10 19:03:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 43682
From: Paul B Mahol
Date: Sun, 10 Sep 2023 21:03:27 +0200
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] avfilter/x86/af_afir: add FMA3 SIMD
List-Id: FFmpeg development discussions and patches

Attached.
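
For review purposes, here is a rough scalar sketch of what the loop in the attached patch appears to compute, read off the assembly rather than taken from the tree. The names fcmul_add_ref and fcmul_add_steps are hypothetical and exist only for this illustration. The first function is a plain complex multiply-accumulate over interleaved (re, im) floats plus the single real tail element handled by the movss/mulss/addss sequence. The second walks one complex pair through the same steps as the vpermilps/mulps/vfmaddsub132ps sequence. Neither models the fused rounding of a real FMA, nor the alignment and length assumptions of the SIMD loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference: sum += t * c over len interleaved
 * complex floats, then one extra real-only product at index 2*len
 * (the movss tail in the patch). */
static void fcmul_add_ref(float *sum, const float *t, const float *c,
                          ptrdiff_t len)
{
    ptrdiff_t n;

    assert(len >= 0);
    for (n = 0; n < len; n++) {
        const float tre = t[2 * n], tim = t[2 * n + 1];
        const float cre = c[2 * n], cim = c[2 * n + 1];

        sum[2 * n]     += tre * cre - tim * cim;
        sum[2 * n + 1] += tre * cim + tim * cre;
    }
    sum[2 * n] += t[2 * n] * c[2 * n];
}

/* Same arithmetic, arranged like the patch's instruction sequence for
 * one complex pair (two float lanes) at a time. Not fused, so on
 * arbitrary inputs it may differ from hardware FMA in the last bit. */
static void fcmul_add_steps(float *sum, const float *t, const float *c,
                            ptrdiff_t len)
{
    ptrdiff_t n;

    assert(len >= 0);
    for (n = 0; n < len; n++) {
        const float m0[2] = { t[2 * n], t[2 * n + 1] }; /* load t          */
        const float m3[2] = { m0[1], m0[0] };           /* 177: swap re/im */
        const float m2[2] = { c[2 * n], c[2 * n] };     /* 160: dup c.re   */
        const float ci    = c[2 * n + 1];               /* 245: dup c.im   */
        const float m1[2] = { ci * m3[0], ci * m3[1] }; /* mulps m1, m3    */

        /* vfmaddsub132ps m0, m1, m2: even lane m0*m2 - m1,
         * odd lane m0*m2 + m1 */
        sum[2 * n]     += m0[0] * m2[0] - m1[0];
        sum[2 * n + 1] += m0[1] * m2[1] + m1[1];
    }
    sum[2 * n] += t[2 * n] * c[2 * n];
}
```

Both functions take buffers of 2*len + 1 floats. Running them on the same small integer-valued inputs should give identical sums, which is a quick way to convince oneself that the shuffle constants and the add/subtract lane order line up with the usual complex product.
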
From 7735a84fd0fdae731955f50bddba8dfef395713b Mon Sep 17 00:00:00 2001
From: Paul B Mahol
Date: Sun, 10 Sep 2023 19:25:20 +0200
Subject: [PATCH] avfilter/x86/af_afir: add FMA3 SIMD

Signed-off-by: Paul B Mahol
---
 libavfilter/x86/af_afir.asm    | 27 +++++++++++++++++++++++++++
 libavfilter/x86/af_afir_init.c |  5 +++++
 2 files changed, 32 insertions(+)

diff --git a/libavfilter/x86/af_afir.asm b/libavfilter/x86/af_afir.asm
index 2cc09709a2..ed0276c7b9 100644
--- a/libavfilter/x86/af_afir.asm
+++ b/libavfilter/x86/af_afir.asm
@@ -67,3 +67,30 @@ INIT_XMM sse3
 FCMUL_ADD
 INIT_YMM avx
 FCMUL_ADD
+
+%if HAVE_FMA3_EXTERNAL
+INIT_YMM fma3
+cglobal fcmul_add, 4,4,4, sum, t, c, len
+    shl            lend, 3
+    add            tq, lenq
+    add            cq, lenq
+    add            sumq, lenq
+    neg            lenq
+.loop:
+    movaps         m0, [tq + lenq]
+    movaps         m1, [cq + lenq]
+    vpermilps      m3, m0, 177
+    vpermilps      m2, m1, 160
+    vpermilps      m1, m1, 245
+    mulps          m1, m1, m3
+    vfmaddsub132ps m0, m1, m2
+    addps          m0, m0, [sumq + lenq]
+    movaps         [sumq + lenq], m0
+    add            lenq, mmsize
+    jl .loop
+    movss          xm0, [tq + lenq]
+    mulss          xm0, [cq + lenq]
+    addss          xm0, [sumq + lenq]
+    movss          [sumq + lenq], xm0
+    RET
+%endif
diff --git a/libavfilter/x86/af_afir_init.c b/libavfilter/x86/af_afir_init.c
index e53817b9c0..d573acf10b 100644
--- a/libavfilter/x86/af_afir_init.c
+++ b/libavfilter/x86/af_afir_init.c
@@ -26,6 +26,8 @@ void ff_fcmul_add_sse3(float *sum, const float *t, const float *c,
                        ptrdiff_t len);
 void ff_fcmul_add_avx(float *sum, const float *t, const float *c,
                       ptrdiff_t len);
+void ff_fcmul_add_fma3(float *sum, const float *t, const float *c,
+                       ptrdiff_t len);
 
 av_cold void ff_afir_init_x86(AudioFIRDSPContext *s)
 {
@@ -37,4 +39,7 @@ av_cold void ff_afir_init_x86(AudioFIRDSPContext *s)
     if (EXTERNAL_AVX_FAST(cpu_flags)) {
         s->fcmul_add = ff_fcmul_add_avx;
     }
+    if (EXTERNAL_FMA3_FAST(cpu_flags)) {
+        s->fcmul_add = ff_fcmul_add_fma3;
+    }
 }
-- 
2.39.1