From: "Swinney, Jonathan"
To: "ffmpeg-devel@ffmpeg.org"
Date: Wed, 27 Jul 2022 17:34:52 +0000
Subject: [FFmpeg-devel] [PATCH] enable auto vectorization for gcc 7 and higher

I recognize that this patch is going to be somewhat controversial.
I'm submitting it mostly to gauge opinions and evaluate options.

I am working on improving performance for aarch64. On that architecture,
there are fewer hand-written assembly implementations of hot functions
than there are for x86_64, and allowing gcc to auto-vectorize yields
noticeable improvements.

GCC's auto-vectorization has improved in recent releases, and it hasn't
been evaluated on the mailing list in a few years. This is the latest
discussion I found in my searches:

http://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/193977.html

If the community is not comfortable accepting a patch like this outright,
would you be willing to accept a new option to the configure script,
something like --enable-auto-vectorization?

Thanks!

Signed-off-by: Jonathan Swinney
---
 configure | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 6629d14099..c63c9348ad 100755
--- a/configure
+++ b/configure
@@ -7173,7 +7173,9 @@ if enabled icc; then
             disable aligned_stack
     fi
 elif enabled gcc; then
-    check_optflags -fno-tree-vectorize
+    case $gcc_basever in
+        2|2.*|3.*|4.*|5.*|6.*) check_optflags -fno-tree-vectorize ;;
+    esac
     check_cflags -Werror=format-security
     check_cflags -Werror=implicit-function-declaration
     check_cflags -Werror=missing-prototypes
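The version gate in the hunk can be exercised outside of configure with a small standalone script. This is only a sketch: add_optflag and vectorize_flag_for are hypothetical names, with add_optflag standing in for configure's check_optflags (it just prints the flag instead of testing and appending it), and the version strings are sample inputs rather than output from any particular toolchain.

```sh
#!/bin/sh
# Sketch: reproduces the patch's case statement outside of configure.

# Hypothetical stand-in for configure's check_optflags: print the flag
# that would be added instead of probing the compiler with it.
add_optflag() {
    printf '%s\n' "$1"
}

# Hypothetical helper: prints -fno-tree-vectorize only for the gcc base
# versions that the patch still treats as too old (2.x through 6.x).
vectorize_flag_for() {
    gcc_basever=$1
    case $gcc_basever in
        2|2.*|3.*|4.*|5.*|6.*) add_optflag -fno-tree-vectorize ;;
    esac
}

# Sample inputs, both bare major numbers and full dotted versions.
for v in 4.9.4 6.5.0 7 7.5.0 11 12.1.0; do
    printf '%-8s -> %s\n' "$v" "$(vectorize_flag_for "$v")"
done
```

Because a case pattern must match the entire word, 12.1.0 does not match 2.* and a bare 7 or 11 matches nothing, so only the 2.x to 6.x inputs keep -fno-tree-vectorize.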