From patchwork Mon Aug 8 15:25:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Swinney, Jonathan"
X-Patchwork-Id: 37190
From: "Swinney, Jonathan"
To: "ffmpeg-devel@ffmpeg.org"
Date: Mon, 8 Aug 2022 15:25:36 +0000
Subject: [FFmpeg-devel] [PATCH v2] add a configure flag to enable tree-vectorization with gcc
List-Id: FFmpeg development discussions and patches

Recent versions of gcc improve automatic vectorization. This flag allows
adventurous users to enable vectorization.
Known problems with this are primarily related to inline assembly for x86,
so to address those, add a pragma to explicitly disable automatic
vectorization for those files.

Signed-off-by: Jonathan Swinney
---

Thank you for considering this patch. I believe this addresses the primary
concerns that were raised by my previous submission. There may be more files
which require the pragma to add `-fno-tree-vectorize`, and I welcome
suggestions.

This should strike a compromise, allowing some users to enable vectorization
while not breaking mainstream builds. This should give time to work out
additional problems if they arise before enabling vectorization more broadly.

---
 configure              | 7 ++++++-
 libavcodec/x86/cabac.h | 4 ++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index cbbb4dd9c8..8e842da1b8 100755
--- a/configure
+++ b/configure
@@ -110,6 +110,7 @@ Configuration options:
   --disable-swscale-alpha disable alpha channel support in swscale
   --disable-all disable building components, libraries and programs
   --disable-autodetect disable automatically detected external libraries [no]
+  --enable-auto-vectorization enable compiler auto vectorization
 
 Program options:
   --disable-programs do not build command line programs
@@ -1945,6 +1946,7 @@ FEATURE_LIST="
     small
     static
     swscale_alpha
+    auto_vectorization
 "
 
 # this list should be kept in linking order
@@ -7176,7 +7178,9 @@ if enabled icc; then
             disable aligned_stack
     fi
 elif enabled gcc; then
-    check_optflags -fno-tree-vectorize
+    if disabled auto_vectorization; then
+        check_optflags -fno-tree-vectorize
+    fi
     check_cflags -Werror=format-security
     check_cflags -Werror=implicit-function-declaration
     check_cflags -Werror=missing-prototypes
@@ -7569,6 +7573,7 @@ echo "pod2man enabled ${pod2man-no}"
 echo "makeinfo enabled ${makeinfo-no}"
 echo "makeinfo supports HTML ${makeinfo_html-no}"
 echo "xmllint enabled ${xmllint-no}"
+echo "auto-vectorization ${auto_vectorization-no}"
 test -n "$random_seed" &&
     echo "random seed ${random_seed}"
 echo
diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
index b046a56a6b..782e4cbda4 100644
--- a/libavcodec/x86/cabac.h
+++ b/libavcodec/x86/cabac.h
@@ -39,6 +39,10 @@
 
 #if HAVE_INLINE_ASM
 
+#ifdef __GNUC__
+    __attribute__((optimize("-fno-tree-vectorize")))
+#endif
+
 #ifndef UNCHECKED_BITSTREAM_READER
 #define UNCHECKED_BITSTREAM_READER !CONFIG_SAFE_BITSTREAM_READER
 #endif