From patchwork Mon Dec 20 13:56:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Kelly
X-Patchwork-Id: 32754
Date: Mon, 20 Dec 2021 14:56:27 +0100
Message-Id: <20211220135627.615097-1-alankelly@google.com>
X-Mailer: git-send-email 2.34.1.173.g76aa8bc2d0-goog
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.
List-Id: FFmpeg development discussions and patches
Cc: Alan Kelly

This flag is set on Haswell and earlier and all AMD cpus.
---
As discussed on IRC last week.
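
For readers following the detection logic, here is a minimal, standalone sketch of the family/model arithmetic that the libavutil/x86/cpu.c hunk applies to EAX from CPUID leaf 1. The helper names (cpuid_family, cpuid_model, intel_slow_gather) are hypothetical and not part of libavutil; they only restate the expressions from the patch so they can be tried on sample values.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of libavutil: they mirror the
 * family/model expressions used in the patch. */

/* Base family (bits 11:8) plus extended family (bits 27:20). */
static unsigned cpuid_family(uint32_t eax)
{
    return ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
}

/* Base model (bits 7:4) plus extended model (bits 19:16) moved into
 * the high nibble. */
static unsigned cpuid_model(uint32_t eax)
{
    return ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
}

/* Same predicate as the patch: family 6 with a model number below 70. */
static int intel_slow_gather(uint32_t eax)
{
    return cpuid_family(eax) == 6 && cpuid_model(eax) < 70;
}
```

As an illustration, assuming the signature 0x000306C3 (commonly reported by a Haswell desktop part), the helpers yield family 6 and model 60, so the predicate is true. For 0x000506E3 (commonly reported by a Skylake desktop part) they yield family 6 and model 94, so the predicate is false.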
 libavutil/cpu.h     | 57 +++++++++++++++++++++++----------------------
 libavutil/x86/cpu.c | 13 ++++++++++-
 2 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index ae443eccad..4272d11d73 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -26,34 +26,35 @@
 #define AV_CPU_FLAG_FORCE    0x80000000 /* force usage of selected flags (OR) */
 
     /* lower 16 bits - CPU features */
-#define AV_CPU_FLAG_MMX          0x0001 ///< standard MMX
-#define AV_CPU_FLAG_MMXEXT       0x0002 ///< SSE integer functions or AMD MMX ext
-#define AV_CPU_FLAG_MMX2         0x0002 ///< SSE integer functions or AMD MMX ext
-#define AV_CPU_FLAG_3DNOW        0x0004 ///< AMD 3DNOW
-#define AV_CPU_FLAG_SSE          0x0008 ///< SSE functions
-#define AV_CPU_FLAG_SSE2         0x0010 ///< PIV SSE2 functions
-#define AV_CPU_FLAG_SSE2SLOW 0x40000000 ///< SSE2 supported, but usually not faster
-                                        ///< than regular MMX/SSE (e.g. Core1)
-#define AV_CPU_FLAG_3DNOWEXT     0x0020 ///< AMD 3DNowExt
-#define AV_CPU_FLAG_SSE3         0x0040 ///< Prescott SSE3 functions
-#define AV_CPU_FLAG_SSE3SLOW 0x20000000 ///< SSE3 supported, but usually not faster
-                                        ///< than regular MMX/SSE (e.g. Core1)
-#define AV_CPU_FLAG_SSSE3        0x0080 ///< Conroe SSSE3 functions
-#define AV_CPU_FLAG_SSSE3SLOW 0x4000000 ///< SSSE3 supported, but usually not faster
-#define AV_CPU_FLAG_ATOM     0x10000000 ///< Atom processor, some SSSE3 instructions are slower
-#define AV_CPU_FLAG_SSE4         0x0100 ///< Penryn SSE4.1 functions
-#define AV_CPU_FLAG_SSE42        0x0200 ///< Nehalem SSE4.2 functions
-#define AV_CPU_FLAG_AESNI       0x80000 ///< Advanced Encryption Standard functions
-#define AV_CPU_FLAG_AVX          0x4000 ///< AVX functions: requires OS support even if YMM registers aren't used
-#define AV_CPU_FLAG_AVXSLOW   0x8000000 ///< AVX supported, but slow when using YMM registers (e.g. Bulldozer)
-#define AV_CPU_FLAG_XOP          0x0400 ///< Bulldozer XOP functions
-#define AV_CPU_FLAG_FMA4         0x0800 ///< Bulldozer FMA4 functions
-#define AV_CPU_FLAG_CMOV         0x1000 ///< supports cmov instruction
-#define AV_CPU_FLAG_AVX2         0x8000 ///< AVX2 functions: requires OS support even if YMM registers aren't used
-#define AV_CPU_FLAG_FMA3        0x10000 ///< Haswell FMA3 functions
-#define AV_CPU_FLAG_BMI1        0x20000 ///< Bit Manipulation Instruction Set 1
-#define AV_CPU_FLAG_BMI2        0x40000 ///< Bit Manipulation Instruction Set 2
-#define AV_CPU_FLAG_AVX512     0x100000 ///< AVX-512 functions: requires OS support even if YMM/ZMM registers aren't used
+#define AV_CPU_FLAG_MMX            0x0001 ///< standard MMX
+#define AV_CPU_FLAG_MMXEXT         0x0002 ///< SSE integer functions or AMD MMX ext
+#define AV_CPU_FLAG_MMX2           0x0002 ///< SSE integer functions or AMD MMX ext
+#define AV_CPU_FLAG_3DNOW          0x0004 ///< AMD 3DNOW
+#define AV_CPU_FLAG_SSE            0x0008 ///< SSE functions
+#define AV_CPU_FLAG_SSE2           0x0010 ///< PIV SSE2 functions
+#define AV_CPU_FLAG_SSE2SLOW   0x40000000 ///< SSE2 supported, but usually not faster
+                                          ///< than regular MMX/SSE (e.g. Core1)
+#define AV_CPU_FLAG_3DNOWEXT       0x0020 ///< AMD 3DNowExt
+#define AV_CPU_FLAG_SSE3           0x0040 ///< Prescott SSE3 functions
+#define AV_CPU_FLAG_SSE3SLOW   0x20000000 ///< SSE3 supported, but usually not faster
+                                          ///< than regular MMX/SSE (e.g. Core1)
+#define AV_CPU_FLAG_SSSE3          0x0080 ///< Conroe SSSE3 functions
+#define AV_CPU_FLAG_SSSE3SLOW   0x4000000 ///< SSSE3 supported, but usually not faster
+#define AV_CPU_FLAG_ATOM       0x10000000 ///< Atom processor, some SSSE3 instructions are slower
+#define AV_CPU_FLAG_SSE4           0x0100 ///< Penryn SSE4.1 functions
+#define AV_CPU_FLAG_SSE42          0x0200 ///< Nehalem SSE4.2 functions
+#define AV_CPU_FLAG_AESNI         0x80000 ///< Advanced Encryption Standard functions
+#define AV_CPU_FLAG_AVX            0x4000 ///< AVX functions: requires OS support even if YMM registers aren't used
+#define AV_CPU_FLAG_AVXSLOW     0x8000000 ///< AVX supported, but slow when using YMM registers (e.g. Bulldozer)
+#define AV_CPU_FLAG_XOP            0x0400 ///< Bulldozer XOP functions
+#define AV_CPU_FLAG_FMA4           0x0800 ///< Bulldozer FMA4 functions
+#define AV_CPU_FLAG_CMOV           0x1000 ///< supports cmov instruction
+#define AV_CPU_FLAG_AVX2           0x8000 ///< AVX2 functions: requires OS support even if YMM registers aren't used
+#define AV_CPU_FLAG_FMA3          0x10000 ///< Haswell FMA3 functions
+#define AV_CPU_FLAG_BMI1          0x20000 ///< Bit Manipulation Instruction Set 1
+#define AV_CPU_FLAG_BMI2          0x40000 ///< Bit Manipulation Instruction Set 2
+#define AV_CPU_FLAG_AVX512       0x100000 ///< AVX-512 functions: requires OS support even if YMM/ZMM registers aren't used
+#define AV_CPU_FLAG_SLOW_GATHER 0x2000000 ///< CPU has slow gathers.
 
 #define AV_CPU_FLAG_ALTIVEC      0x0001 ///< standard
 #define AV_CPU_FLAG_VSX          0x0002 ///< ISA 2.06
diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
index bcd41a50a2..5770ecec72 100644
--- a/libavutil/x86/cpu.c
+++ b/libavutil/x86/cpu.c
@@ -146,8 +146,16 @@ int ff_get_cpu_flags_x86(void)
     if (max_std_level >= 7) {
         cpuid(7, eax, ebx, ecx, edx);
 #if HAVE_AVX2
-        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
+        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) {
             rval |= AV_CPU_FLAG_AVX2;
+            cpuid(1, eax, ebx, ecx, std_caps);
+            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
+            model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
+            /* Haswell and earlier has slow gather */
+            if(family == 6 && model < 70)
+                rval |= AV_CPU_FLAG_SLOW_GATHER;
+        }
+
 #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
         if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
             if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
@@ -196,6 +204,9 @@ int ff_get_cpu_flags_x86(void)
          used unless explicitly disabled by checking AV_CPU_FLAG_AVXSLOW. */
         if ((family == 0x15 || family == 0x16) && (rval & AV_CPU_FLAG_AVX))
             rval |= AV_CPU_FLAG_AVXSLOW;
+
+        /* AMD cpus have slow gather */
+        rval |= AV_CPU_FLAG_SLOW_GATHER;
     }
 
     /* XOP and FMA4 use the AVX instruction coding scheme, so they can't be