From patchwork Mon Dec 20 13:56:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Kelly
X-Patchwork-Id: 32754
Date: Mon, 20 Dec 2021 14:56:27 +0100
Message-Id: <20211220135627.615097-1-alankelly@google.com>
X-Mailer: git-send-email 2.34.1.173.g76aa8bc2d0-goog
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Alan Kelly

This flag is set on Haswell and earlier and all AMD cpus.
---
As discussed on IRC last week.
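For readers unfamiliar with the CPUID encoding, the family/model test behind "Haswell and earlier" can be sketched as follows. This is a minimal illustration and not code from the patch: `decode_family`, `decode_model` and `intel_slow_gather` are hypothetical helper names, and the threshold 70 is simply the one the patch uses. The bit arithmetic is the same as in the cpu.c hunk.

```c
#include <stdint.h>

/* Display family from CPUID leaf 1 EAX: base family (bits 11:8)
 * plus extended family (bits 27:20), as computed in the patch. */
int decode_family(uint32_t eax)
{
    return (int)(((eax >> 8) & 0xf) + ((eax >> 20) & 0xff));
}

/* Display model: base model (bits 7:4) plus the extended model
 * (bits 19:16) moved into the high nibble. */
int decode_model(uint32_t eax)
{
    return (int)(((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0));
}

/* Hypothetical helper mirroring the patch's Intel test:
 * family 6 with a model number below 70. */
int intel_slow_gather(uint32_t eax)
{
    return decode_family(eax) == 6 && decode_model(eax) < 70;
}
```

With an assumed input of EAX = 0x000306C3 (a signature commonly reported by Haswell desktop parts) this decodes to family 6, model 60, so the test is true. EAX = 0x000506E3 decodes to family 6, model 94, so the test is false.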
 libavutil/cpu.h     | 57 +++++++++++++++++++++++----------------------
 libavutil/x86/cpu.c | 13 ++++++++++-
 2 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index ae443eccad..4272d11d73 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -26,34 +26,35 @@
 #define AV_CPU_FLAG_FORCE    0x80000000 /* force usage of selected flags (OR) */
 
     /* lower 16 bits - CPU features */
-#define AV_CPU_FLAG_MMX          0x0001     ///< standard MMX
-#define AV_CPU_FLAG_MMXEXT       0x0002     ///< SSE integer functions or AMD MMX ext
-#define AV_CPU_FLAG_MMX2         0x0002     ///< SSE integer functions or AMD MMX ext
-#define AV_CPU_FLAG_3DNOW        0x0004     ///< AMD 3DNOW
-#define AV_CPU_FLAG_SSE          0x0008     ///< SSE functions
-#define AV_CPU_FLAG_SSE2         0x0010     ///< PIV SSE2 functions
-#define AV_CPU_FLAG_SSE2SLOW     0x40000000 ///< SSE2 supported, but usually not faster
-                                            ///< than regular MMX/SSE (e.g. Core1)
-#define AV_CPU_FLAG_3DNOWEXT     0x0020     ///< AMD 3DNowExt
-#define AV_CPU_FLAG_SSE3         0x0040     ///< Prescott SSE3 functions
-#define AV_CPU_FLAG_SSE3SLOW     0x20000000 ///< SSE3 supported, but usually not faster
-                                            ///< than regular MMX/SSE (e.g. Core1)
-#define AV_CPU_FLAG_SSSE3        0x0080     ///< Conroe SSSE3 functions
-#define AV_CPU_FLAG_SSSE3SLOW    0x4000000  ///< SSSE3 supported, but usually not faster
-#define AV_CPU_FLAG_ATOM         0x10000000 ///< Atom processor, some SSSE3 instructions are slower
-#define AV_CPU_FLAG_SSE4         0x0100     ///< Penryn SSE4.1 functions
-#define AV_CPU_FLAG_SSE42        0x0200     ///< Nehalem SSE4.2 functions
-#define AV_CPU_FLAG_AESNI        0x80000    ///< Advanced Encryption Standard functions
-#define AV_CPU_FLAG_AVX          0x4000     ///< AVX functions: requires OS support even if YMM registers aren't used
-#define AV_CPU_FLAG_AVXSLOW      0x8000000  ///< AVX supported, but slow when using YMM registers (e.g. Bulldozer)
-#define AV_CPU_FLAG_XOP          0x0400     ///< Bulldozer XOP functions
-#define AV_CPU_FLAG_FMA4         0x0800     ///< Bulldozer FMA4 functions
-#define AV_CPU_FLAG_CMOV         0x1000     ///< supports cmov instruction
-#define AV_CPU_FLAG_AVX2         0x8000     ///< AVX2 functions: requires OS support even if YMM registers aren't used
-#define AV_CPU_FLAG_FMA3         0x10000    ///< Haswell FMA3 functions
-#define AV_CPU_FLAG_BMI1         0x20000    ///< Bit Manipulation Instruction Set 1
-#define AV_CPU_FLAG_BMI2         0x40000    ///< Bit Manipulation Instruction Set 2
-#define AV_CPU_FLAG_AVX512       0x100000   ///< AVX-512 functions: requires OS support even if YMM/ZMM registers aren't used
+#define AV_CPU_FLAG_MMX             0x0001     ///< standard MMX
+#define AV_CPU_FLAG_MMXEXT          0x0002     ///< SSE integer functions or AMD MMX ext
+#define AV_CPU_FLAG_MMX2            0x0002     ///< SSE integer functions or AMD MMX ext
+#define AV_CPU_FLAG_3DNOW           0x0004     ///< AMD 3DNOW
+#define AV_CPU_FLAG_SSE             0x0008     ///< SSE functions
+#define AV_CPU_FLAG_SSE2            0x0010     ///< PIV SSE2 functions
+#define AV_CPU_FLAG_SSE2SLOW        0x40000000 ///< SSE2 supported, but usually not faster
+                                               ///< than regular MMX/SSE (e.g. Core1)
+#define AV_CPU_FLAG_3DNOWEXT        0x0020     ///< AMD 3DNowExt
+#define AV_CPU_FLAG_SSE3            0x0040     ///< Prescott SSE3 functions
+#define AV_CPU_FLAG_SSE3SLOW        0x20000000 ///< SSE3 supported, but usually not faster
+                                               ///< than regular MMX/SSE (e.g. Core1)
+#define AV_CPU_FLAG_SSSE3           0x0080     ///< Conroe SSSE3 functions
+#define AV_CPU_FLAG_SSSE3SLOW       0x4000000  ///< SSSE3 supported, but usually not faster
+#define AV_CPU_FLAG_ATOM            0x10000000 ///< Atom processor, some SSSE3 instructions are slower
+#define AV_CPU_FLAG_SSE4            0x0100     ///< Penryn SSE4.1 functions
+#define AV_CPU_FLAG_SSE42           0x0200     ///< Nehalem SSE4.2 functions
+#define AV_CPU_FLAG_AESNI           0x80000    ///< Advanced Encryption Standard functions
+#define AV_CPU_FLAG_AVX             0x4000     ///< AVX functions: requires OS support even if YMM registers aren't used
+#define AV_CPU_FLAG_AVXSLOW         0x8000000  ///< AVX supported, but slow when using YMM registers (e.g. Bulldozer)
+#define AV_CPU_FLAG_XOP             0x0400     ///< Bulldozer XOP functions
+#define AV_CPU_FLAG_FMA4            0x0800     ///< Bulldozer FMA4 functions
+#define AV_CPU_FLAG_CMOV            0x1000     ///< supports cmov instruction
+#define AV_CPU_FLAG_AVX2            0x8000     ///< AVX2 functions: requires OS support even if YMM registers aren't used
+#define AV_CPU_FLAG_FMA3            0x10000    ///< Haswell FMA3 functions
+#define AV_CPU_FLAG_BMI1            0x20000    ///< Bit Manipulation Instruction Set 1
+#define AV_CPU_FLAG_BMI2            0x40000    ///< Bit Manipulation Instruction Set 2
+#define AV_CPU_FLAG_AVX512          0x100000   ///< AVX-512 functions: requires OS support even if YMM/ZMM registers aren't used
+#define AV_CPU_FLAG_SLOW_GATHER     0x2000000  ///< CPU has slow gathers.
 
 #define AV_CPU_FLAG_ALTIVEC      0x0001 ///< standard
 #define AV_CPU_FLAG_VSX          0x0002 ///< ISA 2.06
diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
index bcd41a50a2..5770ecec72 100644
--- a/libavutil/x86/cpu.c
+++ b/libavutil/x86/cpu.c
@@ -146,8 +146,16 @@ int ff_get_cpu_flags_x86(void)
     if (max_std_level >= 7) {
         cpuid(7, eax, ebx, ecx, edx);
 #if HAVE_AVX2
-        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
+        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) {
             rval |= AV_CPU_FLAG_AVX2;
+            cpuid(1, eax, ebx, ecx, std_caps);
+            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
+            model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
+            /* Haswell and earlier has slow gather */
+            if(family == 6 && model < 70)
+                rval |= AV_CPU_FLAG_SLOW_GATHER;
+        }
+
 #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
         if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
             if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
@@ -196,6 +204,9 @@ int ff_get_cpu_flags_x86(void)
            used unless explicitly disabled by checking AV_CPU_FLAG_AVXSLOW. */
         if ((family == 0x15 || family == 0x16) && (rval & AV_CPU_FLAG_AVX))
             rval |= AV_CPU_FLAG_AVXSLOW;
+
+        /* AMD cpus have slow gather */
+        rval |= AV_CPU_FLAG_SLOW_GATHER;
     }
 
     /* XOP and FMA4 use the AVX instruction coding scheme, so they can't be

From patchwork Mon Dec 20 13:57:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Kelly
X-Patchwork-Id: 32755
Date: Mon, 20 Dec 2021 14:57:00 +0100
Message-Id: <20211220135700.615644-1-alankelly@google.com>
X-Mailer: git-send-email 2.34.1.173.g76aa8bc2d0-goog
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH 2/2] libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Alan Kelly

This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions are
only used where they are faster.
---
 libswscale/utils.c        | 2 +-
 libswscale/x86/swscale.c  | 2 +-
 tests/checkasm/sw_scale.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libswscale/utils.c b/libswscale/utils.c
index d4a72d3ce1..9a69b45afe 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -282,7 +282,7 @@ void ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos, int filterSiz
 #if ARCH_X86_64
     int i, j, k, l;
     int cpu_flags = av_get_cpu_flags();
-    if (EXTERNAL_AVX2_FAST(cpu_flags)){
+    if (cpu_flags & AV_CPU_FLAG_SLOW_GATHER) {
         if ((c->srcBpc == 8) && (c->dstBpc <= 14)){
             if (dstW % 16 == 0){
                 if (filter != NULL){
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index c49a05c37b..eb5334a2be 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -578,7 +578,7 @@ switch(c->dstBpc){ \
         break; \
     }
 
-    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
+    if (cpu_flags & AV_CPU_FLAG_SLOW_GATHER) {
         if ((c->srcBpc == 8) && (c->dstBpc <= 14)) {
             if (c->chrDstW % 16 == 0)
                 ASSIGN_AVX2_SCALE_FUNC(c->hcScale, c->hChrFilterSize);
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index f4912e6c2c..680562af08 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -217,7 +217,7 @@ static void check_hscale(void)
             }
             ff_sws_init_scale(ctx);
             memcpy(filterAvx2, filter, sizeof(uint16_t) * (SRC_PIXELS * MAX_FILTER_WIDTH + MAX_FILTER_WIDTH));
-            if (cpu_flags & AV_CPU_FLAG_AVX2)
+            if (cpu_flags & AV_CPU_FLAG_SLOW_GATHER)
                 ff_shuffle_filter_coefficients(ctx, filterPosAvx, width, filterAvx2, SRC_PIXELS);
 
             if (check_func(ctx->hcScale, "hscale_%d_to_%d_width%d", ctx->srcBpc, ctx->dstBpc + 1, width)) {
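
The commit message of 2/2 says the AVX2 hscale functions should be used only where they are faster. A minimal sketch of a selection predicate with that meaning is given below. It rests on assumptions: `want_avx2_hscale` is a hypothetical helper, the two macros are local stand-ins copying the values shown in patch 1/2 (real code would include libavutil/cpu.h), and the predicate requires AVX2 to be present and SLOW_GATHER to be clear, which is not the literal condition in the hunks above.

```c
/* Local stand-ins with the values from patch 1/2 (assumption: real code
 * would take these from libavutil/cpu.h instead). */
#define AV_CPU_FLAG_AVX2        0x8000
#define AV_CPU_FLAG_SLOW_GATHER 0x2000000

/* Hypothetical predicate: returns 1 when the gather-based AVX2 hscale
 * path looks worthwhile, i.e. AVX2 is available and the CPU is not
 * marked as having slow gathers; returns 0 otherwise. */
int want_avx2_hscale(int cpu_flags)
{
    return (cpu_flags & AV_CPU_FLAG_AVX2) &&
          !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER);
}
```

A caller would pass the result of av_get_cpu_flags() and fall back to the existing non-AVX2 hscale path when the predicate returns 0.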