From patchwork Fri Jul 16 13:44:53 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Kelly
X-Patchwork-Id: 28936
Date: Fri, 16 Jul 2021 15:44:53 +0200
Message-Id: <20210716134453.1126957-1-alankelly@google.com>
X-Mailer: git-send-email 2.32.0.402.g57bb445576-goog
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.
List-Id: FFmpeg development discussions and patches
Cc: Alan Kelly

Broadwell and later and Zen3 and later have fast gather instructions.
---
Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the email thread.
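For readers following the family/model test in the x86 hunk of this patch, here is a minimal standalone sketch of the same arithmetic on the EAX value returned by CPUID leaf 1. The names CpuSignature, decode_cpu_signature and is_flagged_slow_gather are hypothetical and exist only for illustration; they are not part of FFmpeg. The sketch adds the base and extended fields unconditionally, as the patch does, which is a simplification of the vendor-documented rules.

```c
#include <stdint.h>

/* Hypothetical illustration type, not an FFmpeg structure. */
typedef struct CpuSignature {
    unsigned family;
    unsigned model;
} CpuSignature;

/* Decode the EAX value of CPUID leaf 1 with the same arithmetic as the patch:
 * base family (bits 8-11) plus extended family (bits 20-27), and
 * base model (bits 4-7) plus extended model (bits 16-19) shifted into the high nibble. */
static CpuSignature decode_cpu_signature(uint32_t eax)
{
    CpuSignature sig;
    sig.family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
    sig.model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
    return sig;
}

/* Mirrors the condition used in the patch: family 6 with a model number below 70. */
static int is_flagged_slow_gather(uint32_t eax)
{
    CpuSignature sig = decode_cpu_signature(eax);
    return sig.family == 6 && sig.model < 70;
}
```

For example, the signature 0x000306C3 decodes to family 6, model 60 and is flagged, while 0x00A20F10 decodes to family 25, model 33 and is not. A plain model threshold is only an approximation of a microarchitecture boundary: some Broadwell parts report model 61, which is also below 70.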
 libavutil/cpu.h     |  1 +
 libavutil/x86/cpu.c | 11 ++++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index c069076439..ec3073d021 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -113,6 +113,7 @@ void av_force_cpu_count(int count);
  * av_set_cpu_flags_mask(), then this function will behave as if AVX is not
  * present.
  */
+
 size_t av_cpu_max_align(void);
 
 #endif /* AVUTIL_CPU_H */
diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
index bcd41a50a2..158e2170c4 100644
--- a/libavutil/x86/cpu.c
+++ b/libavutil/x86/cpu.c
@@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void)
     if (max_std_level >= 7) {
         cpuid(7, eax, ebx, ecx, edx);
 #if HAVE_AVX2
-        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
+        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){
             rval |= AV_CPU_FLAG_AVX2;
+
+            cpuid(1, eax, ebx, ecx, std_caps);
+            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
+            model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
+            // Haswell and earlier has slow gather
+            if(family == 6 && model < 70)
+                rval |= AV_CPU_FLAG_AVXSLOW;
+        }
+
 #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
         if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
             if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)