From patchwork Thu Jan 18 23:06:14 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Henrik Gramner
X-Patchwork-Id: 7338
From: Henrik Gramner
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Jan 2018 00:06:14 +0100
Message-Id: <20180118230615.16966-5-henrik@gramner.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20180118230615.16966-1-henrik@gramner.com>
References: <20180118230615.16966-1-henrik@gramner.com>
Subject: [FFmpeg-devel] [PATCH 4/5] x86inc: Correctly set mmreg variables
List-Id: FFmpeg development discussions and patches

---
 libavutil/x86/x86inc.asm | 87 ++++++++++++++++++++----------------------------
 1 file changed, 36 insertions(+), 51 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index de048f863d..438863042f 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1,7 +1,7 @@
 ;*****************************************************************************
 ;* x86inc.asm: x264asm abstraction layer
 ;*****************************************************************************
-;* Copyright (C) 2005-2017 x264 project
+;* Copyright (C) 2005-2018 x264 project
 ;*
 ;* Authors: Loren Merritt
 ;*          Henrik Gramner
@@ -892,6 +892,36 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
     %undef %1%2
 %endmacro
 
+%macro DEFINE_MMREGS 1 ; mmtype
+    %assign %%prev_mmregs 0
+    %ifdef num_mmregs
+        %assign %%prev_mmregs num_mmregs
+    %endif
+
+    %assign num_mmregs 8
+    %if ARCH_X86_64 && mmsize >= 16
+        %assign num_mmregs 16
+        %if cpuflag(avx512) || mmsize == 64
+            %assign num_mmregs 32
+        %endif
+    %endif
+
+    %assign %%i 0
+    %rep num_mmregs
+        CAT_XDEFINE m, %%i, %1 %+ %%i
+        CAT_XDEFINE nn%1, %%i, %%i
+        %assign %%i %%i+1
+    %endrep
+    %if %%prev_mmregs > num_mmregs
+        %rep %%prev_mmregs - num_mmregs
+            CAT_UNDEF m, %%i
+            CAT_UNDEF nn %+ mmtype, %%i
+            %assign %%i %%i+1
+        %endrep
+    %endif
+    %xdefine mmtype %1
+%endmacro
+
 ; Prefer registers 16-31 over 0-15 to avoid having to use vzeroupper
 %macro AVX512_MM_PERMUTATION 0-1 0 ; start_reg
     %if ARCH_X86_64 && cpuflag(avx512)
@@ -908,23 +938,12 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
     %assign avx_enabled 0
     %define RESET_MM_PERMUTATION INIT_MMX %1
     %define mmsize 8
-    %define num_mmregs 8
     %define mova movq
     %define movu movq
     %define movh movd
     %define movnta movntq
-    %assign %%i 0
-    %rep 8
-        CAT_XDEFINE m, %%i, mm %+ %%i
-        CAT_XDEFINE nnmm, %%i, %%i
-        %assign %%i %%i+1
-    %endrep
-    %rep 24
-        CAT_UNDEF m, %%i
-        CAT_UNDEF nnmm, %%i
-        %assign %%i %%i+1
-    %endrep
     INIT_CPUFLAGS %1
+    DEFINE_MMREGS mm
 %endmacro
 
 %macro INIT_XMM 0-1+
@@ -936,22 +955,9 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
     %define movh movq
     %define movnta movntdq
     INIT_CPUFLAGS %1
-    %define num_mmregs 8
-    %if ARCH_X86_64
-        %define num_mmregs 16
-        %if cpuflag(avx512)
-            %define num_mmregs 32
-        %endif
-    %endif
-    %assign %%i 0
-    %rep num_mmregs
-        CAT_XDEFINE m, %%i, xmm %+ %%i
-        CAT_XDEFINE nnxmm, %%i, %%i
-        %assign %%i %%i+1
-    %endrep
+    DEFINE_MMREGS xmm
     %if WIN64
-        ; Swap callee-saved registers with volatile registers
-        AVX512_MM_PERMUTATION 6
+        AVX512_MM_PERMUTATION 6 ; Swap callee-saved registers with volatile registers
     %endif
 %endmacro
 
@@ -964,19 +970,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
     %undef movh
     %define movnta movntdq
     INIT_CPUFLAGS %1
-    %define num_mmregs 8
-    %if ARCH_X86_64
-        %define num_mmregs 16
-        %if cpuflag(avx512)
-            %define num_mmregs 32
-        %endif
-    %endif
-    %assign %%i 0
-    %rep num_mmregs
-        CAT_XDEFINE m, %%i, ymm %+ %%i
-        CAT_XDEFINE nnymm, %%i, %%i
-        %assign %%i %%i+1
-    %endrep
+    DEFINE_MMREGS ymm
     AVX512_MM_PERMUTATION
 %endmacro
 
@@ -984,21 +978,12 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
     %assign avx_enabled 1
     %define RESET_MM_PERMUTATION INIT_ZMM %1
     %define mmsize 64
-    %define num_mmregs 8
-    %if ARCH_X86_64
-        %define num_mmregs 32
-    %endif
     %define mova movdqa
     %define movu movdqu
     %undef movh
     %define movnta movntdq
-    %assign %%i 0
-    %rep num_mmregs
-        CAT_XDEFINE m, %%i, zmm %+ %%i
-        CAT_XDEFINE nnzmm, %%i, %%i
-        %assign %%i %%i+1
-    %endrep
     INIT_CPUFLAGS %1
+    DEFINE_MMREGS zmm
     AVX512_MM_PERMUTATION
 %endmacro
 