From patchwork Mon Aug 5 13:39:11 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 14248
From: James Darnley
To: ffmpeg-devel@ffmpeg.org
Cc: Henrik Gramner
Date: Mon, 5 Aug 2019 15:39:11 +0200
Message-Id: <20190805133916.3349-3-jdarnley@obe.tv>
X-Mailer: git-send-email 2.22.0
In-Reply-To: <20190805133916.3349-1-jdarnley@obe.tv>
References: <20190805133916.3349-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 2/7] x86inc: Optimize VEX instruction encoding
List-Id: FFmpeg development discussions and patches

From: Henrik Gramner

Most VEX-encoded instructions require an additional byte to encode when src2
is a high register (e.g. x|ymm8..15). If the instruction is commutative we
can swap src1 and src2 when doing so reduces the instruction length, e.g.
    vpaddw xmm0, xmm0, xmm8 -> vpaddw xmm0, xmm8, xmm0
---
 libavutil/x86/x86inc.asm | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index bc370a6186..39cba5db09 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1244,9 +1244,40 @@ INIT_XMM
         %elif %0 >= 9
             __instr %6, %7, %8, %9
         %elif %0 == 8
-            __instr %6, %7, %8
+            %if avx_enabled && %5
+                %xdefine __src1 %7
+                %xdefine __src2 %8
+                %ifnum regnumof%7
+                    %ifnum regnumof%8
+                        %if regnumof%7 < 8 && regnumof%8 >= 8 && regnumof%8 < 16 && sizeof%8 <= 32
+                            ; Most VEX-encoded instructions require an additional byte to encode when
+                            ; src2 is a high register (e.g. m8..15). If the instruction is commutative
+                            ; we can swap src1 and src2 when doing so reduces the instruction length.
+                            %xdefine __src1 %8
+                            %xdefine __src2 %7
+                        %endif
+                    %endif
+                %endif
+                __instr %6, __src1, __src2
+            %else
+                __instr %6, %7, %8
+            %endif
         %elif %0 == 7
-            __instr %6, %7
+            %if avx_enabled && %5
+                %xdefine __src1 %6
+                %xdefine __src2 %7
+                %ifnum regnumof%6
+                    %ifnum regnumof%7
+                        %if regnumof%6 < 8 && regnumof%7 >= 8 && regnumof%7 < 16 && sizeof%7 <= 32
+                            %xdefine __src1 %7
+                            %xdefine __src2 %6
+                        %endif
+                    %endif
+                %endif
+                __instr %6, __src1, __src2
+            %else
+                __instr %6, %7
+            %endif
         %else
             __instr %6
         %endif
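
Note on where the extra byte comes from: the two-byte VEX prefix (C5) has no
bit to extend ModRM.rm, so a high register in the src2 slot forces the
three-byte prefix (C4). src1 is stored in the 4-bit VEX.vvvv field, which can
name any of registers 0..15 at no cost. The following is only an illustrative
Python model of that rule, not part of the patch. The names encode_vex_rrr and
swap_saves_a_byte are hypothetical, and the model assumes a register-only
instruction from the 0F opcode map with VEX.W = 0 (which covers vpaddw). The
swap_saves_a_byte check mirrors the register-number test in the macro but
leaves out its sizeof <= 32 guard.

```python
def encode_vex_rrr(opcode, pp, dst, src1, src2, is_256=False):
    """Encode `op dst, src1, src2` (registers only, 0F map, VEX.W = 0).

    dst goes in ModRM.reg, src1 in VEX.vvvv, src2 in ModRM.rm.
    Hypothetical helper for illustration; registers are numbered 0..15.
    """
    for reg in (dst, src1, src2):
        if not 0 <= reg <= 15:
            raise ValueError("VEX can only address registers 0..15")

    r_inv = 0 if dst >= 8 else 1    # inverted R bit: extends ModRM.reg
    b_inv = 0 if src2 >= 8 else 1   # inverted B bit: extends ModRM.rm
    vvvv = ~src1 & 0xF              # src1 is stored inverted
    low = (vvvv << 3) | (int(is_256) << 2) | pp
    modrm = 0xC0 | ((dst & 7) << 3) | (src2 & 7)

    if b_inv:
        # Two-byte prefix: C5, then R vvvv L pp. There is no B bit here,
        # so this form is only available when src2 is a low register.
        prefix = bytes([0xC5, (r_inv << 7) | low])
    else:
        # Three-byte prefix: C4, then R X B mmmmm (X unused, so its
        # inverted bit is 1; mmmmm = 00001 selects the 0F map),
        # then W vvvv L pp with W = 0.
        prefix = bytes([0xC4, (r_inv << 7) | (1 << 6) | (b_inv << 5) | 0x01, low])
    return prefix + bytes([opcode, modrm])


def swap_saves_a_byte(src1, src2):
    """Register-number condition used by the patch (sizeof check omitted)."""
    return src1 < 8 and 8 <= src2 < 16


VPADDW = 0xFD   # VEX.128.66.0F FD /r
PP_66 = 0b01    # pp value for the implied 66 prefix

before = encode_vex_rrr(VPADDW, PP_66, dst=0, src1=0, src2=8)
after = encode_vex_rrr(VPADDW, PP_66, dst=0, src1=8, src2=0)
print(before.hex(), len(before))  # c4c179fdc0 5
print(after.hex(), len(after))    # c5b9fdc0 4
```

When both sources are high registers the swap gains nothing, since src2 is
high either way, which is why the macro only swaps when src1 is a low
register.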