From patchwork Mon Feb 13 12:44:16 2017
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 2539
From: James Darnley
To: FFmpeg development discussions and patches
Date: Mon, 13 Feb 2017 13:44:16 +0100
Message-Id: <20170213124417.25808-3-jdarnley@obe.tv>
In-Reply-To: <20170213124417.25808-1-jdarnley@obe.tv>
References: <20170213124417.25808-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 3/4] x86util: import MOVHL macro

Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to
re-license it as LGPL.  Original commit message follows.

x86: Avoid some bypass delays and false dependencies

A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
between int and float domains, so try to avoid that if possible.
---
 libavutil/x86/x86util.asm | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index c063436e0a..1408f0a176 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -876,3 +876,15 @@
     psrlq  %1, 8*(%2)
 %endif
 %endmacro
+
+%macro MOVHL 2 ; dst, src
+%ifidn %1, %2
+    punpckhqdq %1, %2
+%elif cpuflag(avx)
+    punpckhqdq %1, %2, %2
+%elif cpuflag(sse4)
+    pshufd     %1, %2, q3232 ; pshufd is slow on some older CPUs, so only use it on more modern ones
+%else
+    movhlps    %1, %2        ; may cause an int/float domain transition and has a dependency on dst
+%endif
+%endmacro
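
For reviewers who want to check that the branches of MOVHL are interchangeable, here is a rough scalar model in C of the three instructions the macro can emit. It is only a sketch of the lane movement, not code from the patch or from x264: the type xmm_t and every function name below are made up for illustration, and the immediate 0xEE is what x86inc's q3232 constant should expand to (dword selectors 3,2,3,2 from the highest lane down).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit XMM register as four 32-bit lanes;
 * d[0] is the lowest lane. Not a type from FFmpeg or x264. */
typedef struct { uint32_t d[4]; } xmm_t;

/* punpckhqdq dst, a, b (three-operand AVX form): the low qword becomes the
 * high qword of a, the high qword becomes the high qword of b.
 * The two-operand SSE2 form is the case where a is also the destination. */
xmm_t model_punpckhqdq(xmm_t a, xmm_t b)
{
    xmm_t r = { { a.d[2], a.d[3], b.d[2], b.d[3] } };
    return r;
}

/* pshufd dst, src, imm8: each 2-bit field of imm8 selects one source dword. */
xmm_t model_pshufd(xmm_t src, unsigned imm8)
{
    xmm_t r;
    for (int i = 0; i < 4; i++)
        r.d[i] = src.d[(imm8 >> (2 * i)) & 3];
    return r;
}

/* movhlps dst, src: the low qword becomes the high qword of src, while the
 * high qword of dst is preserved (hence the dependency on dst). */
xmm_t model_movhlps(xmm_t dst, xmm_t src)
{
    xmm_t r = { { src.d[2], src.d[3], dst.d[2], dst.d[3] } };
    return r;
}

/* Low 64 bits of the modeled register. */
uint64_t low_qword(xmm_t x)
{
    return (uint64_t)x.d[0] | ((uint64_t)x.d[1] << 32);
}

/* Every branch of the macro should leave src's high qword in the low qword. */
void check_branches_agree(xmm_t dst, xmm_t src)
{
    uint64_t want = (uint64_t)src.d[2] | ((uint64_t)src.d[3] << 32);
    assert(low_qword(model_punpckhqdq(src, src)) == want); /* %ifidn and avx */
    assert(low_qword(model_pshufd(src, 0xEE)) == want);    /* sse4, q3232    */
    assert(low_qword(model_movhlps(dst, src)) == want);    /* fallback       */
}
```

In this model only the low 64 bits of the result are the same in every branch. The punpckhqdq and pshufd paths duplicate the high qword of src into both halves, while movhlps keeps whatever was in the upper half of dst, so callers should probably rely on the low qword only.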