From patchwork Thu Feb 16 13:11:48 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 2573
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.103.89.21 with SMTP id n21csp2465564vsb;
 Thu, 16 Feb 2017 05:12:30 -0800 (PST)
X-Received: by 10.28.12.13 with SMTP id 13mr2263346wmm.10.1487250750578;
 Thu, 16 Feb 2017 05:12:30 -0800 (PST)
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
 by mx.google.com with ESMTP id v85si470990wmv.132.2017.02.16.05.12.29;
 Thu, 16 Feb 2017 05:12:30 -0800 (PST)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
 dkim=neutral (body hash did not verify)
 header.i=@ob-encoder-com.20150623.gappssmtp.com;
 spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates
 79.124.17.100 as permitted sender)
 smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 3CDA3689A18;
 Thu, 16 Feb 2017 15:12:21 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-wr0-f171.google.com (mail-wr0-f171.google.com
 [209.85.128.171])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 7CF586808CA
 for ; Thu, 16 Feb 2017 15:12:15 +0200 (EET)
Received: by mail-wr0-f171.google.com with SMTP id 89so6818481wrr.3
 for ; Thu, 16 Feb 2017 05:12:21 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=ob-encoder-com.20150623.gappssmtp.com; s=20150623;
 h=sender:from:to:subject:date:message-id:in-reply-to:references;
 bh=dp9AueN+4urV0h6EBqreEWfZ5ZxXovWKhun/9/Skuus=;
 b=XWFM0h4ET8TMfdoJfEVTkKFBoVvoPnwnvsGkx4wPCZEe3Buf0BIDyDFCDJmhebtFsQ
 sDGLRX963dgB5VcFTWh57B71vQ938cxt018JXe7pxWAX0h7E8tRUyz4pnehTz6aWN76I
 mexlIYsHwjI/iYuUxB/xWUWdeDqJwyn8v4sGUmevaDHxPaOdXS7kKVXkp9EcNvzT1GqK
 XMDXD90cNWsJ2pmeG8sGCMKks0U5vKJFif5DK6357a1XJvpPjjUlaMR8FC5M6DVLYEf1
 sORHISvrw3IQyLBiujVJjivtD/W+51lkmFw8r/MFSI1RkTxg3bpos++UsZ+iTF3pppzQ
 Sksg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:sender:from:to:subject:date:message-id
 :in-reply-to:references;
 bh=dp9AueN+4urV0h6EBqreEWfZ5ZxXovWKhun/9/Skuus=;
 b=rsMz0+i5DsYeZg0PNJvZ40YWp8PwBc8Tg9g0cV9AoXOj4ZAFQuQEF/1l5csDKBzJZv
 xlcLajpW9OblvOME4qyB9ZxovepOJe6Vp/aCHmCUCoKtwQGScrENGRb5weRQXCrPIS+c
 4Rwf5GDKsObkcd+EPmTXoOxNviz3QN8565WRKu3V3+9CVLnWO9YmB1+K2gRV/cFsf2wW
 7/ACkJ07X1yORn6onpRb/XAz1tXvLebGjasEuuqqD+dn4lqVik7eIN7fN06Ulk5pUfEa
 bjYHcPXOGkXOpOBKhF3gWrjAWthRgRgDCCR3rNsFh/loqFMexvPf9Jk9pLGiglecAmfv
 mVLA==
X-Gm-Message-State: AMke39lKfAioxv2Wq+xtEkOXfEkD5F9Rdv0Ij3nu0w0cg4LVpe209qQI+BNJwg31HHSBUg==
X-Received: by 10.223.135.184 with SMTP id b53mr2245694wrb.169.1487250741105;
 Thu, 16 Feb 2017 05:12:21 -0800 (PST)
Received: from localhost.localdomain (d51A44418.access.telenet.be.
 [81.164.68.24])
 by smtp.gmail.com with ESMTPSA id e74sm210945wmd.2.2017.02.16.05.12.20
 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 16 Feb 2017 05:12:20 -0800 (PST)
From: James Darnley
To: FFmpeg development discussions and patches
Date: Thu, 16 Feb 2017 14:11:48 +0100
Message-Id: <20170216131149.7028-3-jdarnley@obe.tv>
X-Mailer: git-send-email 2.8.3
In-Reply-To: <20170216131149.7028-1-jdarnley@obe.tv>
References: <20170216131149.7028-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 3/4] x86util: import MOVHL macro
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to
re-license it as LGPL. Original commit message follows.

x86: Avoid some bypass delays and false dependencies

A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
between int and float domains, so try to avoid that if possible.
---
 libavutil/x86/x86util.asm | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index c063436..1408f0a 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -876,3 +876,15 @@
     psrlq %1, 8*(%2)
 %endif
 %endmacro
+
+%macro MOVHL 2 ; dst, src
+%ifidn %1, %2
+    punpckhqdq %1, %2
+%elif cpuflag(avx)
+    punpckhqdq %1, %2, %2
+%elif cpuflag(sse4)
+    pshufd %1, %2, q3232 ; pshufd is slow on some older CPUs, so only use it on more modern ones
+%else
+    movhlps %1, %2 ; may cause an int/float domain transition and has a dependency on dst
+%endif
+%endmacro
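
Note for readers of the macro above: the four branches are interchangeable only in the low 64 bits of dst. The C sketch below is not part of the patch. It models each branch with what I believe is the matching SSE/SSE2 intrinsic, so the results can be compared on any x86 machine. The helper names (low64, high64, movhl_punpckhqdq, movhl_pshufd, movhl_movhlps, check_movhl_variants) are made up for illustration, and the immediate 0xEE is my reading of q3232 under the x86inc convention (3*64 + 2*16 + 3*4 + 2). A compiler is free to pick different instructions for these intrinsics, so this shows the data movement only, not the bypass-delay behaviour.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; also provides the SSE ones */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: low 64 bits of a 128-bit vector. */
static uint64_t low64(__m128i v)
{
    uint64_t out;
    memcpy(&out, &v, sizeof(out));
    return out;
}

/* Hypothetical helper: high 64 bits of a 128-bit vector. */
static uint64_t high64(__m128i v)
{
    return low64(_mm_srli_si128(v, 8)); /* shift right by 8 bytes */
}

/* Models "punpckhqdq dst, src" with dst == src, and the AVX
 * three-operand form "punpckhqdq dst, src, src". */
static __m128i movhl_punpckhqdq(__m128i src)
{
    return _mm_unpackhi_epi64(src, src);
}

/* Models "pshufd dst, src, q3232"; 0xEE is assumed to equal q3232. */
static __m128i movhl_pshufd(__m128i src)
{
    return _mm_shuffle_epi32(src, 0xEE);
}

/* Models "movhlps dst, src": only the low half of dst is written. */
static __m128i movhl_movhlps(__m128i dst, __m128i src)
{
    return _mm_castps_si128(
        _mm_movehl_ps(_mm_castsi128_ps(dst), _mm_castsi128_ps(src)));
}

/* Compares the three models on one arbitrary input pair. */
static void check_movhl_variants(void)
{
    const __m128i src = _mm_set_epi64x(0x1111111122222222LL,
                                       0x3333333344444444LL);
    const __m128i dst = _mm_set_epi64x(0x5555555566666666LL,
                                       0x7777777788888888LL);
    const uint64_t src_hi = 0x1111111122222222ULL;

    /* Every variant moves the high half of src into the low half. */
    assert(low64(movhl_punpckhqdq(src)) == src_hi);
    assert(low64(movhl_pshufd(src)) == src_hi);
    assert(low64(movhl_movhlps(dst, src)) == src_hi);

    /* The high half differs: the integer forms duplicate src's high
     * half, while movhlps keeps the old high half of dst. */
    assert(high64(movhl_punpckhqdq(src)) == src_hi);
    assert(high64(movhl_pshufd(src)) == src_hi);
    assert(high64(movhl_movhlps(dst, src)) == 0x5555555566666666ULL);
}
```

The last assertion is the "dependency on dst" mentioned in the macro comment: movhlps merges into the existing register instead of overwriting it, so callers of MOVHL should presumably rely only on the low 64 bits of the result.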