From patchwork Sat Aug 5 19:10:26 2017
X-Patchwork-Submitter: Ivan Kalvachev
X-Patchwork-Id: 4631
From: Ivan Kalvachev
Date: Sat, 5 Aug 2017 22:10:26 +0300
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH] Add macros used in opus_pvq_search to x86util.asm

Improved version of VBROADCASTSS that, like the AVX2 instruction, also accepts an xmm register as source.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.

From cf4dc8fcd974a845b91aaa8685c06fa145b01786 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev
Date: Sat, 5 Aug 2017 20:18:50 +0300
Subject: [PATCH 1/6] Add macros to x86util.asm

Improved version of VBROADCASTSS that, like the AVX2 instruction, also accepts an xmm register as source.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.

Signed-off-by: Ivan Kalvachev
---
 libavutil/x86/x86util.asm | 108 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 100 insertions(+), 8 deletions(-)

diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index cc7d272cad..d460ee5193 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -832,14 +832,25 @@
     pmaxsd %1, %2
 %endmacro
 
-%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32
-%if cpuflag(avx)
-    vbroadcastss %1, %2
-%else ; sse
-%ifnidn %1, %2
-    movss        %1, %2
-%endif
-    shufps       %1, %1, 0
+%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
+%if cpuflag(avx2)
+    vbroadcastss  %1, %2            ; ymm, xmm
+%elif cpuflag(avx)
+    %ifnum sizeof%2         ; avx1 register
+        vpermilps  xmm%1, xmm%2, q0000  ; xmm, xmm, imm || ymm, ymm, imm
+        %if sizeof%1 >= 32  ; mmsize>=32
+            vinsertf128  %1, %1, xmm%1, 1 ; ymm, ymm, xmm, imm
+        %endif
+    %else                   ; avx1 memory
+        vbroadcastss  %1, %2 ; ymm, m32 || xmm, m32
+    %endif
+%else
+    %ifnum sizeof%2         ; sse register
+        shufps  %1, %2, %2, q0000
+    %else                   ; sse memory
+        movss   %1, %2
+        shufps  %1, %1, 0
+    %endif
 %endif
 %endmacro
 
@@ -854,6 +865,21 @@
 %endif
 %endmacro
 
+%macro VPBROADCASTD 2 ; dst xmm/ymm, src m32/xmm
+%if cpuflag(avx2)
+    vpbroadcastd  %1, %2
+%elif cpuflag(avx) && sizeof%1 >= 32
+    %error vpbroadcastd is not possible with ymm on avx1, try vbroadcastss
+%else
+    %ifnum sizeof%2         ; sse2 register
+        pshufd  %1, %2, q0000
+    %else                   ; sse memory
+        movd    %1, %2
+        pshufd  %1, %1, 0
+    %endif
+%endif
+%endmacro
+
 %macro SHUFFLE_MASK_W 8
 %rep 8
     %if %1>=0x80
@@ -918,3 +944,69 @@
     movhlps    %1, %2        ; may cause an int/float domain transition and has a dependency on dst
 %endif
 %endmacro
+
+; Horizontal Sum of Packed Single precision floats
+; The resulting sum is in all elements.
+%macro HSUMPS 2 ; dst/src, tmp
+%if cpuflag(avx)
+    %if sizeof%1>=32  ; avx
+        vperm2f128  %2, %1, %1, (0)*16+(1)
+        addps       %1, %2
+    %endif
+    shufps  %2, %1, %1, q1032
+    addps   %1, %2
+    shufps  %2, %1, %1, q0321
+    addps   %1, %2
+%else  ; this form is a bit faster than the short avx-like emulation.
+    movaps  %2, %1
+    shufps  %1, %1, q1032
+    addps   %1, %2
+    movaps  %2, %1
+    shufps  %1, %1, q0321
+    addps   %1, %2
+    ; all elements of %1 end up equal, as long as float a+b == b+a
+%endif
+%endmacro
+
+; Emulate blendvps if not available
+;
+; src_b is destroyed by the logical fallback, which needs all-0/all-1 mask elements
+; SSE4.1 blendvps is hard-coded to use xmm0 as the mask
+%macro BLENDVPS 3 ; dst/src_a, src_b, mask
+%if cpuflag(avx)
+    blendvps  %1, %1, %2, %3
+%elif cpuflag(sse4)
+    %if notcpuflag(avx)
+        %ifnidn %3,xmm0
+            %error sse41 blendvps uses xmm0 as the implicit 3rd operand, you used %3
+        %endif
+    %endif
+    blendvps  %1, %2, %3
+%else
+    xorps  %2, %1
+    andps  %2, %3
+    xorps  %1, %2
+%endif
+%endmacro
+
+; Emulate pblendvb if not available
+;
+; src_b is destroyed by the logical fallback, which needs all-0/all-1 mask bytes
+; SSE4.1 pblendvb is hard-coded to use xmm0 as the mask
+%macro PBLENDVB 3 ; dst/src_a, src_b, mask
+%if cpuflag(avx)
+    %if notcpuflag(avx2) && sizeof%1 >= 32
+        %error pblendvb is not possible with ymm on avx1, try blendvps.
+    %endif
+    pblendvb  %1, %1, %2, %3
+%elif cpuflag(sse4)
+    %ifnidn %3,xmm0
+        %error sse41 pblendvb uses xmm0 as the implicit 3rd operand, you used %3
+    %endif
+    pblendvb  %1, %2, %3
+%else
+    pxor  %2, %1
+    pand  %2, %3
+    pxor  %1, %2
+%endif
+%endmacro
-- 
2.13.2
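
The following scalar C model shows what the SSE (non-AVX) fallbacks in this patch compute, lane by lane. It is an editor's sketch under stated assumptions and is not part of the patch. The types v4sf/v4su, the helper Q() (meant to mirror x86inc's qXYZW shuffle immediates), and the functions shufps(), addps(), broadcast_lane0(), hsumps(), blend_logic() and blend_signbit() are hypothetical names. They model the instructions' lane semantics in plain C and do not call the real instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a 4-lane SSE register, lane 0 first. */
typedef struct { float f[4]; } v4sf;
typedef struct { uint32_t u[4]; } v4su;

/* Hypothetical helper mirroring x86inc's qXYZW notation:
 * X picks the source lane for lane 3, ..., W picks it for lane 0. */
#define Q(x, y, z, w) (((x) << 6) | ((y) << 4) | ((z) << 2) | (w))

/* Model of SHUFPS: lanes 0-1 are picked from a, lanes 2-3 from b,
 * each by a 2-bit field of imm (bits 1:0 pick lane 0, and so on). */
static v4sf shufps(v4sf a, v4sf b, unsigned imm)
{
    v4sf r = {{ a.f[imm & 3], a.f[(imm >> 2) & 3],
                b.f[(imm >> 4) & 3], b.f[(imm >> 6) & 3] }};
    return r;
}

/* Model of ADDPS: lane-wise addition. */
static v4sf addps(v4sf a, v4sf b)
{
    for (int i = 0; i < 4; i++)
        a.f[i] += b.f[i];
    return a;
}

/* SSE register path of VBROADCASTSS: shufps %1, %2, %2, q0000 */
static v4sf broadcast_lane0(v4sf src)
{
    return shufps(src, src, Q(0, 0, 0, 0));
}

/* SSE path of HSUMPS: swap the 64-bit halves and add, then rotate by
 * one lane and add again, which leaves the total in every lane. */
static v4sf hsumps(v4sf x)
{
    v4sf t = x;                      /* movaps %2, %1        */
    x = shufps(x, x, Q(1, 0, 3, 2)); /* shufps %1, %1, q1032 */
    x = addps(x, t);                 /* addps  %1, %2        */
    t = x;                           /* movaps %2, %1        */
    x = shufps(x, x, Q(0, 3, 2, 1)); /* shufps %1, %1, q0321 */
    return addps(x, t);              /* addps  %1, %2        */
}

/* Logical fallback of BLENDVPS: xorps b, a; andps b, mask; xorps a, b.
 * This selects b where mask bits are 1 and a where they are 0, bit by bit. */
static v4su blend_logic(v4su a, v4su b, v4su mask)
{
    for (int i = 0; i < 4; i++) {
        b.u[i] ^= a.u[i];
        b.u[i] &= mask.u[i];
        a.u[i] ^= b.u[i];
    }
    return a;
}

/* Model of the BLENDVPS instruction itself: only the sign bit of each
 * mask lane decides between a and b. */
static v4su blend_signbit(v4su a, v4su b, v4su mask)
{
    v4su r;
    for (int i = 0; i < 4; i++)
        r.u[i] = (mask.u[i] >> 31) ? b.u[i] : a.u[i];
    return r;
}
```

Take a = {1, 2, 3, 4} and b = {10, 20, 30, 40}. With a mask of {0, 0xFFFFFFFF, 0, 0xFFFFFFFF}, both blend models return {1, 20, 3, 40}. With a mask of {0, 0x80000000, 0, 0}, blend_signbit() picks 20 in lane 1, but blend_logic() keeps 2. This is why the xorps/andps/xorps (and pxor/pand/pxor) fallback matches the real instruction only when every mask element is all zeros or all ones. A compare instruction such as cmpps or pcmpeqd produces masks of that form.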