From patchwork Sun Aug 6 12:36:45 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ivan Kalvachev
X-Patchwork-Id: 4635
From: Ivan Kalvachev
Date: Sun, 6 Aug 2017 15:36:45 +0300
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 1/2]v2 Add macros used in opus_pvq_search to x86util.asm
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Sender: "ffmpeg-devel"

On 8/6/17, Henrik Gramner wrote:
> On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev wrote:
>> +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
>> +%if cpuflag(avx2)
>> +    vbroadcastss %1, %2 ; ymm, xmm
>> +%elif cpuflag(avx)
>> +    %ifnum sizeof%2 ; avx1 register
>> +        vpermilps xmm%1, xmm%2, q0000 ; xmm, xmm, imm || ymm, ymm, imm
>
> Nit: Use shufps instead of vpermilps, it's one byte shorter but
> otherwise identical in this case.
>
> c5 e8 c6 ca 00       vshufps xmm1,xmm2,xmm2,0x0
> c4 e3 79 04 ca 00    vpermilps xmm1,xmm2,0x0

It also has 1 cycle less latency on some old AMD CPUs.
Done.

>> +%macro BLENDVPS 3 ; dst/src_a, src_b, mask
>> +%if cpuflag(avx)
>> +    blendvps %1, %1, %2, %3
>> +%elif cpuflag(sse4)
>> +    %if notcpuflag(avx)
>> +        %ifnidn %3,xmm0
>> +            %error sse41 blendvps uses xmm0 as default 3d operand, you used %3
>> +        %endif
>> +    %endif
>
> notcpuflag(avx) is redundant (it's always true since AVX uses the first
> branch).

Done.
This is a remnant from the time when I had a label to switch
different implementations on and off.

Best Regards
_______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

From a43da9061c08dcf4cb6ecd7c8eaad074cdb551d1 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev
Date: Sat, 5 Aug 2017 20:18:50 +0300
Subject: [PATCH 1/6] Add macros to x86util.asm .

Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.

Signed-off-by: Ivan Kalvachev
---
 libavutil/x86/x86util.asm | 106 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 98 insertions(+), 8 deletions(-)

diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index cc7d272cad..e1220dfc1a 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -832,14 +832,25 @@
     pmaxsd %1, %2
 %endmacro
 
-%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32
-%if cpuflag(avx)
-    vbroadcastss %1, %2
-%else ; sse
-%ifnidn %1, %2
-    movss %1, %2
-%endif
-    shufps %1, %1, 0
+%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
+%if cpuflag(avx2)
+    vbroadcastss %1, %2
+%elif cpuflag(avx)
+    %ifnum sizeof%2 ; avx1 register
+        shufps xmm%1, xmm%2, xmm%2, q0000
+        %if sizeof%1 >= 32 ; mmsize>=32
+            vinsertf128 %1, %1, xmm%1, 1
+        %endif
+    %else ; avx1 memory
+        vbroadcastss %1, %2
+    %endif
+%else
+    %ifnum sizeof%2 ; sse register
+        shufps %1, %2, %2, q0000
+    %else ; sse memory
+        movss %1, %2
+        shufps %1, %1, 0
+    %endif
 %endif
 %endmacro
 
@@ -854,6 +865,21 @@
 %endif
 %endmacro
 
+%macro VPBROADCASTD 2 ; dst xmm/ymm, src m32/xmm
+%if cpuflag(avx2)
+    vpbroadcastd %1, %2
+%elif cpuflag(avx) && sizeof%1 >= 32
+    %error vpbroadcastd not possible with ymm on avx1. try vbroadcastss
+%else
+    %ifnum sizeof%2 ; sse2 register
+        pshufd %1, %2, q0000
+    %else ; sse memory
+        movd %1, %2
+        pshufd %1, %1, 0
+    %endif
+%endif
+%endmacro
+
 %macro SHUFFLE_MASK_W 8
     %rep 8
         %if %1>=0x80
@@ -918,3 +944,67 @@
     movhlps %1, %2 ; may cause an int/float domain transition and has a dependency on dst
 %endif
 %endmacro
+
+; Horizontal Sum of Packed Single precision floats
+; The resulting sum is in all elements.
+%macro HSUMPS 2 ; dst/src, tmp
+%if cpuflag(avx)
+    %if sizeof%1>=32 ; avx
+        vperm2f128 %2, %1, %1, (0)*16+(1)
+        addps %1, %2
+    %endif
+    shufps %2, %1, %1, q1032
+    addps %1, %2
+    shufps %2, %1, %1, q0321
+    addps %1, %2
+%else ; this form is a bit faster than the short avx-like emulation.
+    movaps %2, %1
+    shufps %1, %1, q1032
+    addps %1, %2
+    movaps %2, %1
+    shufps %1, %1, q0321
+    addps %1, %2
+    ; all %1 members should be equal for as long as float a+b==b+a
+%endif
+%endmacro
+
+; Emulate blendvps if not available
+;
+; src_b is destroyed when using emulation with logical operands
+; SSE41 blendv instruction is hard coded to use xmm0 as mask
+%macro BLENDVPS 3 ; dst/src_a, src_b, mask
+%if cpuflag(avx)
+    blendvps %1, %1, %2, %3
+%elif cpuflag(sse4)
+    %ifnidn %3,xmm0
+        %error sse41 blendvps uses xmm0 as default 3rd operand, you used %3
+    %endif
+    blendvps %1, %2, %3
+%else
+    xorps %2, %1
+    andps %2, %3
+    xorps %1, %2
+%endif
+%endmacro
+
+; Emulate pblendvb if not available
+;
+; src_b is destroyed when using emulation with logical operands
+; SSE41 blendv instruction is hard coded to use xmm0 as mask
+%macro PBLENDVB 3 ; dst/src_a, src_b, mask
+%if cpuflag(avx)
+    %if cpuflag(avx) && notcpuflag(avx2) && sizeof%1 >= 32
+        %error pblendvb not possible with ymm on avx1, try blendvps.
+    %endif
+    pblendvb %1, %1, %2, %3
+%elif cpuflag(sse4)
+    %ifnidn %3,xmm0
+        %error sse41 pblendvb uses xmm0 as default 3rd operand, you used %3
+    %endif
+    pblendvb %1, %2, %3
+%else
+    pxor %2, %1
+    pand %2, %3
+    pxor %1, %2
+%endif
+%endmacro
-- 
2.14.0
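
For readers who want to check the logic of the fallback paths without assembling anything, here is a rough scalar model in C. It is only a sketch and not part of the patch: the names blend_bits, shuffle4 and hsum4 are made up for illustration, and the shuffle selectors assume the x86inc qABCD convention (the leftmost digit selects the source of the highest element). One thing the model makes visible: the xor/and/xor fallback is a bitwise select, while the real blendvps only looks at the sign bit of each 32-bit lane (pblendvb at the top bit of each byte), so the emulation matches the instruction only when every mask lane is all ones or all zeros.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the non-SSE4 branch of BLENDVPS/PBLENDVB:
 *     xor b, a ; and b, mask ; xor a, b
 * A set mask bit takes the bit from b, a clear mask bit keeps a.
 * Hypothetical helper, for illustration only. */
static uint32_t blend_bits(uint32_t a, uint32_t b, uint32_t mask)
{
    b ^= a;     /* bits where a and b differ           */
    b &= mask;  /* keep only the differences we select */
    a ^= b;     /* flip those bits in a, giving b's    */
    return a;
}

/* Scalar model of shufps with both sources equal.
 * (e3, e2, e1, e0) are the digits of a qABCD selector:
 * dst[3] = src[e3], ..., dst[0] = src[e0]. */
static void shuffle4(float dst[4], const float src[4],
                     int e3, int e2, int e1, int e0)
{
    float t[4] = { src[e0], src[e1], src[e2], src[e3] };
    for (int i = 0; i < 4; i++)
        dst[i] = t[i];
}

/* Scalar model of the 128-bit part of HSUMPS:
 * two shuffle+add steps leave the total in every element. */
static void hsum4(float v[4])
{
    float t[4];

    shuffle4(t, v, 1, 0, 3, 2);   /* q1032: swap the two halves */
    for (int i = 0; i < 4; i++)
        v[i] += t[i];             /* v = {v0+v2, v1+v3, v0+v2, v1+v3} */

    shuffle4(t, v, 0, 3, 2, 1);   /* q0321: rotate by one element */
    for (int i = 0; i < 4; i++)
        v[i] += t[i];             /* every element is now the full sum */
}
```

Calling hsum4 on {1, 2, 3, 4} leaves 10 in all four elements, which mirrors the "sum is in all elements" comment on the macro. The ymm case in the patch just adds one more swap+add step (vperm2f128) in front of these two.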