From patchwork Mon Jun 12 18:09:27 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 3929
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 12 Jun 2017 15:09:27 -0300
Message-Id: <20170612180927.2292-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.13.0
Subject: [FFmpeg-devel] [PATCH] x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse, sse4}
List-Id: FFmpeg development discussions and patches

About 2x faster than the C version.

Signed-off-by: James Almer
---
 libavcodec/x86/aacpsdsp.asm    | 123 +++++++++++++++++++++++++++++++++++++++++
 libavcodec/x86/aacpsdsp_init.c |   8 +++
 libavutil/x86/x86util.asm      |  15 +++--
 3 files changed, 140 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/aacpsdsp.asm b/libavcodec/x86/aacpsdsp.asm
index f7f22f274c..cdcadefcdc 100644
--- a/libavcodec/x86/aacpsdsp.asm
+++ b/libavcodec/x86/aacpsdsp.asm
@@ -172,6 +172,129 @@ align 16
 .ret:
     REP_RET
 
+;***********************************************************
+;void ps_hybrid_synthesis_deint_sse4(float out[2][38][64],
+;                                    float (*in)[32][2],
+;                                    int i, int len)
+;***********************************************************
+%macro HYBRID_SYNTHESIS_DEINT 0
+cglobal ps_hybrid_synthesis_deint, 3, 7, 5, out, in, i, len, out0, out1, tmp
+%if cpuflag(sse4)
+%define MOVH movsd
+%else
+%define MOVH movlps
+%endif
+    movsxdifnidn iq, id
+    mov        lend, 32 << 3
+    lea        outq, [outq+iq*4]
+    mov        tmpd, id
+    shl        tmpd, 8
+    add         inq, tmpq
+    mov        tmpd, 64
+    sub        tmpd, id
+    mov          id, tmpd
+
+    test         id, 1
+    jne .loop4
+    test         id, 2
+    jne .loop8
+
+align 16
+.loop16:
+    mov       out0q, outq
+    mov       out1q, 38*64*4
+    add       out1q, out0q
+    mov        tmpd, lend
+
+.inner_loop16:
+    movaps       m0, [inq]
+    movaps       m1, [inq+lenq]
+    movaps       m2, [inq+lenq*2]
+    movaps       m3, [inq+3*32*2*4]
+    TRANSPOSE4x4PS 0, 1, 2, 3, 4
+    movaps [out0q], m0
+    movaps [out1q], m1
+    movaps [out0q+lenq], m2
+    movaps [out1q+lenq], m3
+    lea       out0q, [out0q+lenq*2]
+    lea       out1q, [out1q+lenq*2]
+    add         inq, mmsize
+    sub        tmpd, mmsize
+    jg .inner_loop16
+    add        outq, 16
+    add         inq, 3*32*2*4
+    sub          id, 4
+    jg .loop16
+    RET
+
+align 16
+.loop8:
+    mov       out0q, outq
+    mov       out1q, 38*64*4
+    add       out1q, out0q
+    mov        tmpd, lend
+
+.inner_loop8:
+    movaps       m0, [inq]
+    movaps       m1, [inq+lenq]
+    SBUTTERFLYPS 0, 1, 2
+    SBUTTERFLYPD 0, 1, 2
+    MOVH   [out0q], m0
+    MOVH   [out1q], m1
+    movhps [out0q+lenq], m0
+    movhps [out1q+lenq], m1
+    lea       out0q, [out0q+lenq*2]
+    lea       out1q, [out1q+lenq*2]
+    add         inq, mmsize
+    sub        tmpd, mmsize
+    jg .inner_loop8
+    add        outq, 8
+    add         inq, lenq
+    sub          id, 2
+    jg .loop16
+    RET
+
+align 16
+.loop4:
+    mov       out0q, outq
+    mov       out1q, 38*64*4
+    add       out1q, out0q
+    mov        tmpd, lend
+
+.inner_loop4:
+    movaps       m0, [inq]
+    movss  [out0q], m0
+%if cpuflag(sse4)
+    extractps [out1q], m0, 1
+    extractps [out0q+lenq], m0, 2
+    extractps [out1q+lenq], m0, 3
+%else
+    movhlps      m1, m0
+    movss  [out0q+lenq], m1
+    shufps       m0, m0, 0xb1
+    movss  [out1q], m0
+    movhlps      m1, m0
+    movss  [out1q+lenq], m1
+%endif
+    lea       out0q, [out0q+lenq*2]
+    lea       out1q, [out1q+lenq*2]
+    add         inq, mmsize
+    sub        tmpd, mmsize
+    jg .inner_loop4
+    add        outq, 4
+    sub          id, 1
+    test         id, 2
+    jne .loop8
+    cmp          id, 4
+    jge .loop16
+    RET
+%endmacro
+
+INIT_XMM sse
+HYBRID_SYNTHESIS_DEINT
+INIT_XMM sse4
+HYBRID_SYNTHESIS_DEINT
+
 ;*******************************************************************
 ;void ff_ps_hybrid_analysis_(float (*out)[2], float (*in)[2],
 ;                            const float (*filter)[8][2],
diff --git a/libavcodec/x86/aacpsdsp_init.c b/libavcodec/x86/aacpsdsp_init.c
index 767ae6588e..25e089c395 100644
--- a/libavcodec/x86/aacpsdsp_init.c
+++ b/libavcodec/x86/aacpsdsp_init.c
@@ -40,6 +40,10 @@ void ff_ps_stereo_interpolate_sse3(float (*l)[2], float (*r)[2],
 void ff_ps_stereo_interpolate_ipdopd_sse3(float (*l)[2], float (*r)[2],
                                           float h[2][4], float h_step[2][4],
                                           int len);
+void ff_ps_hybrid_synthesis_deint_sse(float out[2][38][64], float (*in)[32][2],
+                                      int i, int len);
+void ff_ps_hybrid_synthesis_deint_sse4(float out[2][38][64], float (*in)[32][2],
+                                       int i, int len);
 
 av_cold void ff_psdsp_init_x86(PSDSPContext *s)
 {
@@ -48,6 +52,7 @@ av_cold void ff_psdsp_init_x86(PSDSPContext *s)
     if (EXTERNAL_SSE(cpu_flags)) {
         s->add_squares            = ff_ps_add_squares_sse;
         s->mul_pair_single        = ff_ps_mul_pair_single_sse;
+        s->hybrid_synthesis_deint = ff_ps_hybrid_synthesis_deint_sse;
         s->hybrid_analysis        = ff_ps_hybrid_analysis_sse;
     }
     if (EXTERNAL_SSE3(cpu_flags)) {
@@ -56,4 +61,7 @@ av_cold void ff_psdsp_init_x86(PSDSPContext *s)
         s->stereo_interpolate[1]  = ff_ps_stereo_interpolate_ipdopd_sse3;
         s->hybrid_analysis        = ff_ps_hybrid_analysis_sse3;
     }
+    if (EXTERNAL_SSE4(cpu_flags)) {
+        s->hybrid_synthesis_deint = ff_ps_hybrid_synthesis_deint_sse4;
+    }
 }
diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index fe9a727e22..cc7d272cad 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -71,6 +71,12 @@
     SWAP %1, %3, %2
 %endmacro
 
+%macro SBUTTERFLYPD 3
+    movlhps m%3, m%1, m%2
+    movhlps m%2, m%2, m%1
+    SWAP %1, %3
+%endmacro
+
 %macro TRANSPOSE4x4B 5
     SBUTTERFLY bw, %1, %2, %5
     SBUTTERFLY bw, %3, %4, %5
@@ -117,12 +123,9 @@
 %macro TRANSPOSE4x4PS 5
     SBUTTERFLYPS %1, %2, %5
     SBUTTERFLYPS %3, %4, %5
-    movlhps m%5, m%1, m%3
-    movhlps m%3, m%1
-    SWAP %5, %1
-    movlhps m%5, m%2, m%4
-    movhlps m%4, m%2
-    SWAP %5, %2, %3
+    SBUTTERFLYPD %1, %3, %5
+    SBUTTERFLYPD %2, %4, %5
+    SWAP %2, %3
 %endmacro
 
 %macro TRANSPOSE8x4D 9-11
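
For reference, the operation that the two new functions implement can be sketched in scalar C. This is a minimal sketch derived only from the prototype shown in the patch. The name ref_hybrid_synthesis_deint and the two assert() checks are assumptions added for illustration; they are not part of the patch or of FFmpeg's API.

```c
#include <assert.h>

/* Scalar sketch of the deinterleave step. The function name is
 * hypothetical; only the parameter list is taken from the patch. */
static void ref_hybrid_synthesis_deint(float out[2][38][64],
                                       float (*in)[32][2],
                                       int i, int len)
{
    /* Sanity checks added for this sketch only. */
    assert(i >= 0 && i <= 64);
    assert(len >= 0 && len <= 32);

    /* For every band from i up to 63, split each pair in[i][n]
     * into two planes, transposing band and sample indices. */
    for (; i < 64; i++) {
        for (int n = 0; n < len; n++) {
            out[0][n][i] = in[i][n][0]; /* first component  */
            out[1][n][i] = in[i][n][1]; /* second component */
        }
    }
}
```

The SIMD loops perform the same scatter four, two, or one band at a time, depending on how many bands remain. The assembly above appears to use a fixed inner count of 32 (lend is set to 32 << 3 and the len argument is not loaded), so the sketch should match it only when len is 32.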