From patchwork Tue Jun 13 00:35:05 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 3960
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 12 Jun 2017 21:35:05 -0300
Message-Id: <20170613003505.4460-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.13.0
Subject: [FFmpeg-devel] [PATCH] x86/aacpsdsp: add ff_ps_hybrid_analysis_ileave_sse

About 2x faster than the C version.

Signed-off-by: James Almer
---
Depends on "[PATCH] x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4}"

 libavcodec/x86/aacpsdsp.asm    | 106 +++++++++++++++++++++++++++++++++++++++++
 libavcodec/x86/aacpsdsp_init.c |   3 ++
 2 files changed, 109 insertions(+)

diff --git a/libavcodec/x86/aacpsdsp.asm b/libavcodec/x86/aacpsdsp.asm
index cdcadefcdc..70a3d84780 100644
--- a/libavcodec/x86/aacpsdsp.asm
+++ b/libavcodec/x86/aacpsdsp.asm
@@ -172,6 +172,112 @@ align 16
 .ret:
     REP_RET
 
+;**********************************************************
+;void ps_hybrid_analysis_ileave_sse(float (*out)[32][2],
+;                                   float L[2][38][64],
+;                                   int i, int len)
+;**********************************************************
+INIT_XMM sse
+cglobal ps_hybrid_analysis_ileave, 3, 7, 5, out, in, i, len, in0, in1, tmp
+    movsxdifnidn        iq, id
+    mov               lend, 32 << 3
+    lea                inq, [inq+iq*4]
+    mov               tmpd, id
+    shl               tmpd, 8
+    add               outq, tmpq
+    mov               tmpd, 64
+    sub               tmpd, id
+    mov                 id, tmpd
+
+    test                id, 1
+    jne .loop4
+    test                id, 2
+    jne .loop8
+
+align 16
+.loop16:
+    mov               in0q, inq
+    mov               in1q, 38*64*4
+    add               in1q, in0q
+    mov               tmpd, lend
+
+.inner_loop16:
+    movaps              m0, [in0q]
+    movaps              m1, [in1q]
+    movaps              m2, [in0q+lenq]
+    movaps              m3, [in1q+lenq]
+    TRANSPOSE4x4PS 0, 1, 2, 3, 4
+    movaps          [outq], m0
+    movaps     [outq+lenq], m1
+    movaps   [outq+lenq*2], m2
+    movaps [outq+3*32*2*4], m3
+    lea               in0q, [in0q+lenq*2]
+    lea               in1q, [in1q+lenq*2]
+    add               outq, mmsize
+    sub               tmpd, mmsize
+    jg .inner_loop16
+    add                inq, 16
+    add               outq, 3*32*2*4
+    sub                 id, 4
+    jg .loop16
+    RET
+
+align 16
+.loop8:
+    mov               in0q, inq
+    mov               in1q, 38*64*4
+    add               in1q, in0q
+    mov               tmpd, lend
+
+.inner_loop8:
+    movlps              m0, [in0q]
+    movlps              m1, [in1q]
+    movhps              m0, [in0q+lenq]
+    movhps              m1, [in1q+lenq]
+    SBUTTERFLYPS 0, 1, 2
+    SBUTTERFLYPD 0, 1, 2
+    movaps          [outq], m0
+    movaps     [outq+lenq], m1
+    lea               in0q, [in0q+lenq*2]
+    lea               in1q, [in1q+lenq*2]
+    add               outq, mmsize
+    sub               tmpd, mmsize
+    jg .inner_loop8
+    add                inq, 8
+    add               outq, lenq
+    sub                 id, 2
+    jg .loop16
+    RET
+
+align 16
+.loop4:
+    mov               in0q, inq
+    mov               in1q, 38*64*4
+    add               in1q, in0q
+    mov               tmpd, lend
+
+.inner_loop4:
+    movss               m0, [in0q]
+    movss               m1, [in1q]
+    movss               m2, [in0q+lenq]
+    movss               m3, [in1q+lenq]
+    movlhps             m0, m1
+    movlhps             m2, m3
+    shufps              m0, m2, q2020
+    movaps          [outq], m0
+    lea               in0q, [in0q+lenq*2]
+    lea               in1q, [in1q+lenq*2]
+    add               outq, mmsize
+    sub               tmpd, mmsize
+    jg .inner_loop4
+    add                inq, 4
+    sub                 id, 1
+    test                id, 2
+    jne .loop8
+    cmp                 id, 4
+    jge .loop16
+    RET
+
 ;***********************************************************
 ;void ps_hybrid_synthesis_deint_sse4(float out[2][38][64],
 ;                                    float (*in)[32][2],
diff --git a/libavcodec/x86/aacpsdsp_init.c b/libavcodec/x86/aacpsdsp_init.c
index 25e089c395..056e23e59e 100644
--- a/libavcodec/x86/aacpsdsp_init.c
+++ b/libavcodec/x86/aacpsdsp_init.c
@@ -44,6 +44,8 @@ void ff_ps_hybrid_synthesis_deint_sse(float out[2][38][64], float (*in)[32][2],
                                       int i, int len);
 void ff_ps_hybrid_synthesis_deint_sse4(float out[2][38][64], float (*in)[32][2],
                                        int i, int len);
+void ff_ps_hybrid_analysis_ileave_sse(float (*out)[32][2], float L[2][38][64],
+                                      int i, int len);
 
 av_cold void ff_psdsp_init_x86(PSDSPContext *s)
 {
@@ -52,6 +54,7 @@ av_cold void ff_psdsp_init_x86(PSDSPContext *s)
     if (EXTERNAL_SSE(cpu_flags)) {
         s->add_squares            = ff_ps_add_squares_sse;
         s->mul_pair_single        = ff_ps_mul_pair_single_sse;
+        s->hybrid_analysis_ileave = ff_ps_hybrid_analysis_ileave_sse;
         s->hybrid_synthesis_deint = ff_ps_hybrid_synthesis_deint_sse;
         s->hybrid_analysis        = ff_ps_hybrid_analysis_sse;
     }
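
For anyone who wants a scalar picture of what the three asm loops compute, here is a minimal C sketch. It is inferred from the prototype in aacpsdsp_init.c and from the address arithmetic in the asm, and it is not claimed to be the project's C implementation. The names hybrid_analysis_ileave_ref and hybrid_analysis_ileave_blocked are hypothetical. The blocked variant only imitates the 1/2/4-column dispatch of .loop4/.loop8/.loop16 and, like the asm, assumes 32 rows.

```c
#include <assert.h>

/* Hypothetical scalar reference (not FFmpeg code):
 * out[i][j] = { L[0][j][i], L[1][j][i] }
 * for every column from the starting i up to 63 and every row j in [0, len). */
void hybrid_analysis_ileave_ref(float (*out)[32][2], float L[2][38][64],
                                int i, int len)
{
    assert(i >= 0 && i <= 64 && len >= 0 && len <= 32);
    for (; i < 64; i++) {
        for (int j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}

/* Hypothetical blocked variant that mirrors the asm control flow:
 * the remaining column count (64 - i) selects the block width.
 * An odd count peels one column (.loop4), a count with bit 1 set
 * peels two columns (.loop8), and the rest is handled four columns
 * at a time (.loop16). The row count is fixed at 32, matching the
 * hard-coded "32 << 3" in the asm. */
void hybrid_analysis_ileave_blocked(float (*out)[32][2], float L[2][38][64],
                                    int i)
{
    assert(i >= 0 && i < 64);
    int left = 64 - i;
    while (left > 0) {
        int w = (left & 1) ? 1 : (left & 2) ? 2 : 4;
        for (int j = 0; j < 32; j++) {
            for (int k = 0; k < w; k++) {
                out[i + k][j][0] = L[0][j][i + k];
                out[i + k][j][1] = L[1][j][i + k];
            }
        }
        i    += w;
        left -= w;
    }
}
```

Both functions should produce identical output when len is 32, so the reference is a convenient baseline when checking the SSE version for a given start column.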