From patchwork Thu Sep 13 13:08:25 2018
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 10318
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 13 Sep 2018 10:08:25 -0300
Message-Id: <20180913130825.11236-2-jamrial@gmail.com>
In-Reply-To: <20180913130825.11236-1-jamrial@gmail.com>
References: <20180913130825.11236-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] avutil/float_dsp: add ff_vector_dmul_{sse2, avx}

~3x to 5x faster.

Signed-off-by: James Almer
---
 libavutil/x86/float_dsp.asm    | 33 +++++++++++++++++++++++++++++++++
 libavutil/x86/float_dsp_init.c |  7 +++++++
 2 files changed, 40 insertions(+)

diff --git a/libavutil/x86/float_dsp.asm b/libavutil/x86/float_dsp.asm
index 06d2d2cfd1..d77d8e9e9c 100644
--- a/libavutil/x86/float_dsp.asm
+++ b/libavutil/x86/float_dsp.asm
@@ -58,6 +58,39 @@ INIT_YMM avx
 VECTOR_FMUL
 %endif
 
+;-----------------------------------------------------------------------------
+; void vector_dmul(double *dst, const double *src0, const double *src1, int len)
+;-----------------------------------------------------------------------------
+%macro VECTOR_DMUL 0
+cglobal vector_dmul, 4,4,4, dst, src0, src1, len
+    lea       lenq, [lend*8 - mmsize*4]
+ALIGN 16
+.loop:
+    movaps    m0,     [src0q + lenq + 0*mmsize]
+    movaps    m1,     [src0q + lenq + 1*mmsize]
+    movaps    m2,     [src0q + lenq + 2*mmsize]
+    movaps    m3,     [src0q + lenq + 3*mmsize]
+    mulpd     m0, m0, [src1q + lenq + 0*mmsize]
+    mulpd     m1, m1, [src1q + lenq + 1*mmsize]
+    mulpd     m2, m2, [src1q + lenq + 2*mmsize]
+    mulpd     m3, m3, [src1q + lenq + 3*mmsize]
+    movaps    [dstq + lenq + 0*mmsize], m0
+    movaps    [dstq + lenq + 1*mmsize], m1
+    movaps    [dstq + lenq + 2*mmsize], m2
+    movaps    [dstq + lenq + 3*mmsize], m3
+
+    sub       lenq, mmsize*4
+    jge       .loop
+    RET
+%endmacro
+
+INIT_XMM sse2
+VECTOR_DMUL
+%if HAVE_AVX_EXTERNAL
+INIT_YMM avx
+VECTOR_DMUL
+%endif
+
 ;------------------------------------------------------------------------------
 ; void ff_vector_fmac_scalar(float *dst, const float *src, float mul, int len)
 ;------------------------------------------------------------------------------
diff --git a/libavutil/x86/float_dsp_init.c b/libavutil/x86/float_dsp_init.c
index 122087a196..8826e4e2c9 100644
--- a/libavutil/x86/float_dsp_init.c
+++ b/libavutil/x86/float_dsp_init.c
@@ -29,6 +29,11 @@ void ff_vector_fmul_sse(float *dst, const float *src0, const float *src1,
 void ff_vector_fmul_avx(float *dst, const float *src0, const float *src1,
                         int len);
 
+void ff_vector_dmul_sse2(double *dst, const double *src0, const double *src1,
+                         int len);
+void ff_vector_dmul_avx(double *dst, const double *src0, const double *src1,
+                        int len);
+
 void ff_vector_fmac_scalar_sse(float *dst, const float *src, float mul,
                                int len);
 void ff_vector_fmac_scalar_avx(float *dst, const float *src, float mul,
@@ -92,11 +97,13 @@ av_cold void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp)
         fdsp->butterflies_float = ff_butterflies_float_sse;
     }
     if (EXTERNAL_SSE2(cpu_flags)) {
+        fdsp->vector_dmul = ff_vector_dmul_sse2;
         fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_sse2;
         fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_sse2;
     }
     if (EXTERNAL_AVX_FAST(cpu_flags)) {
         fdsp->vector_fmul = ff_vector_fmul_avx;
+        fdsp->vector_dmul = ff_vector_dmul_avx;
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_avx;
         fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_avx;
         fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_avx;
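
Note on the VECTOR_DMUL loop: the following C is only a rough model of what the macro appears to compute, written to make the traversal order and the implied constraints easier to see. It is not code from the patch. The names vector_dmul_ref and vector_dmul_model and the vec_bytes parameter (standing in for mmsize: 16 for SSE2, 32 for AVX) are hypothetical. As I read the asm, each iteration handles 4*mmsize bytes starting from the end of the buffers, so len is assumed to be a positive multiple of 8 doubles (SSE2) or 16 doubles (AVX), and since movaps is an aligned move, all three pointers are assumed to be aligned to mmsize.

```c
#include <assert.h>
#include <stddef.h>

/* Plain scalar reference: dst[i] = src0[i] * src1[i]. */
static void vector_dmul_ref(double *dst, const double *src0,
                            const double *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

/*
 * Hypothetical model of the asm loop's traversal order.
 * vec_bytes stands in for mmsize (16 for SSE2, 32 for AVX).
 */
static void vector_dmul_model(double *dst, const double *src0,
                              const double *src1, int len, int vec_bytes)
{
    /* Four vector registers per iteration, 8 bytes per double. */
    const int chunk = 4 * vec_bytes / (int)sizeof(double);

    /* Assumed precondition: the loop has no tail handling. */
    assert(len > 0 && len % chunk == 0);

    /*
     * Mirrors: lea lenq, [lend*8 - mmsize*4]
     *          ... sub lenq, mmsize*4 / jge .loop
     * i.e. start at the last chunk and walk down to offset 0.
     */
    for (int off = len - chunk; off >= 0; off -= chunk)
        for (int i = 0; i < chunk; i++)
            dst[off + i] = src0[off + i] * src1[off + i];
}
```

Both functions should produce identical output for any len that satisfies the assumed precondition; the model only differs in the order in which elements are visited.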