From patchwork Sun Dec 30 17:48:17 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 11589
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 30 Dec 2018 18:48:17 +0100
Message-Id: <20181230174817.19964-1-onemda@gmail.com>
X-Mailer: git-send-email 2.17.1
Subject: [FFmpeg-devel] [PATCH] avfilter/x86/af_afir: add avx version of fcmul_add
List-Id: FFmpeg development discussions and patches

Signed-off-by: Paul B Mahol
---
 libavfilter/x86/af_afir.asm    | 15 ++++++++++++---
 libavfilter/x86/af_afir_init.c |  6 ++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/libavfilter/x86/af_afir.asm b/libavfilter/x86/af_afir.asm
index 849d85e70f..e770420a21 100644
--- a/libavfilter/x86/af_afir.asm
+++ b/libavfilter/x86/af_afir.asm
@@ -27,7 +27,7 @@ SECTION .text
 ; void ff_fcmul_add(float *sum, const float *t, const float *c, int len)
 ;------------------------------------------------------------------------------
 
-INIT_XMM sse3
+%macro VECTOR_FCMUL_ADD 0
 cglobal fcmul_add, 4,4,6, sum, t, c, len
     shl       lend, 3
     add       lend, mmsize*2
@@ -43,8 +43,8 @@ ALIGN 16
     movaps    m4, [cq + lenq+mmsize]
     mulps     m0, m1
     mulps     m3, m4
-    shufps    m1, m1, 0xb1
-    shufps    m4, m4, 0xb1
+    shufps    m1, m1, m1, 0xb1
+    shufps    m4, m4, m4, 0xb1
     movshdup  m2, [tq + lenq]
     movshdup  m5, [tq + lenq+mmsize]
     mulps     m2, m1
@@ -58,3 +58,12 @@ ALIGN 16
     add    lenq, mmsize*2
     jl .loop
     REP_RET
+%endmacro
+
+INIT_XMM sse3
+VECTOR_FCMUL_ADD
+
+%if HAVE_AVX_EXTERNAL
+INIT_YMM avx
+VECTOR_FCMUL_ADD
+%endif
diff --git a/libavfilter/x86/af_afir_init.c b/libavfilter/x86/af_afir_init.c
index 6a652b9b83..214aaf9719 100644
--- a/libavfilter/x86/af_afir_init.c
+++ b/libavfilter/x86/af_afir_init.c
@@ -25,6 +25,9 @@
 void ff_fcmul_add_sse3(float *sum, const float *t, const float *c,
                        ptrdiff_t len);
 
+void ff_fcmul_add_avx(float *sum, const float *t, const float *c,
+                       ptrdiff_t len);
+
 av_cold void ff_afir_init_x86(AudioFIRContext *s)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -32,4 +35,7 @@ av_cold void ff_afir_init_x86(AudioFIRContext *s)
     if (EXTERNAL_SSE3(cpu_flags)) {
         s->fcmul_add = ff_fcmul_add_sse3;
     }
+    if (EXTERNAL_AVX_FAST(cpu_flags)) {
+        s->fcmul_add = ff_fcmul_add_avx;
+    }
 }
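
For readers following the asm: the visible instructions (mulps on t and c, shufps with 0xb1 to swap the re/im halves of c, movshdup to duplicate the imaginary parts of t) look like the usual SIMD pattern for a complex multiply-accumulate over interleaved (re, im) floats. A minimal scalar sketch of that operation follows. It assumes the lines not shown in the hunks combine the two products with an add/subtract step, and that len counts complex pairs. The name fcmul_add_ref is hypothetical and not part of this patch, and any tail or padding handling done by the real routine is left out.

```c
#include <stddef.h>

/* Hypothetical scalar reference, NOT part of the patch.
 * Assumption: sum, t and c hold interleaved (re, im) float pairs
 * and len is the number of complex pairs to process. */
void fcmul_add_ref(float *sum, const float *t, const float *c, ptrdiff_t len)
{
    for (ptrdiff_t n = 0; n < len; n++) {
        const float tre = t[2 * n];
        const float tim = t[2 * n + 1];
        const float cre = c[2 * n];
        const float cim = c[2 * n + 1];

        /* (tre + i*tim) * (cre + i*cim), accumulated into sum */
        sum[2 * n]     += tre * cre - tim * cim;
        sum[2 * n + 1] += tre * cim + tim * cre;
    }
}
```

A reference like this can be useful for comparing the SSE3 and AVX outputs on the same input, since both are generated from the same VECTOR_FCMUL_ADD macro body and differ only in vector width.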