From patchwork Wed Jan 20 20:30:38 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 25043
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 20 Jan 2021 21:30:38 +0100
Message-Id: <20210120203038.18163-1-onemda@gmail.com>
X-Mailer: git-send-email 2.17.1
Subject: [FFmpeg-devel] [PATCH] avutil/x86/float_dsp: add fma3 for scalarproduct
List-Id: FFmpeg development discussions and patches

Signed-off-by: Paul B Mahol
---
 libavutil/x86/float_dsp.asm    | 112 +++++++++++++++++++++++++++++++++
 libavutil/x86/float_dsp_init.c |   2 +
 2 files changed, 114 insertions(+)

diff --git a/libavutil/x86/float_dsp.asm b/libavutil/x86/float_dsp.asm
index 517fd63638..f7497df34e 100644
--- a/libavutil/x86/float_dsp.asm
+++ b/libavutil/x86/float_dsp.asm
@@ -463,6 +463,118 @@ cglobal scalarproduct_float, 3,3,2, v1, v2, offset
 %endif
     RET
 
+INIT_YMM fma3
+cglobal scalarproduct_float, 3,5,8, v1, v2, size, len, offset
+    xor          offsetq, offsetq
+    xorps        m0, m0
+    shl          sized, 2
+    mov          lenq, sizeq
+    cmp          lenq, 32
+    jl           .l16
+    cmp          lenq, 64
+    jl           .l32
+    cmp          lenq, 128
+    jl           .l64
+    and          lenq, ~127
+    xorps        m1, m1
+    xorps        m2, m2
+    xorps        m3, m3
+.loop128:
+    movups       m4, [v1q+offsetq]
+    movups       m5, [v1q+offsetq + 32]
+    movups       m6, [v1q+offsetq + 64]
+    movups       m7, [v1q+offsetq + 96]
+    fmaddps      m0, m4, [v2q+offsetq     ], m0
+    fmaddps      m1, m5, [v2q+offsetq + 32], m1
+    fmaddps      m2, m6, [v2q+offsetq + 64], m2
+    fmaddps      m3, m7, [v2q+offsetq + 96], m3
+    add          offsetq, 128
+    cmp          offsetq, lenq
+    jl           .loop128
+    addps        m0, m1
+    addps        m2, m3
+    addps        m0, m2
+    mov          lenq, sizeq
+    and          lenq, 127
+    cmp          lenq, 64
+    jge          .l64
+    cmp          lenq, 32
+    jge          .l32
+    vextractf128 xmm2, m0, 1
+    addps        xmm0, xmm2
+    cmp          lenq, 16
+    jge          .l16
+    movhlps      xmm1, xmm0
+    addps        xmm0, xmm1
+    movss        xmm1, xmm0
+    shufps       xmm0, xmm0, 1
+    addss        xmm0, xmm1
+    RET
+.l64:
+    and          lenq, ~63
+    add          lenq, offsetq
+    xorps        m1, m1
+.loop64:
+    movups       m4, [v1q+offsetq]
+    movups       m5, [v1q+offsetq + 32]
+    fmaddps      m0, m4, [v2q+offsetq], m0
+    fmaddps      m1, m5, [v2q+offsetq + 32], m1
+    add          offsetq, 64
+    cmp          offsetq, lenq
+    jl           .loop64
+    addps        m0, m1
+    mov          lenq, sizeq
+    and          lenq, 63
+    cmp          lenq, 32
+    jge          .l32
+    vextractf128 xmm2, m0, 1
+    addps        xmm0, xmm2
+    cmp          lenq, 16
+    jge          .l16
+    movhlps      xmm1, xmm0
+    addps        xmm0, xmm1
+    movss        xmm1, xmm0
+    shufps       xmm0, xmm0, 1
+    addss        xmm0, xmm1
+    RET
+.l32:
+    and          lenq, ~31
+    add          lenq, offsetq
+.loop32:
+    movups       m4, [v1q+offsetq]
+    fmaddps      m0, m4, [v2q+offsetq], m0
+    add          offsetq, 32
+    cmp          offsetq, lenq
+    jl           .loop32
+    vextractf128 xmm2, m0, 1
+    addps        xmm0, xmm2
+    mov          lenq, sizeq
+    and          lenq, 31
+    cmp          lenq, 16
+    jge          .l16
+    movhlps      xmm1, xmm0
+    addps        xmm0, xmm1
+    movss        xmm1, xmm0
+    shufps       xmm0, xmm0, 1
+    addss        xmm0, xmm1
+    RET
+.l16:
+    and          lenq, ~15
+    add          lenq, offsetq
+.loop16:
+    movaps       xmm1, [v1q+offsetq]
+    mulps        xmm1, [v2q+offsetq]
+    addps        xmm0, xmm1
+    add          offsetq, 16
+    cmp          offsetq, lenq
+    jl           .loop16
+    movhlps      xmm1, xmm0
+    addps        xmm0, xmm1
+    movss        xmm1, xmm0
+    shufps       xmm0, xmm0, 1
+    addss        xmm0, xmm1
+    RET
+
 ;-----------------------------------------------------------------------------
 ; void ff_butterflies_float(float *src0, float *src1, int len);
 ;-----------------------------------------------------------------------------
diff --git a/libavutil/x86/float_dsp_init.c b/libavutil/x86/float_dsp_init.c
index 8826e4e2c9..67bfbe18d0 100644
--- a/libavutil/x86/float_dsp_init.c
+++ b/libavutil/x86/float_dsp_init.c
@@ -76,6 +76,7 @@ void ff_vector_fmul_reverse_avx2(float *dst, const float *src0,
                                  const float *src1, int len);
 
 float ff_scalarproduct_float_sse(const float *v1, const float *v2, int order);
+float ff_scalarproduct_float_fma3(const float *v1, const float *v2, int order);
 
 void ff_butterflies_float_sse(float *av_restrict src0, float *av_restrict src1,
                               int len);
 
@@ -117,5 +118,6 @@ av_cold void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp)
         fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_fma3;
         fdsp->vector_fmul_add    = ff_vector_fmul_add_fma3;
         fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_fma3;
+        fdsp->scalarproduct_float = ff_scalarproduct_float_fma3;
     }
 }