From patchwork Tue Sep 20 19:32:44 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 38115
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 20 Sep 2022 16:32:44 -0300
Message-Id: <20220920193245.3390-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.37.3
Subject: [FFmpeg-devel] [PATCH 1/2] x86/aacpsdsp: precompute constant factors
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Inspired by the optimization done to the C version by Rémi Denis-Courmont.

Signed-off-by: James Almer
---
 libavcodec/x86/aacpsdsp.asm | 42 +++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/libavcodec/x86/aacpsdsp.asm b/libavcodec/x86/aacpsdsp.asm
index 4acd087c85..543d33e68d 100644
--- a/libavcodec/x86/aacpsdsp.asm
+++ b/libavcodec/x86/aacpsdsp.asm
@@ -400,29 +400,32 @@ HYBRID_SYNTHESIS_DEINT
 ;                                 const float (*filter)[8][2],
 ;                                 ptrdiff_t stride, int n);
 ;*******************************************************************
-%macro PS_HYBRID_ANALYSIS_LOOP 3
-    movu          %1, [inq+mmsize*%3]
-    movu          m1, [inq+mmsize*(5-%3)+8]
-%if cpuflag(sse3)
-    pshufd        %2, %1, q2301
-    pshufd        m4, m1, q0123
-    pshufd        m1, m1, q1032
-    pshufd        m2, [filterq+nq+mmsize*%3], q2301
-    addsubps      %2, m4
-    addsubps      %1, m1
-%else
-    mova          m2, [filterq+nq+mmsize*%3]
-    mova          %2, %1
+%macro PS_HYBRID_ANALYSIS_IN 1
+    movu          m0, [inq+mmsize*%1]
+    movu          m1, [inq+mmsize*(5-%1)+8]
+    mova          m3, m0
     mova          m4, m1
-    shufps        %2, %2, q2301
+    shufps        m3, m3, q2301
     shufps        m4, m4, q0123
     shufps        m1, m1, q1032
-    shufps        m2, m2, q2301
+%if cpuflag(sse3)
+    addsubps      m3, m4
+    addsubps      m0, m1
+%else
     xorps         m4, m7
     xorps         m1, m7
-    subps         %2, m4
-    subps         %1, m1
+    subps         m3, m4
+    subps         m0, m1
 %endif
+    mova  [rsp+mmsize*%1*2], m3
+    mova  [rsp+mmsize+mmsize*%1*2], m0
+%endmacro
+
+%macro PS_HYBRID_ANALYSIS_LOOP 3
+    mova          m2, [filterq+nq+mmsize*%3]
+    shufps        m2, m2, q2301
+    mova          %2, [rsp+mmsize*%3*2]
+    mova          %1, [rsp+mmsize+mmsize*%3*2]
     mulps         %2, m2
     mulps         %1, m2
 %if %3
@@ -432,7 +435,7 @@ HYBRID_SYNTHESIS_DEINT
 %endmacro
 
 %macro PS_HYBRID_ANALYSIS 0
-cglobal ps_hybrid_analysis, 5, 5, 8, out, in, filter, stride, n
+cglobal ps_hybrid_analysis, 5, 5, 8, 24 * 4, out, in, filter, stride, n
 %if cpuflag(sse3)
 %define MOVH movsd
 %else
@@ -443,6 +446,9 @@ cglobal ps_hybrid_analysis, 5, 5, 8, out, in, filter, stride, n
     add      filterq, nq
     neg           nq
     mova          m7, [ps_p1m1p1m1]
+    PS_HYBRID_ANALYSIS_IN 0
+    PS_HYBRID_ANALYSIS_IN 1
+    PS_HYBRID_ANALYSIS_IN 2
 
 align 16
 .loop:

From patchwork Tue Sep 20 19:32:45 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 38116
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 20 Sep 2022 16:32:45 -0300
Message-Id: <20220920193245.3390-2-jamrial@gmail.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20220920193245.3390-1-jamrial@gmail.com>
References: <20220920193245.3390-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] x86/aacpsdsp: add ps_hybrid_analysis_fma3
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

This replaces the sse3 version, which was not faster than the sse one.

Signed-off-by: James Almer
---
 libavcodec/x86/aacpsdsp.asm    | 38 +++++++++++++++-------------------
 libavcodec/x86/aacpsdsp_init.c |  6 ++++--
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/libavcodec/x86/aacpsdsp.asm b/libavcodec/x86/aacpsdsp.asm
index 543d33e68d..98f3e00f9a 100644
--- a/libavcodec/x86/aacpsdsp.asm
+++ b/libavcodec/x86/aacpsdsp.asm
@@ -403,10 +403,8 @@ HYBRID_SYNTHESIS_DEINT
 %macro PS_HYBRID_ANALYSIS_IN 1
     movu          m0, [inq+mmsize*%1]
     movu          m1, [inq+mmsize*(5-%1)+8]
-    mova          m3, m0
-    mova          m4, m1
-    shufps        m3, m3, q2301
-    shufps        m4, m4, q0123
+    shufps        m3, m0, m0, q2301
+    shufps        m4, m1, m1, q0123
     shufps        m1, m1, q1032
 %if cpuflag(sse3)
     addsubps      m3, m4
@@ -424,6 +422,15 @@ HYBRID_SYNTHESIS_DEINT
 %macro PS_HYBRID_ANALYSIS_LOOP 3
     mova          m2, [filterq+nq+mmsize*%3]
     shufps        m2, m2, q2301
+%if cpuflag(fma3)
+%if %3
+    fmaddps       m3, m2, [rsp+mmsize*%3*2], m3
+    fmaddps       m0, m2, [rsp+mmsize+mmsize*%3*2], m0
+%else
+    mulps         m3, m2, [rsp]
+    mulps         m0, m2, [rsp+mmsize]
+%endif
+%else ; cpuflag(sse)
     mova          %2, [rsp+mmsize*%3*2]
     mova          %1, [rsp+mmsize+mmsize*%3*2]
     mulps         %2, m2
@@ -432,20 +439,21 @@ HYBRID_SYNTHESIS_DEINT
     addps         m3, %2
     addps         m0, %1
 %endif
+%endif
 %endmacro
 
 %macro PS_HYBRID_ANALYSIS 0
-cglobal ps_hybrid_analysis, 5, 5, 8, 24 * 4, out, in, filter, stride, n
+cglobal ps_hybrid_analysis, 5, 5, 5 + notcpuflag(fma3) * 3, 24 * 4, out, in, filter, stride, n
 %if cpuflag(sse3)
 %define MOVH movsd
 %else
 %define MOVH movlps
+    mova          m7, [ps_p1m1p1m1]
 %endif
     shl      strideq, 3
     shl           nd, 6
     add      filterq, nq
     neg           nq
-    mova          m7, [ps_p1m1p1m1]
     PS_HYBRID_ANALYSIS_IN 0
     PS_HYBRID_ANALYSIS_IN 1
     PS_HYBRID_ANALYSIS_IN 2
@@ -456,26 +464,14 @@ align 16
     PS_HYBRID_ANALYSIS_LOOP m5, m6, 1
     PS_HYBRID_ANALYSIS_LOOP m5, m6, 2
 
-%if cpuflag(sse3)
-    pshufd        m3, m3, q2301
-    xorps         m0, m7
-    hsubps        m3, m0
-    pshufd        m1, m3, q0020
-    pshufd        m3, m3, q0031
-    addps         m1, m3
-    movsd         m2, [inq+6*8]
-%else
-    mova          m1, m3
-    mova          m2, m0
-    shufps        m1, m1, q2301
-    shufps        m2, m2, q2301
+    shufps        m1, m3, m3, q2301
+    shufps        m2, m0, m0, q2301
     subps         m1, m3
     addps         m2, m0
     unpcklps      m3, m1, m2
     unpckhps      m1, m2
     addps         m1, m3
     movu          m2, [inq+6*8] ; faster than movlps and no risk of overread
-%endif
     movss         m3, [filterq+nq+8*6]
     SPLATD        m3
     mulps         m2, m3
@@ -489,5 +485,5 @@ align 16
 
 INIT_XMM sse
 PS_HYBRID_ANALYSIS
-INIT_XMM sse3
+INIT_XMM fma3
 PS_HYBRID_ANALYSIS
diff --git a/libavcodec/x86/aacpsdsp_init.c b/libavcodec/x86/aacpsdsp_init.c
index 21f00efa24..0b0ee07db4 100644
--- a/libavcodec/x86/aacpsdsp_init.c
+++ b/libavcodec/x86/aacpsdsp_init.c
@@ -33,7 +33,7 @@ void ff_ps_mul_pair_single_sse (float (*dst)[2], float (*src0)[2],
 void ff_ps_hybrid_analysis_sse (float (*out)[2], float (*in)[2],
                                 const float (*filter)[8][2],
                                 ptrdiff_t stride, int n);
-void ff_ps_hybrid_analysis_sse3(float (*out)[2], float (*in)[2],
+void ff_ps_hybrid_analysis_fma3(float (*out)[2], float (*in)[2],
                                 const float (*filter)[8][2],
                                 ptrdiff_t stride, int n);
 void ff_ps_stereo_interpolate_sse3(float (*l)[2], float (*r)[2],
@@ -64,9 +64,11 @@ av_cold void ff_psdsp_init_x86(PSDSPContext *s)
         s->add_squares            = ff_ps_add_squares_sse3;
         s->stereo_interpolate[0]  = ff_ps_stereo_interpolate_sse3;
         s->stereo_interpolate[1]  = ff_ps_stereo_interpolate_ipdopd_sse3;
-        s->hybrid_analysis        = ff_ps_hybrid_analysis_sse3;
     }
     if (EXTERNAL_SSE4(cpu_flags)) {
         s->hybrid_synthesis_deint = ff_ps_hybrid_synthesis_deint_sse4;
     }
+    if (EXTERNAL_FMA3(cpu_flags)) {
+        s->hybrid_analysis        = ff_ps_hybrid_analysis_fma3;
+    }
 }
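
Note for reviewers: the idea behind 1/2 may be easier to see in scalar C. The sums and differences of the mirrored input pairs do not depend on the filter row, so they can be computed once before the loop over n, which is what PS_HYBRID_ANALYSIS_IN does when it stores its results at [rsp]. The sketch below is only an illustrative model, not the code in libavcodec: it assumes a symmetric 13-tap complex filter with a real-only centre tap (in[j] paired with in[12-j], plus in[6]), and the names hybrid_naive, hybrid_hoisted and hybrid_self_check are hypothetical.

```c
#include <stddef.h>

/* Hypothetical scalar model: pair sums/differences are recomputed
 * for every filter row i. */
static void hybrid_naive(float (*out)[2], float (*in)[2],
                         const float (*filter)[8][2],
                         ptrdiff_t stride, int n)
{
    for (int i = 0; i < n; i++) {
        float re = filter[i][6][0] * in[6][0];
        float im = filter[i][6][0] * in[6][1];
        for (int j = 0; j < 6; j++) {
            re += filter[i][j][0] * (in[j][0] + in[12 - j][0])
                - filter[i][j][1] * (in[j][1] - in[12 - j][1]);
            im += filter[i][j][0] * (in[j][1] + in[12 - j][1])
                + filter[i][j][1] * (in[j][0] - in[12 - j][0]);
        }
        out[i * stride][0] = re;
        out[i * stride][1] = im;
    }
}

/* Same result, but the input-only terms are hoisted out of the row loop. */
static void hybrid_hoisted(float (*out)[2], float (*in)[2],
                           const float (*filter)[8][2],
                           ptrdiff_t stride, int n)
{
    float sum_re[6], sum_im[6], dif_re[6], dif_im[6];

    /* Computed once: these depend on in[] only, not on the row i. */
    for (int j = 0; j < 6; j++) {
        sum_re[j] = in[j][0] + in[12 - j][0];
        sum_im[j] = in[j][1] + in[12 - j][1];
        dif_re[j] = in[j][0] - in[12 - j][0];
        dif_im[j] = in[j][1] - in[12 - j][1];
    }
    for (int i = 0; i < n; i++) {
        float re = filter[i][6][0] * in[6][0];
        float im = filter[i][6][0] * in[6][1];
        for (int j = 0; j < 6; j++) {
            re += filter[i][j][0] * sum_re[j] - filter[i][j][1] * dif_im[j];
            im += filter[i][j][0] * sum_im[j] + filter[i][j][1] * dif_re[j];
        }
        out[i * stride][0] = re;
        out[i * stride][1] = im;
    }
}

/* Returns 1 if both variants agree on a small deterministic input. */
int hybrid_self_check(void)
{
    float in[13][2], filter[4][8][2];
    float a[8][2] = {{0}}, b[8][2] = {{0}};

    for (int j = 0; j < 13; j++) {
        in[j][0] = 0.25f * j - 1.0f;
        in[j][1] = 0.5f - 0.125f * j;
    }
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 8; j++) {
            filter[i][j][0] = 0.0625f  * (i + j);
            filter[i][j][1] = 0.03125f * (i - j);
        }
    }
    /* stride 2, n 4: writes out[0], out[2], out[4], out[6] */
    hybrid_naive(a, in, (const float (*)[8][2])filter, 2, 4);
    hybrid_hoisted(b, in, (const float (*)[8][2])filter, 2, 4);

    for (int i = 0; i < 8; i++) {
        float d0 = a[i][0] - b[i][0];
        float d1 = a[i][1] - b[i][1];
        if ((d0 < 0 ? -d0 : d0) > 1e-4f || (d1 < 0 ? -d1 : d1) > 1e-4f)
            return 0;
    }
    return 1;
}
```

The asm keeps the six precomputed vectors (three PS_HYBRID_ANALYSIS_IN calls, two stores each) in the 24 * 4 byte stack area requested through cglobal, where this sketch uses four small arrays. In 2/2 the multiply and accumulate inside the row loop are fused with fmaddps in the fma3 version.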