From patchwork Tue Sep 20 19:32:44 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 38115
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 20 Sep 2022 16:32:44 -0300
Message-Id: <20220920193245.3390-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.37.3
Subject: [FFmpeg-devel] [PATCH 1/2] x86/aacpsdsp: precompute constant factors

Inspired by the optimization done to the C version by Rémi Denis-Courmont.

Signed-off-by: James Almer
---
 libavcodec/x86/aacpsdsp.asm | 42 +++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/libavcodec/x86/aacpsdsp.asm b/libavcodec/x86/aacpsdsp.asm
index 4acd087c85..543d33e68d 100644
--- a/libavcodec/x86/aacpsdsp.asm
+++ b/libavcodec/x86/aacpsdsp.asm
@@ -400,29 +400,32 @@ HYBRID_SYNTHESIS_DEINT
 ;                                 const float (*filter)[8][2],
 ;                                 ptrdiff_t stride, int n);
 ;*******************************************************************
-%macro PS_HYBRID_ANALYSIS_LOOP 3
-    movu     %1, [inq+mmsize*%3]
-    movu     m1, [inq+mmsize*(5-%3)+8]
-%if cpuflag(sse3)
-    pshufd   %2, %1, q2301
-    pshufd   m4, m1, q0123
-    pshufd   m1, m1, q1032
-    pshufd   m2, [filterq+nq+mmsize*%3], q2301
-    addsubps %2, m4
-    addsubps %1, m1
-%else
-    mova     m2, [filterq+nq+mmsize*%3]
-    mova     %2, %1
+%macro PS_HYBRID_ANALYSIS_IN 1
+    movu     m0, [inq+mmsize*%1]
+    movu     m1, [inq+mmsize*(5-%1)+8]
+    mova     m3, m0
     mova     m4, m1
-    shufps   %2, %2, q2301
+    shufps   m3, m3, q2301
     shufps   m4, m4, q0123
     shufps   m1, m1, q1032
-    shufps   m2, m2, q2301
+%if cpuflag(sse3)
+    addsubps m3, m4
+    addsubps m0, m1
+%else
     xorps    m4, m7
     xorps    m1, m7
-    subps    %2, m4
-    subps    %1, m1
+    subps    m3, m4
+    subps    m0, m1
 %endif
+    mova     [rsp+mmsize*%1*2], m3
+    mova     [rsp+mmsize+mmsize*%1*2], m0
+%endmacro
+
+%macro PS_HYBRID_ANALYSIS_LOOP 3
+    mova     m2, [filterq+nq+mmsize*%3]
+    shufps   m2, m2, q2301
+    mova     %2, [rsp+mmsize*%3*2]
+    mova     %1, [rsp+mmsize+mmsize*%3*2]
     mulps    %2, m2
     mulps    %1, m2
 %if %3
@@ -432,7 +435,7 @@ HYBRID_SYNTHESIS_DEINT
 %endmacro
 
 %macro PS_HYBRID_ANALYSIS 0
-cglobal ps_hybrid_analysis, 5, 5, 8, out, in, filter, stride, n
+cglobal ps_hybrid_analysis, 5, 5, 8, 24 * 4, out, in, filter, stride, n
 %if cpuflag(sse3)
 %define MOVH movsd
 %else
@@ -443,6 +446,9 @@ cglobal ps_hybrid_analysis, 5, 5, 8, out, in, filter, stride, n
     add      filterq, nq
     neg      nq
     mova     m7, [ps_p1m1p1m1]
+    PS_HYBRID_ANALYSIS_IN 0
+    PS_HYBRID_ANALYSIS_IN 1
+    PS_HYBRID_ANALYSIS_IN 2
 
 align 16
 .loop:
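
For readers who do not follow the SIMD shuffles, the idea behind the patch can be sketched in scalar C. This is only an illustration and not the FFmpeg source: the function names hybrid_analysis_naive and hybrid_analysis_hoisted are hypothetical, and the sketch assumes a 13-tap complex input combined symmetrically around tap 6. The input sums and differences do not depend on the output index, so they can be computed once before the loop instead of on every iteration. The 24 floats kept in the hoisted version are the same size as the 24 * 4 bytes of stack the patch reserves, although the asm stores them in a different, shuffled layout.

```c
#include <stddef.h>

/* Reference form: the input sums and differences are recomputed for every
 * output sample i, although they do not depend on i. */
void hybrid_analysis_naive(float (*out)[2], float (*in)[2],
                           const float (*filter)[8][2],
                           ptrdiff_t stride, int n)
{
    for (int i = 0; i < n; i++) {
        /* Centre tap (assumed to carry a real coefficient only). */
        float sum_re = filter[i][6][0] * in[6][0];
        float sum_im = filter[i][6][0] * in[6][1];
        for (int j = 0; j < 6; j++) {
            sum_re += filter[i][j][0] * (in[j][0] + in[12 - j][0])
                    - filter[i][j][1] * (in[j][1] - in[12 - j][1]);
            sum_im += filter[i][j][0] * (in[j][1] + in[12 - j][1])
                    + filter[i][j][1] * (in[j][0] - in[12 - j][0]);
        }
        out[i * stride][0] = sum_re;
        out[i * stride][1] = sum_im;
    }
}

/* Hoisted form: the loop-invariant input terms are computed once into
 * 6 * 4 = 24 floats of scratch space, then reused for every i. */
void hybrid_analysis_hoisted(float (*out)[2], float (*in)[2],
                             const float (*filter)[8][2],
                             ptrdiff_t stride, int n)
{
    float add_re[6], add_im[6], sub_re[6], sub_im[6];

    for (int j = 0; j < 6; j++) {
        add_re[j] = in[j][0] + in[12 - j][0];
        add_im[j] = in[j][1] + in[12 - j][1];
        sub_re[j] = in[j][0] - in[12 - j][0];
        sub_im[j] = in[j][1] - in[12 - j][1];
    }

    for (int i = 0; i < n; i++) {
        float sum_re = filter[i][6][0] * in[6][0];
        float sum_im = filter[i][6][0] * in[6][1];
        for (int j = 0; j < 6; j++) {
            sum_re += filter[i][j][0] * add_re[j] - filter[i][j][1] * sub_im[j];
            sum_im += filter[i][j][0] * add_im[j] + filter[i][j][1] * sub_re[j];
        }
        out[i * stride][0] = sum_re;
        out[i * stride][1] = sum_im;
    }
}
```

Both functions take the same arguments and should produce identical output for the same input; the hoisted one trades a small amount of scratch storage for fewer additions per output sample, which is the same trade the patch makes with its extra stack space.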