From patchwork Tue Sep 20 17:42:13 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 38107
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a05:6a20:3b1c:b0:96:9ee8:5cfd with SMTP id c28csp2101602pzh; Tue, 20 Sep 2022 10:42:26 -0700 (PDT)
X-Google-Smtp-Source: AMsMyM5HPH+bouEAXbEIIGNd3Y33WCXtNn+wP+kgPENGB1EmLtWTu4QT82dYgZy3ysEkOcaqSFIQ
X-Received: by 2002:a05:6402:4305:b0:451:7b78:f2e0 with SMTP id m5-20020a056402430500b004517b78f2e0mr21896654edc.342.1663695746480; Tue, 20 Sep 2022 10:42:26 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1663695746; cv=none; d=google.com; s=arc-20160816; b=rxzifWjkVXSg7XOcK0mfrtp4bNOtOELZjLN7dH5noC5CFIJFQ0SqEhKGU5dMc6SXnv /TjRgt+RIAD+1GJFsB5YwBi3iRl5hl+eoE+3DvNLiBoQR7pODqKvWWxnfKWXx5qBm1oO qu1c5flAvlLcd+ej0JGogRLzIhVqxFcUzsF9SqNyLfe6xfnsU4oM2A1wRsDNj3olVgzO okcIzOz0ant/AGkmfaGjj4SGE/02GzLjVTdsZV9zF+NivMYYJnGnzBWlWui5gLl/29Ge zHlsxnNl2hUzc/wpTWjSZx8fT8R5EiCWw7PLF+FdkZQeTnKkTA9VszOMaXwoypbbvH4O TUaA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:message-id:date:to:from :delivered-to; bh=lR8mzv5a3cWn5IO2pIbe5uigUgce+SsTxx+ebOvnpTQ=; b=p13rh3xKgeYYo/16OrzvW5a9bEeiYSa390zs41r3BBLqWaw6QmY7Y5XYUdlFVUZOb9 UVgbVON6PZvUiYZO/j0Hu/mVUD5qZ/CvLt+Flw+GqCY8r3Js3T3JZgN0U1p3Vuc+bihZ Z8QAVEfi7vSb/lRLwtyZnbBzlmXfOJ8osdy1HHIl2/p7Wie8BjTDb/ZCD1xGrJo018rA cQV5KKnYrXmFP4MWKkZdWVVdru/gQ5cZaADKIayyi8Q8UiB9mADQz/1HbAYn2s2/S4gC xG+/RzI+saGNDamE6xh666F2QJGxZ76PTTn5B2jMXZvACucXOMTzi2EQxogqGdL86vHq 3iiA==
ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id hr23-20020a1709073f9700b00781e984151esi420569ejc.232.2022.09.20.10.42.25; Tue, 20 Sep 2022 10:42:26 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C6A7868BA7A; Tue, 20 Sep 2022 20:42:21 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from ursule.remlab.net (vps-a2bccee9.vps.ovh.net [51.75.19.47]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 95EC968B5B0 for ; Tue, 20 Sep 2022 20:42:14 +0300 (EEST)
Received: from basile.remlab.net (localhost [IPv6:::1]) by ursule.remlab.net (Postfix) with ESMTP id C473AC00AA for ; Tue, 20 Sep 2022 20:42:13 +0300 (EEST)
From: remi@remlab.net
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 20 Sep 2022 20:42:13 +0300
Message-Id: <20220920174213.35055-1-remi@remlab.net>
X-Mailer: git-send-email 2.37.2
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH] lavc/aacpsdsp: precompute constant factors
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: gswH5+u7QCsR

From: Rémi Denis-Courmont

The input complex factors are the same for every iteration of the outer
loop, so they can be computed once up front. This substitutes another
4 loads for the 4 loads, 2 additions and 2 subtractions previously done
per iteration of the inner loop.
Thus effectively 4 arithmetic operations per iteration of the inner loop
are avoided, i.e. 24 operations per iteration of the outer loop. Since the
precomputation itself costs 24 operations once, the net saving is
24 * (n - 1) operations in total.

If the inner loop is not unrolled by the compiler, this might also save
some pointer arithmetic, as most instruction sets do not have addressing
modes with negated register offsets (12 - j). The loop is unlikely to be
left rolled, though, unless the compiler is optimising for code size.
---
 libavcodec/aacpsdsp_template.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/libavcodec/aacpsdsp_template.c b/libavcodec/aacpsdsp_template.c
index 31ff718420..c063788b89 100644
--- a/libavcodec/aacpsdsp_template.c
+++ b/libavcodec/aacpsdsp_template.c
@@ -47,21 +47,24 @@ static void ps_hybrid_analysis_c(INTFLOAT (*out)[2], INTFLOAT (*in)[2],
                                  const INTFLOAT (*filter)[8][2],
                                  ptrdiff_t stride, int n)
 {
-    int i, j;
+    INT64FLOAT inre0[6], inre1[6], inim0[6], inim1[6];
 
-    for (i = 0; i < n; i++) {
+    for (int j = 0; j < 6; j++) {
+        inre0[j] = in[j][0] + in[12 - j][0];
+        inre1[j] = in[j][1] - in[12 - j][1];
+        inim0[j] = in[j][1] + in[12 - j][1];
+        inim1[j] = in[j][0] - in[12 - j][0];
+    }
+
+    for (int i = 0; i < n; i++) {
         INT64FLOAT sum_re = (INT64FLOAT)filter[i][6][0] * in[6][0];
         INT64FLOAT sum_im = (INT64FLOAT)filter[i][6][0] * in[6][1];
 
-        for (j = 0; j < 6; j++) {
-            INT64FLOAT in0_re = in[j][0];
-            INT64FLOAT in0_im = in[j][1];
-            INT64FLOAT in1_re = in[12-j][0];
-            INT64FLOAT in1_im = in[12-j][1];
-            sum_re += (INT64FLOAT)filter[i][j][0] * (in0_re + in1_re) -
-                      (INT64FLOAT)filter[i][j][1] * (in0_im - in1_im);
-            sum_im += (INT64FLOAT)filter[i][j][0] * (in0_im + in1_im) +
-                      (INT64FLOAT)filter[i][j][1] * (in0_re - in1_re);
+        for (int j = 0; j < 6; j++) {
+            sum_re += (INT64FLOAT)filter[i][j][0] * inre0[j] -
+                      (INT64FLOAT)filter[i][j][1] * inre1[j];
+            sum_im += (INT64FLOAT)filter[i][j][0] * inim0[j] +
+                      (INT64FLOAT)filter[i][j][1] * inim1[j];
         }
 #if USE_FIXED
         out[i * stride][0] = (int)((sum_re + 0x40000000) >> 31);