From patchwork Thu Jul 25 20:25:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 50737
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 25 Jul 2024 23:25:16 +0300
Message-ID: <20240725202522.276182-2-remi@remlab.net>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240725202522.276182-1-remi@remlab.net>
References: <20240725202522.276182-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/6] lavc/audiodsp: properly unroll vector_clipf
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: 6br165L/Ifg0

Given that source and destination can alias, the compiler was forced to
perform each read-modify-write sequentially. We cannot use the
`restrict` qualifier to avoid this here because the AC-3 encoder uses
the function in-place. Instead this commit provides an explicit
guarantee to the compiler that batches of 8 elements will not overlap,
so that it can interleave calculations.

In practice contemporary optimising compilers are able to unroll and
keep the temporary array in FPU registers (without spilling).

On SiFive-U74, this speeds the same signs branch by 4x, and the opposite
signs branch 1.5x.
---
 libavcodec/audiodsp.c | 40 +++++++++++++++++-----------------------
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/libavcodec/audiodsp.c b/libavcodec/audiodsp.c
index c5427d3535..9e83f06aaa 100644
--- a/libavcodec/audiodsp.c
+++ b/libavcodec/audiodsp.c
@@ -38,41 +38,35 @@ static inline float clipf_c_one(float a, uint32_t mini,
 static void vector_clipf_c_opposite_sign(float *dst, const float *src,
                                          float min, float max, int len)
 {
-    int i;
     uint32_t mini = av_float2int(min);
     uint32_t maxi = av_float2int(max);
     uint32_t maxisign = maxi ^ (1U << 31);
 
-    for (i = 0; i < len; i += 8) {
-        dst[i + 0] = clipf_c_one(src[i + 0], mini, maxi, maxisign);
-        dst[i + 1] = clipf_c_one(src[i + 1], mini, maxi, maxisign);
-        dst[i + 2] = clipf_c_one(src[i + 2], mini, maxi, maxisign);
-        dst[i + 3] = clipf_c_one(src[i + 3], mini, maxi, maxisign);
-        dst[i + 4] = clipf_c_one(src[i + 4], mini, maxi, maxisign);
-        dst[i + 5] = clipf_c_one(src[i + 5], mini, maxi, maxisign);
-        dst[i + 6] = clipf_c_one(src[i + 6], mini, maxi, maxisign);
-        dst[i + 7] = clipf_c_one(src[i + 7], mini, maxi, maxisign);
+    for (int i = 0; i < len; i += 8) {
+        float tmp[8];
+
+        for (int j = 0; j < 8; j++)
+            tmp[j]= clipf_c_one(src[i + j], mini, maxi, maxisign);
+        for (int j = 0; j < 8; j++)
+            dst[i + j] = tmp[j];
     }
 }
 
 static void vector_clipf_c(float *dst, const float *src, int len,
                            float min, float max)
 {
-    int i;
-
     if (min < 0 && max > 0) {
         vector_clipf_c_opposite_sign(dst, src, min, max, len);
-    } else {
-        for (i = 0; i < len; i += 8) {
-            dst[i] = av_clipf(src[i], min, max);
-            dst[i + 1] = av_clipf(src[i + 1], min, max);
-            dst[i + 2] = av_clipf(src[i + 2], min, max);
-            dst[i + 3] = av_clipf(src[i + 3], min, max);
-            dst[i + 4] = av_clipf(src[i + 4], min, max);
-            dst[i + 5] = av_clipf(src[i + 5], min, max);
-            dst[i + 6] = av_clipf(src[i + 6], min, max);
-            dst[i + 7] = av_clipf(src[i + 7], min, max);
-        }
+        return;
+    }
+
+    for (int i = 0; i < len; i += 8) {
+        float tmp[8];
+
+        for (int j = 0; j < 8; j++)
+            tmp[j]= av_clipf(src[i + j], min, max);
+        for (int j = 0; j < 8; j++)
+            dst[i + j] = tmp[j];
     }
 }
 
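
For readers who want to try the batching idea outside the FFmpeg tree, here is a minimal standalone sketch. It is not the patched code itself: `clip_one()` is a hypothetical stand-in for `av_clipf()`, `clip_batched()` is an illustrative name, and the `assert()` on the length is an addition that only documents the multiple-of-8 assumption the loop relies on. All 8 loads of a batch complete before any of its stores, so the result is the same whether or not `dst` and `src` point to the same buffer.

```c
#include <assert.h>

/* Hypothetical stand-in for av_clipf(): clamp a to the range [lo, hi]. */
static float clip_one(float a, float lo, float hi)
{
    return a < lo ? lo : (a > hi ? hi : a);
}

/* Clip len floats from src into dst in batches of 8.
 * Each batch is first computed into a local array and only then stored,
 * so no store to dst can affect a later load from src within the batch.
 * Assumption: len is a multiple of 8 (checked here, not in the patch). */
static void clip_batched(float *dst, const float *src, int len,
                         float lo, float hi)
{
    assert(len % 8 == 0);

    for (int i = 0; i < len; i += 8) {
        float tmp[8];

        for (int j = 0; j < 8; j++)
            tmp[j] = clip_one(src[i + j], lo, hi);
        for (int j = 0; j < 8; j++)
            dst[i + j] = tmp[j];
    }
}
```

Calling `clip_batched(buf, buf, 8, -1.0f, 1.0f)` on an 8-element buffer exercises the in-place case that rules out `restrict`. Whether the compiler actually keeps `tmp` in registers depends on the target and optimisation level, so inspecting the generated assembly is the only way to confirm it for a given build.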