From patchwork Mon Oct 3 14:10:15 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Grzegorz Bernacki
X-Patchwork-Id: 38534
From: Grzegorz Bernacki
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 3 Oct 2022 16:10:15 +0200
Message-Id: <20221003141020.3564715-3-gjb@semihalf.com>
X-Mailer: git-send-email 2.29.0
In-Reply-To: <20221003141020.3564715-1-gjb@semihalf.com>
References: <20221003141020.3564715-1-gjb@semihalf.com>
Subject: [FFmpeg-devel] [PATCH v2 2/7] aarch64: me_cmp: Improve scheduling in
 ff_pix_abs8_y2_neon
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com,
 hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com

From: Martin Storsjö

Before:           Cortex A53    A72    A73
pix_abs_1_2_neon:       73.7   31.0   25.7
After:
pix_abs_1_2_neon:       61.7   30.2   24.7

Signed-off-by: Martin Storsjö
---
 libavcodec/aarch64/me_cmp_neon.S | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index 43e068bb7f..3662419edf 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -193,21 +193,20 @@ function ff_pix_abs8_y2_neon, export=1
 1:
         ld1             {v2.8b}, [x2], x3
         ld1             {v0.8b}, [x1], x3
-        ld1             {v6.8b}, [x1], x3
         urhadd          v30.8b, v1.8b, v2.8b
         ld1             {v5.8b}, [x2], x3
-        ld1             {v21.8b}, [x1], x3
+        ld1             {v6.8b}, [x1], x3
         uabal           v26.8h, v0.8b, v30.8b
         urhadd          v29.8b, v2.8b, v5.8b
         ld1             {v20.8b}, [x2], x3
-        ld1             {v24.8b}, [x1], x3
+        ld1             {v21.8b}, [x1], x3
         uabal           v26.8h, v6.8b, v29.8b
         urhadd          v28.8b, v5.8b, v20.8b
-        uabal           v26.8h, v21.8b, v28.8b
-        ld1             {v23.8b}, [x2], x3
-        mov             v1.8b, v23.8b
+        ld1             {v1.8b}, [x2], x3
+        ld1             {v24.8b}, [x1], x3
+        urhadd          v27.8b, v20.8b, v1.8b
         sub             w4, w4, #4
-        urhadd          v27.8b, v20.8b, v23.8b
+        uabal           v26.8h, v21.8b, v28.8b
         cmp             w4, #4
         uabal           v26.8h, v24.8b, v27.8b
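
For readers less familiar with the NEON mnemonics, here is a rough scalar sketch of what the loop above appears to compute: the sum of absolute differences between an 8-pixel-wide block and a reference block interpolated vertically at half-pel (each reference row averaged, with rounding, with the row below it). The name pix_abs8_y2_ref, its signature, and the helper avg2 are assumptions made for illustration and are not taken from this patch. Each urhadd corresponds to one row of avg2 calls, and each uabal to one row of accumulated absolute differences.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Rounding average of two bytes: scalar counterpart of NEON urhadd. */
static inline int avg2(int a, int b)
{
    return (a + b + 1) >> 1;
}

/*
 * Hypothetical scalar reference (name and signature are assumptions).
 * pix1:   block being compared (x1 in the assembly)
 * pix2:   reference block (x2), interpolated vertically at half-pel
 * stride: distance in bytes between rows (x3)
 * h:      number of rows (w4)
 */
static int pix_abs8_y2_ref(const uint8_t *pix1, const uint8_t *pix2,
                           ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride; /* reference row below pix2 */
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            /* absolute difference, accumulated (the uabal step) */
            sum += abs(pix1[x] - avg2(pix2[x], pix3[x]));
        }
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }
    return sum;
}
```

The assembly unrolls this by four rows per iteration and carries the last loaded reference row over in v1. The patch loads that row straight into v1 instead of going through v23 and a mov, and interleaves the loads from x1 and x2 so that each uabal sits further from the load it depends on.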