From patchwork Mon Oct 3 14:10:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Grzegorz Bernacki X-Patchwork-Id: 38533 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:3b1c:b0:96:9ee8:5cfd with SMTP id c28csp1345477pzh; Mon, 3 Oct 2022 07:11:02 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7e0G58G7hen9yxZTosF4F5BWW05/WpaN9l+ojPOdVq7U/swPoNnBrC2xQee5npL6WL/M6W X-Received: by 2002:a05:6402:5210:b0:451:d4ff:ab02 with SMTP id s16-20020a056402521000b00451d4ffab02mr18811966edd.345.1664806262094; Mon, 03 Oct 2022 07:11:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664806262; cv=none; d=google.com; s=arc-20160816; b=nRg8JGSRi0E8C8sG+ONuzBccDnjVeaspghtCuxYgIRldjSv75ZjJIWFlsD5gqos8Gn FE0YHrQard+uMmqd91ZihczBEExD4UZ/Vwyrbh1OIA1RnwKj11vS/bWxncYtveUBHiXT vFyy1qXCVIiyccdM4/GgT5TmdaP+MkVFjHQYAtipqK3Z1QK4JZYPG2FH8S5lK/vj21gl CWp2tMtS5emMyphD8d7syQZEHP0KPCMYj+lwqK6OBqAnJSPhOXcCABNoS1Qwr46R6VXM Gm7slwuEW1Fuj+7C9y6yWQNM9xcKJE/d42q4/ZeStvQKpybm6GUe27Cwz3yVTRpiwDz2 3lJQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=Ct9DytHmVbmGB7njqs8Ivui2e4UZHORYfo9SVpW9E7k=; b=V+lqgNOfCXHxoQQVc1zxXcfMx00AMB5JA7QqIHdjMOfw4G5+69YiEgvDXQe7KGTu9d yixGmSZGnw9ED2h0xqFQjJU+4YfaYh9oHNFvtxId6TxYQRtV/f71EjJX+ulbPTRjyF/T CWEo+PzrJuSYixOzHpCCDwDcJZF8iCEPlEX95MQZhBMJTHndEYs4ogPD5HQnw5NJpIKu /71DG+WUZKriwA0NyHfYfpEH0tVXBgmSlxwqI/BZaabSFlvvIHF0+TSWAWn6uW8hs0Sp azwSN0DHBNjq2IsdMETlwlq1zDOq5m6pLuHjyVuzaepAnAGpI7v2VwanFUq1tmLA4mXs p2NQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@semihalf.com header.s=google header.b=I4sDTdFK; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org 
designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=semihalf.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id gb12-20020a170907960c00b00787803711f9si9400810ejc.353.2022.10.03.07.11.01; Mon, 03 Oct 2022 07:11:02 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@semihalf.com header.s=google header.b=I4sDTdFK; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=semihalf.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 4C87A68BB7E; Mon, 3 Oct 2022 17:10:46 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-lf1-f49.google.com (mail-lf1-f49.google.com [209.85.167.49]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 401B768B20B for ; Mon, 3 Oct 2022 17:10:39 +0300 (EEST) Received: by mail-lf1-f49.google.com with SMTP id b2so5834675lfp.6 for ; Mon, 03 Oct 2022 07:10:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=semihalf.com; s=google; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date; bh=1x7DoyeBTdKEFNTnG50pItebTNZO06oSyE5ev1jM684=; b=I4sDTdFKcEWOklzC3y2q3f+2ubdCgrVh/jXcvoGQ7MVN02zi1I3IZq2qOCKZa2iJhV JNHNWFDMVLBXfdF7tArZRsxEv9gHKuE7mfrh6j5NFGLm3LMCvvlM0c08MaTMXW5hXK6w towAkXEiJeIf6XkIIEyYkSHGf/T+dIutOZrTjOVTPBAXkmiyWIC+TaomT+PpWcccLH8Z EIH1pXtnUaG6wyJrmL5WbWdkfQsXfZ1AQ8BCqV0Tcjkea8u9B+vN9L6e1v7wWk9DsW9p 
ZlMIpbidsv0lu9LkjST98fUirB/jPWbWQ1E3d5qgXOd+xOcjjpF8i1DNO5/rdaHiDHOc yvlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date; bh=1x7DoyeBTdKEFNTnG50pItebTNZO06oSyE5ev1jM684=; b=R0QnjBHJE1HEdNteva0zMD6eiaQ9GzeuGmFlUliebmExcMdwV2mH7WcT8t+5cBvdub N3whgha78/E56hvUh93J8LiNDr19C3jNHPXULKWHdsykowkUk9oxGehrC7pbNoM0HPVd 216CWrgd0jobprJWyKXuU9zNi3CCxNPaGrybkekc/MEczFLGKigonQagbyVEItEuZIql 1ifVTBXkci4x1kVjiDD15yANLK0AOl5E86WOSJH89OLY2q1jX1o3HNWV56W5mbakwzKm w4RzXHyck2nVgt92uySWEfRg/v1osQuuPIW71Z1JQhQ1erf6d2/lAnR/EosAGtkNN8cC x03w== X-Gm-Message-State: ACrzQf0lqyVORHpTijWGQ+yGdql3KRfk2NSaLQD3u8EAgseYSB42yZPT y0yU09bhCGFuhN16QzgvM55xC0uMjK7mcw== X-Received: by 2002:a05:6512:22c3:b0:4a2:1698:58db with SMTP id g3-20020a05651222c300b004a2169858dbmr5341243lfu.554.1664806238190; Mon, 03 Oct 2022 07:10:38 -0700 (PDT) Received: from gilgamesh.lab.semihalf.net ([83.142.187.85]) by smtp.gmail.com with ESMTPSA id k15-20020a05651239cf00b00499b19f23e8sm1470610lfu.279.2022.10.03.07.10.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Oct 2022 07:10:37 -0700 (PDT) From: Grzegorz Bernacki To: ffmpeg-devel@ffmpeg.org Date: Mon, 3 Oct 2022 16:10:14 +0200 Message-Id: <20221003141020.3564715-2-gjb@semihalf.com> X-Mailer: git-send-email 2.29.0 In-Reply-To: <20221003141020.3564715-1-gjb@semihalf.com> References: <20221003141020.3564715-1-gjb@semihalf.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 1/7] lavc/aarch64: Add neon implementation for pix_abs8 functions. 
X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 0iCf1Vs9k2Sj Provide optimized implementation of pix_abs8 function for arm64. Performance comparison tests are shown below: pix_abs_1_1_c: 162.5 pix_abs_1_1_neon: 27.0 pix_abs_1_2_c: 174.0 pix_abs_1_2_neon: 23.5 pix_abs_1_3_c: 203.2 pix_abs_1_3_neon: 34.7 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Grzegorz Bernacki --- libavcodec/aarch64/me_cmp_init_aarch64.c | 9 ++ libavcodec/aarch64/me_cmp_neon.S | 193 +++++++++++++++++++++++ 2 files changed, 202 insertions(+) diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c index e143f0816e..695ed35fc1 100644 --- a/libavcodec/aarch64/me_cmp_init_aarch64.c +++ b/libavcodec/aarch64/me_cmp_init_aarch64.c @@ -59,6 +59,12 @@ int pix_median_abs16_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t ptrdiff_t stride, int h); int pix_median_abs8_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, ptrdiff_t stride, int h); +int ff_pix_abs8_x2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, + ptrdiff_t stride, int h); +int ff_pix_abs8_y2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, + ptrdiff_t stride, int h); +int ff_pix_abs8_xy2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, + ptrdiff_t stride, int h); av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) { @@ -70,6 +76,9 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) c->pix_abs[0][2] = ff_pix_abs16_y2_neon; 
c->pix_abs[0][3] = ff_pix_abs16_xy2_neon; c->pix_abs[1][0] = ff_pix_abs8_neon; + c->pix_abs[1][1] = ff_pix_abs8_x2_neon; + c->pix_abs[1][2] = ff_pix_abs8_y2_neon; + c->pix_abs[1][3] = ff_pix_abs8_xy2_neon; c->sad[0] = ff_pix_abs16_neon; c->sad[1] = ff_pix_abs8_neon; diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S index c710358ab7..43e068bb7f 100644 --- a/libavcodec/aarch64/me_cmp_neon.S +++ b/libavcodec/aarch64/me_cmp_neon.S @@ -119,6 +119,199 @@ function ff_pix_abs8_neon, export=1 ret endfunc +function ff_pix_abs8_x2_neon, export=1 + // x0 unused + // x1 uint8_t *pix1 + // x2 uint8_t *pix2 + // x3 ptrdiff_t stride + // w4 int h + + cmp w4, #4 + movi v26.8h, #0 + add x5, x2, #1 // pix2 + 1 + b.lt 2f + +// make 4 iterations at once +1: + ld1 {v1.8b}, [x2], x3 + ld1 {v2.8b}, [x5], x3 + ld1 {v0.8b}, [x1], x3 + ld1 {v4.8b}, [x2], x3 + urhadd v30.8b, v1.8b, v2.8b + ld1 {v5.8b}, [x5], x3 + uabal v26.8h, v0.8b, v30.8b + ld1 {v6.8b}, [x1], x3 + urhadd v29.8b, v4.8b, v5.8b + ld1 {v7.8b}, [x2], x3 + ld1 {v20.8b}, [x5], x3 + uabal v26.8h, v6.8b, v29.8b + ld1 {v21.8b}, [x1], x3 + urhadd v28.8b, v7.8b, v20.8b + ld1 {v22.8b}, [x2], x3 + ld1 {v23.8b}, [x5], x3 + uabal v26.8h, v21.8b, v28.8b + sub w4, w4, #4 + ld1 {v24.8b}, [x1], x3 + urhadd v27.8b, v22.8b, v23.8b + cmp w4, #4 + uabal v26.8h, v24.8b, v27.8b + + b.ge 1b + cbz w4, 3f + +// iterate by one +2: + ld1 {v1.8b}, [x2], x3 + ld1 {v2.8b}, [x5], x3 + ld1 {v0.8b}, [x1], x3 + urhadd v30.8b, v1.8b, v2.8b + subs w4, w4, #1 + uabal v26.8h, v0.8b, v30.8b + + b.ne 2b +3: + uaddlv s20, v26.8h + fmov w0, s20 + + ret + +endfunc + +function ff_pix_abs8_y2_neon, export=1 + // x0 unused + // x1 uint8_t *pix1 + // x2 uint8_t *pix2 + // x3 ptrdiff_t stride + // w4 int h + + cmp w4, #4 + movi v26.8h, #0 + ld1 {v1.8b}, [x2], x3 + b.lt 2f + +// make 4 iterations at once +1: + ld1 {v2.8b}, [x2], x3 + ld1 {v0.8b}, [x1], x3 + ld1 {v6.8b}, [x1], x3 + urhadd v30.8b, v1.8b, v2.8b + ld1 {v5.8b}, [x2], x3 + ld1 
{v21.8b}, [x1], x3 + uabal v26.8h, v0.8b, v30.8b + urhadd v29.8b, v2.8b, v5.8b + ld1 {v20.8b}, [x2], x3 + ld1 {v24.8b}, [x1], x3 + uabal v26.8h, v6.8b, v29.8b + urhadd v28.8b, v5.8b, v20.8b + uabal v26.8h, v21.8b, v28.8b + ld1 {v23.8b}, [x2], x3 + mov v1.8b, v23.8b + sub w4, w4, #4 + urhadd v27.8b, v20.8b, v23.8b + cmp w4, #4 + uabal v26.8h, v24.8b, v27.8b + + b.ge 1b + cbz w4, 3f + +// iterate by one +2: + ld1 {v0.8b}, [x1], x3 + ld1 {v2.8b}, [x2], x3 + urhadd v30.8b, v1.8b, v2.8b + subs w4, w4, #1 + uabal v26.8h, v0.8b, v30.8b + mov v1.8b, v2.8b + + b.ne 2b +3: + uaddlv s20, v26.8h + fmov w0, s20 + + ret + +endfunc + +function ff_pix_abs8_xy2_neon, export=1 + // x0 unused + // x1 uint8_t *pix1 + // x2 uint8_t *pix2 + // x3 ptrdiff_t stride + // w4 int h + + movi v31.8h, #0 + add x0, x2, 1 // pix2 + 1 + + add x5, x2, x3 // pix2 + stride = pix3 + cmp w4, #4 + add x6, x5, 1 // pix3 + stride + 1 + + b.lt 2f + + ld1 {v0.8b}, [x2], x3 + ld1 {v1.8b}, [x0], x3 + uaddl v2.8h, v0.8b, v1.8b + +// make 4 iterations at once +1: + ld1 {v4.8b}, [x5], x3 + ld1 {v5.8b}, [x6], x3 + ld1 {v7.8b}, [x5], x3 + uaddl v0.8h, v4.8b, v5.8b + ld1 {v16.8b}, [x6], x3 + add v4.8h, v0.8h, v2.8h + ld1 {v5.8b}, [x1], x3 + rshrn v4.8b, v4.8h, #2 + uaddl v7.8h, v7.8b, v16.8b + uabal v31.8h, v5.8b, v4.8b + add v2.8h, v0.8h, v7.8h + ld1 {v17.8b}, [x1], x3 + rshrn v2.8b, v2.8h, #2 + ld1 {v20.8b}, [x5], x3 + uabal v31.8h, v17.8b, v2.8b + ld1 {v21.8b}, [x6], x3 + ld1 {v25.8b}, [x5], x3 + uaddl v20.8h, v20.8b, v21.8b + ld1 {v26.8b}, [x6], x3 + add v7.8h, v7.8h, v20.8h + uaddl v25.8h, v25.8b, v26.8b + rshrn v7.8b, v7.8h, #2 + ld1 {v22.8b}, [x1], x3 + mov v2.16b, v25.16b + uabal v31.8h, v22.8b, v7.8b + add v20.8h, v20.8h, v25.8h + ld1 {v27.8b}, [x1], x3 + sub w4, w4, #4 + rshrn v20.8b, v20.8h, #2 + cmp w4, #4 + uabal v31.8h, v27.8b, v20.8b + + b.ge 1b + + cbz w4, 3f + +// iterate by one +2: + ld1 {v0.8b}, [x5], x3 + ld1 {v1.8b}, [x6], x3 + ld1 {v4.8b}, [x1], x3 + uaddl v21.8h, v0.8b, v1.8b + subs w4, w4, 
#1 + add v3.8h, v2.8h, v21.8h + mov v2.16b, v21.16b + rshrn v3.8b, v3.8h, #2 + uabal v31.8h, v4.8b, v3.8b + b.ne 2b + +3: + uaddlv s18, v31.8h + fmov w0, s18 + + ret + +endfunc + + function ff_pix_abs16_xy2_neon, export=1 // x0 unused // x1 uint8_t *pix1 From patchwork Mon Oct 3 14:10:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Grzegorz Bernacki X-Patchwork-Id: 38534 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:3b1c:b0:96:9ee8:5cfd with SMTP id c28csp1345555pzh; Mon, 3 Oct 2022 07:11:11 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4Vx2F8de0j0NjiSThsr6QvGZWDJm25GcgpL9FnvlqQt1VhFGgtCqWWzWAx/oh+854FRnzT X-Received: by 2002:a17:907:7204:b0:783:e152:f1f1 with SMTP id dr4-20020a170907720400b00783e152f1f1mr15744533ejc.119.1664806271637; Mon, 03 Oct 2022 07:11:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664806271; cv=none; d=google.com; s=arc-20160816; b=XdIQRH956TfK+OpaFR4QPaSkFzJpFflgTwgfGNWkZvm/4YConZVy63eiLDtgmEY63g GklCyEwQEXxpc3+b1P+FR19ifwW/9AVpjmOnm7Upsw1Dt4P/nBZDsUa+9C9lEiwFupUo S9TeNq6mdZgBg3dRNhC8SfRLBs61oSayIeqNsXeyUwokxfwBUhkaAfgWb7hUoayzrY8c Lj8RtZCqBFNMCFP5WtDg4b5Isx3GJYG+RcfmCDJrSNbBUkSD5hXdrrjys0T3stNuIy5j JZ8WzkYvwzqZKntIkEnfKNcSd4CO/2MJe5iFukZEVzYfIbiYnGyxRR7qFdzEon58d8xz O7Xw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=XgEmvv4401WntkJhaQG9v5lEn7yu0D86UYMUbCxW7Zk=; b=Uhj+r41zhlUBJnmwKKx22mKLkg6qDd3Dz6ZqWl+ExghOzRxSmtNMd3qew9cjqzEC1e IhLPpRL2mjYohKyhSzGEfnp2luF1A01saifsuzk7xrrwzaYOoRR8SmEPtY0YhEdCsvPu nVb0G+Jz6A4yWOm3kKL5sjFaDB3qGbaDewyL88VI2uYrkums2EU8H5fDU6HMZ9PNQQEk 6p0ysRdtJHbcLakY8FIBYckom9HhWYRd//gLOY5+bnwpFt1j81DS9BjvAMs8y0epcLm6 
b=qIk06ZThMZmZXVMlu9lJu8o8w/IUI7fzUnOkZ43Byq/lYJdWoFwhXIhHtw81QTSLra 1v9AS+iqAa1uDiwfTzC996RipYsZZWrVB3htrd9aNI5FpfVY4v8XKH481LwPs2mYB+PZ T03xJ269za8INs6pr7wW3wc2VFvPWECS2Nrm/eEwe6Uz/KBsz4bpOs3V51VFS/XlqL8t TahkxgEG26VpHkoO1b2sMEBrxY0LaQ64WjcZQQOxSuj06JAlJ+2gwtIERCpfD7RDRo6N AKHd/JC6vs5m+oIWc5EqFV+vKWMEwn9ZhkDVXRhP5X77RzAWHTqP8Ey31VcJHgDqgWE9 xD4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date; bh=FvPcvxQF663QyMYSgzfeaW39sEeAJ+HvtPbuBQ7pw10=; b=6YWwEUWh7YCLlej4vdpsv9x4to2fw+3qsyOFZFHz0JDGSFGjM9wvauwr6K78xWViBX HtRyXZ//HpN7Gn8ZiFbOEBatCC62UQ8G3looG4C1MImryaHl6L1rK5FC9DghFSxoJDcw YhukFZTEOwK3IR6HuiY3ZSkJNGuOC/vWonwUkmHW8vsmG4YGWg7EuNAN4pDdRsDXSpsc qNzNY8TWs9J9JNk09qUv6uuXXMTtFpSWTUxtUmXFTNRdr9yWdUw008Qrqy16UmV//tpq 0cu1kSLk7sCljSj9PzcEpjHhRwg+5MMIYL8iSESCT0ABUY45CBTGRSc0WL8rPC1tauf8 Xwgw== X-Gm-Message-State: ACrzQf2CpFhX7xhpAeB7SIJtmgAlYtjcCHRuBDkKvLSqjEvQvNBAF8mb /yoLsgk3NEbSoOih9eqy3AMHdAIuAGCUcQ== X-Received: by 2002:a2e:b6d3:0:b0:26d:df89:6601 with SMTP id m19-20020a2eb6d3000000b0026ddf896601mr1520055ljo.433.1664806239135; Mon, 03 Oct 2022 07:10:39 -0700 (PDT) Received: from gilgamesh.lab.semihalf.net ([83.142.187.85]) by smtp.gmail.com with ESMTPSA id k15-20020a05651239cf00b00499b19f23e8sm1470610lfu.279.2022.10.03.07.10.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Oct 2022 07:10:38 -0700 (PDT) From: Grzegorz Bernacki To: ffmpeg-devel@ffmpeg.org Date: Mon, 3 Oct 2022 16:10:15 +0200 Message-Id: <20221003141020.3564715-3-gjb@semihalf.com> X-Mailer: git-send-email 2.29.0 In-Reply-To: <20221003141020.3564715-1-gjb@semihalf.com> References: <20221003141020.3564715-1-gjb@semihalf.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 2/7] aarch64: me_cmp: Improve scheduling in ff_pix_abs8_y2_neon X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 
2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 5agKuqhfkji1 From: Martin Storsjö Before: Cortex A53 A72 A73 pix_abs_1_2_neon: 73.7 31.0 25.7 After: pix_abs_1_2_neon: 61.7 30.2 24.7 Signed-off-by: Martin Storsjö --- libavcodec/aarch64/me_cmp_neon.S | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S index 43e068bb7f..3662419edf 100644 --- a/libavcodec/aarch64/me_cmp_neon.S +++ b/libavcodec/aarch64/me_cmp_neon.S @@ -193,21 +193,20 @@ function ff_pix_abs8_y2_neon, export=1 1: ld1 {v2.8b}, [x2], x3 ld1 {v0.8b}, [x1], x3 - ld1 {v6.8b}, [x1], x3 urhadd v30.8b, v1.8b, v2.8b ld1 {v5.8b}, [x2], x3 - ld1 {v21.8b}, [x1], x3 + ld1 {v6.8b}, [x1], x3 uabal v26.8h, v0.8b, v30.8b urhadd v29.8b, v2.8b, v5.8b ld1 {v20.8b}, [x2], x3 - ld1 {v24.8b}, [x1], x3 + ld1 {v21.8b}, [x1], x3 uabal v26.8h, v6.8b, v29.8b urhadd v28.8b, v5.8b, v20.8b - uabal v26.8h, v21.8b, v28.8b - ld1 {v23.8b}, [x2], x3 - mov v1.8b, v23.8b + ld1 {v1.8b}, [x2], x3 + ld1 {v24.8b}, [x1], x3 + urhadd v27.8b, v20.8b, v1.8b sub w4, w4, #4 - urhadd v27.8b, v20.8b, v23.8b + uabal v26.8h, v21.8b, v28.8b cmp w4, #4 uabal v26.8h, v24.8b, v27.8b From patchwork Mon Oct 3 14:10:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Grzegorz Bernacki X-Patchwork-Id: 38535 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:3b1c:b0:96:9ee8:5cfd with SMTP id c28csp1345671pzh; Mon, 3 Oct 2022 07:11:22 -0700 (PDT) X-Google-Smtp-Source: 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date; bh=3oaapjlB6f0bywOoXnNK0i9IlZfR5xmvpXqxaW3CPwQ=; b=WOJMY5YmTpKnQ3U5VIGcYdaqtBGjJT2RsBchDgvyQPwQHiXVeaz1Ke+VPl51DXMfC2 EmkNUVTnuJFzBBO7egIdH6JKeEphSDnjOfcMgfN7pZfYvRf5OqZBW735Nz1Y0FK6HxkE DaAmZTPGgEoTfgA5VABs2MX39fQ83Id1Eq62U028Bb4JFETMT+UZx8T4VYLXzNiOK3kS DCfVruJJekzN5hmn+g41lFQwR/f0Qf2l34rhAYcWQd5UXB6oBk+6LTjLzHVe6ZsKC+HX gX6q7WRdayYrN7slhWid+rWi1BtsvPw6qcp92QoDLAEkC0hQk4HskBsWsPkrGKSy1AIZ HOWg== X-Gm-Message-State: ACrzQf2fWmdFV4xMBGGceEafSa+FH0P3t9gNH2hg7mBC+bE1BU83km8z emVc17BSsE5ho6p2SobvjwVBOy7AI8/9PQ== X-Received: by 2002:a19:6446:0:b0:49a:9b06:f4be with SMTP id b6-20020a196446000000b0049a9b06f4bemr7912614lfj.157.1664806240027; Mon, 03 Oct 2022 07:10:40 -0700 (PDT) Received: from gilgamesh.lab.semihalf.net ([83.142.187.85]) by smtp.gmail.com with ESMTPSA id k15-20020a05651239cf00b00499b19f23e8sm1470610lfu.279.2022.10.03.07.10.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Oct 2022 07:10:39 -0700 (PDT) From: Grzegorz Bernacki To: ffmpeg-devel@ffmpeg.org Date: Mon, 3 Oct 2022 16:10:16 +0200 Message-Id: <20221003141020.3564715-4-gjb@semihalf.com> X-Mailer: git-send-email 2.29.0 In-Reply-To: <20221003141020.3564715-1-gjb@semihalf.com> References: <20221003141020.3564715-1-gjb@semihalf.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 3/7] aarch64: me_cmp: Fix up the prologue of ff_pix_abs8_xy2_neon X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: nQuMhTCnh/kY From: Martin Storsjö This initializes things properly if this were to be called 
with h < 4. --- libavcodec/aarch64/me_cmp_neon.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S index 3662419edf..cfba3eb33a 100644 --- a/libavcodec/aarch64/me_cmp_neon.S +++ b/libavcodec/aarch64/me_cmp_neon.S @@ -245,12 +245,12 @@ function ff_pix_abs8_xy2_neon, export=1 cmp w4, #4 add x6, x5, 1 // pix3 + stride + 1 - b.lt 2f - ld1 {v0.8b}, [x2], x3 ld1 {v1.8b}, [x0], x3 uaddl v2.8h, v0.8b, v1.8b + b.lt 2f + // make 4 iterations at once 1: ld1 {v4.8b}, [x5], x3 From patchwork Mon Oct 3 14:10:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Grzegorz Bernacki X-Patchwork-Id: 38536 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:3b1c:b0:96:9ee8:5cfd with SMTP id c28csp1345778pzh; Mon, 3 Oct 2022 07:11:31 -0700 (PDT) X-Google-Smtp-Source: AMsMyM53jfE5y/wVBLeT3s6d0BSBx0Ltx3dgEPHMyagFQIz6Uwf16s+8bGPfpmK0SnE4Az4Ux8UT X-Received: by 2002:a17:906:4bd3:b0:731:3bdf:b95c with SMTP id x19-20020a1709064bd300b007313bdfb95cmr15543335ejv.677.1664806290543; Mon, 03 Oct 2022 07:11:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664806290; cv=none; d=google.com; s=arc-20160816; b=Y39A/PtqM++9SwjZgev0P0rMCsGZAVB2AlM9YTHMNHNpsoG6jG/m1AyUJGjzMs5qNu dz1FSXCWTe7VdZ2Q1/5poC4jeZGAJzqWylWmvBsNHDXbTGSklhwHKp1uEIrzMSGM2wmg ie022El4cdSbVvGt9EcfJdRQKmC2BsP8L5/1aMkLgipx5QvMBdQyCjf47P2GoRUogx+V gdIfTrCBMJsyhcfDBPdXW6+vBGCbezpO2JsFy4avxHwY0b6eJuY72iKkwhiCSH7Uv872 3UHNv0topGYAGKGtrV1IMZcMPdYnUn3OkOGU17njJWnoPydG41JBOFtMYe4YxyThMRyJ cepA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=dgWzfVPmVl2/IoPjPxATeUFOTAfKV+pR5ADI+14e+Fs=; 
-0700 (PDT)
From: Grzegorz Bernacki
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 3 Oct 2022 16:10:17 +0200
Message-Id: <20221003141020.3564715-5-gjb@semihalf.com>
X-Mailer: git-send-email 2.29.0
In-Reply-To: <20221003141020.3564715-1-gjb@semihalf.com>
References: <20221003141020.3564715-1-gjb@semihalf.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 4/7] lavc/aarch64: Provide neon implementation of nsse8
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com

Add vectorized implementation of nsse8 function.

Performance comparison tests are shown below.
- nsse_1_c: 256.0
- nsse_1_neon: 82.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 15 ++++
 libavcodec/aarch64/me_cmp_neon.S         | 99 ++++++++++++++++++++++++
 2 files changed, 114 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c
index 695ed35fc1..05156627fa 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -66,6 +66,11 @@ int ff_pix_abs8_y2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *p
 int ff_pix_abs8_xy2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
                          ptrdiff_t stride, int h);
 
+int nsse8_neon(int multiplier, const uint8_t *s, const uint8_t *s2,
+               ptrdiff_t stride, int h);
+int nsse8_neon_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
+                       ptrdiff_t stride, int h);
+
 av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -94,6 +99,7 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
         c->vsse[4] = vsse_intra16_neon;
 
         c->nsse[0] = nsse16_neon_wrapper;
+        c->nsse[1] = nsse8_neon_wrapper;
 
         c->median_sad[0] = pix_median_abs16_neon;
         c->median_sad[1] = pix_median_abs8_neon;
@@ -108,3 +114,12 @@ int nsse16_neon_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
     else
         return nsse16_neon(8, s1, s2, stride, h);
 }
+
+int nsse8_neon_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
+                       ptrdiff_t stride, int h)
+{
+    if (c)
+        return nsse8_neon(c->avctx->nsse_weight, s1, s2, stride, h);
+    else
+        return nsse8_neon(8, s1, s2, stride, h);
+}
diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index cfba3eb33a..24be05dd18 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -1158,6 +1158,105 @@ function nsse16_neon, export=1
         ret
 endfunc
 
+function nsse8_neon, export=1
+        // x0           multiplier
+        // x1           uint8_t *pix1
+        // x2           uint8_t *pix2
+        // x3           ptrdiff_t stride
+        // w4           int h
+
+        str             x0, [sp, #-0x40]!
+        stp             x1, x2, [sp, #0x10]
+        stp             x3, x4, [sp, #0x20]
+        str             x30, [sp, #0x30]
+        bl              X(sse8_neon)
+        ldr             x30, [sp, #0x30]
+        mov             w9, w0                          // here we store score1
+        ldr             x5, [sp]
+        ldp             x1, x2, [sp, #0x10]
+        ldp             x3, x4, [sp, #0x20]
+        add             sp, sp, #0x40
+
+        movi            v16.8h, #0
+        movi            v17.8h, #0
+        movi            v18.8h, #0
+        movi            v19.8h, #0
+
+        ld1             {v0.8b}, [x1], x3
+        subs            w4, w4, #1                      // we need to make h-1 iterations
+        ext             v1.8b, v0.8b, v0.8b, #1         // x1 + 1
+        ld1             {v2.8b}, [x2], x3
+        cmp             w4, #2
+        ext             v3.8b, v2.8b, v2.8b, #1         // x2 + 1
+
+        b.lt            2f
+
+// make 2 iterations at once
+1:
+        ld1             {v4.8b}, [x1], x3
+        ld1             {v20.8b}, [x1], x3
+        ld1             {v6.8b}, [x2], x3
+        ext             v5.8b, v4.8b, v4.8b, #1         // x1 + stride + 1
+        ext             v21.8b, v20.8b, v20.8b, #1
+        ld1             {v22.8b}, [x2], x3
+        ext             v7.8b, v6.8b, v6.8b, #1         // x2 + stride + 1
+        usubl           v31.8h, v0.8b, v4.8b
+        ext             v23.8b, v22.8b, v22.8b, #1
+        usubl           v29.8h, v1.8b, v5.8b
+        usubl           v27.8h, v2.8b, v6.8b
+        usubl           v25.8h, v3.8b, v7.8b
+        saba            v16.8h, v31.8h, v29.8h
+        usubl           v31.8h, v4.8b, v20.8b
+        saba            v18.8h, v27.8h, v25.8h
+        sub             w4, w4, #2
+        usubl           v29.8h, v5.8b, v21.8b
+        mov             v0.16b, v20.16b
+        mov             v1.16b, v21.16b
+        saba            v16.8h, v31.8h, v29.8h
+        usubl           v27.8h, v6.8b, v22.8b
+        usubl           v25.8h, v7.8b, v23.8b
+        mov             v2.16b, v22.16b
+        mov             v3.16b, v23.16b
+        cmp             w4, #2
+        saba            v18.8h, v27.8h, v25.8h
+        b.ge            1b
+        cbz             w4, 3f
+
+// iterate by one
+2:
+        ld1             {v4.8b}, [x1], x3
+        subs            w4, w4, #1
+        ext             v5.8b, v4.8b, v4.8b, #1         // x1 + stride + 1
+        ld1             {v6.8b}, [x2], x3
+        usubl           v31.8h, v0.8b, v4.8b
+        ext             v7.8b, v6.8b, v6.8b, #1         // x2 + stride + 1
+
+        usubl           v29.8h, v1.8b, v5.8b
+        saba            v16.8h, v31.8h, v29.8h
+        usubl           v27.8h, v2.8b, v6.8b
+        usubl           v25.8h, v3.8b, v7.8b
+        saba            v18.8h, v27.8h, v25.8h
+
+        mov             v0.16b, v4.16b
+        mov             v1.16b, v5.16b
+        mov             v2.16b, v6.16b
+        mov             v3.16b, v7.16b
+
+        cbnz            w4, 2b
+
+3:
+        sqsub           v16.8h, v16.8h, v18.8h
+        ins             v16.h[7], wzr
+        saddlv          s16, v16.8h
+        sqabs           s16, s16
+        fmov            w0, s16
+
+        mul             w0, w0, w5
+        add             w0, w0, w9
+
+        ret
+endfunc
+
 function pix_median_abs16_neon, export=1
         // x0           unused
         // x1           uint8_t *pix1

From patchwork Mon Oct 3 14:10:18 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Grzegorz Bernacki
X-Patchwork-Id: 38537
From: Grzegorz Bernacki
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 3 Oct 2022 16:10:18 +0200
Message-Id: <20221003141020.3564715-6-gjb@semihalf.com>
X-Mailer: git-send-email 2.29.0
In-Reply-To: <20221003141020.3564715-1-gjb@semihalf.com>
References: <20221003141020.3564715-1-gjb@semihalf.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 5/7] lavc/aarch64: Provide optimized implementation of vsse8 for arm64.
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com

Provide optimized implementation of vsse8 for arm64.

Performance comparison tests are shown below.
- vsse_1_c: 141.5
- vsse_1_neon: 32.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki
---
 libavcodec/aarch64/me_cmp_init_aarch64.c |  5 ++
 libavcodec/aarch64/me_cmp_neon.S         | 70 ++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c
index 05156627fa..1a0c3e90bb 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -71,6 +71,9 @@ int nsse8_neon(int multiplier, const uint8_t *s, const uint8_t *s2,
 int nsse8_neon_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
                        ptrdiff_t stride, int h);
 
+int vsse8_neon(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
+               ptrdiff_t stride, int h);
+
 av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -96,6 +99,8 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
         c->vsad[5] = vsad_intra8_neon;
 
         c->vsse[0] = vsse16_neon;
+        c->vsse[1] = vsse8_neon;
+
         c->vsse[4] = vsse_intra16_neon;
 
         c->nsse[0] = nsse16_neon_wrapper;
diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index 24be05dd18..104e02f495 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -838,6 +838,76 @@ function vsad16_neon, export=1
         ret
 endfunc
 
+function vsse8_neon, export=1
+        // x0           unused
+        // x1           uint8_t *pix1
+        // x2           uint8_t *pix2
+        // x3           ptrdiff_t stride
+        // w4           int h
+
+        ld1             {v0.8b}, [x1], x3               // Load pix1[0], first iteration
+        ld1             {v1.8b}, [x2], x3               // Load pix2[0], first iteration
+
+        sub             w4, w4, #1                      // we need to make h-1 iterations
+        movi            v16.4s, #0
+        movi            v17.4s, #0
+
+        cmp             w4, #3                          // check if we can make 3 iterations at once
+        usubl           v31.8h, v0.8b, v1.8b            // Signed difference of pix1[0] - pix2[0], first iteration
+        b.lt            2f
+
+
+1:
+        // x = abs(pix1[0] - pix2[0] - pix1[0 + stride] + pix2[0 + stride])
+        // res = (x) * (x)
+        ld1             {v0.8b}, [x1], x3               // Load pix1[0 + stride], first iteration
+        ld1             {v1.8b}, [x2], x3               // Load pix2[0 + stride], first iteration
+        ld1             {v2.8b}, [x1], x3               // Load pix1[0 + stride], second iteration
+        ld1             {v3.8b}, [x2], x3               // Load pix2[0 + stride], second iteration
+        usubl           v29.8h, v0.8b, v1.8b
+        usubl2          v28.8h, v0.16b, v1.16b
+        ld1             {v4.8b}, [x1], x3               // Load pix1[0 + stride], third iteration
+        ld1             {v5.8b}, [x2], x3               // Load pix2[0 + stride], third iteration
+        sabd            v31.8h, v31.8h, v29.8h
+        usubl           v27.8h, v2.8b, v3.8b
+        usubl           v25.8h, v4.8b, v5.8b
+        sabd            v29.8h, v29.8h, v27.8h
+        sabd            v27.8h, v27.8h, v25.8h
+        umlal           v16.4s, v31.4h, v31.4h
+        umlal2          v17.4s, v31.8h, v31.8h
+        mov             v31.16b, v25.16b
+        umlal           v16.4s, v29.4h, v29.4h
+        umlal2          v17.4s, v29.8h, v29.8h
+        sub             w4, w4, #3
+        umlal           v16.4s, v27.4h, v27.4h
+        umlal2          v17.4s, v27.8h, v27.8h
+        cmp             w4, #3
+
+        b.ge            1b
+
+        cbz             w4, 3f
+
+// iterate by one
+2:
+        ld1             {v0.8b}, [x1], x3
+        ld1             {v1.8b}, [x2], x3
+        subs            w4, w4, #1
+        usubl           v29.8h, v0.8b, v1.8b
+        sabd            v31.8h, v31.8h, v29.8h
+        umlal           v16.4s, v31.4h, v31.4h
+        umlal2          v17.4s, v31.8h, v31.8h
+        mov             v31.16b, v29.16b
+        b.ne            2b
+
+3:
+        add             v16.4s, v16.4s, v17.4s
+        uaddlv          d17, v16.4s
+        fmov            w0, s17
+
+        ret
+endfunc
+
+
 function vsse16_neon, export=1
         // x0           unused
         // x1           uint8_t *pix1

From patchwork Mon Oct 3 14:10:19 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Grzegorz Bernacki
X-Patchwork-Id: 38538
From: Grzegorz Bernacki
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 3 Oct 2022 16:10:19 +0200
Message-Id: <20221003141020.3564715-7-gjb@semihalf.com>
X-Mailer: git-send-email 2.29.0
In-Reply-To: <20221003141020.3564715-1-gjb@semihalf.com>
References: <20221003141020.3564715-1-gjb@semihalf.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 6/7] lavc/aarch64: Add neon implementation for vsse_intra8
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com

Provide optimized implementation for vsse_intra8 for arm64.

Performance tests are shown below.
- vsse_5_c: 87.7
- vsse_5_neon: 26.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
---
 libavcodec/aarch64/me_cmp_init_aarch64.c |  4 ++
 libavcodec/aarch64/me_cmp_neon.S         | 53 ++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c
index 1a0c3e90bb..1e0f1cf4f1 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -74,6 +74,9 @@ int nsse8_neon_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
 int vsse8_neon(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
                ptrdiff_t stride, int h);
 
+int vsse_intra8_neon(MpegEncContext *c, const uint8_t *s, const uint8_t *dummy,
+                     ptrdiff_t stride, int h);
+
 av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -102,6 +105,7 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
         c->vsse[1] = vsse8_neon;
 
         c->vsse[4] = vsse_intra16_neon;
+        c->vsse[5] = vsse_intra8_neon;
 
         c->nsse[0] = nsse16_neon_wrapper;
         c->nsse[1] = nsse8_neon_wrapper;
diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index 104e02f495..61e4f68335 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -1106,6 +1106,59 @@ function vsse_intra16_neon, export=1
         ret
 endfunc
 
+function vsse_intra8_neon, export=1
+        // x0           unused
+        // x1           uint8_t *pix1
+        // x2           uint8_t *dummy
+        // x3           ptrdiff_t stride
+        // w4           int h
+
+        ld1             {v0.8b}, [x1], x3
+        movi            v16.4s, #0
+
+        sub             w4, w4, #1                      // we need to make h-1 iterations
+        cmp             w4, #3
+        b.lt            2f
+
+1:
+        // v = abs( pix1[0] - pix1[0 + stride] )
+        // score = sum( v * v )
+        ld1             {v1.8b}, [x1], x3
+        ld1             {v2.8b}, [x1], x3
+        uabd            v30.8b, v0.8b, v1.8b
+        ld1             {v3.8b}, [x1], x3
+        umull           v29.8h, v30.8b, v30.8b
+        uabd            v27.8b, v1.8b, v2.8b
+        uadalp          v16.4s, v29.8h
+        umull           v26.8h, v27.8b, v27.8b
+        uabd            v25.8b, v2.8b, v3.8b
+        uadalp          v16.4s, v26.8h
+        umull           v24.8h, v25.8b, v25.8b
+        sub             w4, w4, #3
+        uadalp          v16.4s, v24.8h
+        cmp             w4, #3
+        mov             v0.8b, v3.8b
+
+        b.ge            1b
+        cbz             w4, 3f
+
+// iterate by one
+2:
+        ld1             {v1.8b}, [x1], x3
+        subs            w4, w4, #1
+        uabd            v30.8b, v0.8b, v1.8b
+        mov             v0.8b, v1.8b
+        umull           v29.8h, v30.8b, v30.8b
+        uadalp          v16.4s, v29.8h
+        cbnz            w4, 2b
+
+3:
+        uaddlv          d17, v16.4s
+        fmov            w0, s17
+
+        ret
+endfunc
+
 function nsse16_neon, export=1
         // x0           multiplier
         // x1           uint8_t *pix1

From patchwork Mon Oct 3 14:10:20 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Grzegorz Bernacki
X-Patchwork-Id: 38539
From: Grzegorz Bernacki
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 3 Oct 2022 16:10:20 +0200
Message-Id: <20221003141020.3564715-8-gjb@semihalf.com>
X-Mailer: git-send-email 2.29.0
In-Reply-To: <20221003141020.3564715-1-gjb@semihalf.com>
References: <20221003141020.3564715-1-gjb@semihalf.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 7/7] aarch64: me_cmp: Improve scheduling in vsse_intra8
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com

From: Martin Storsjö

Before:       Cortex A53    A72    A73
vsse_5_neon:         74.7   31.5   26.0
After:
vsse_5_neon:         62.7   32.5   25.7
---
 libavcodec/aarch64/me_cmp_neon.S | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index 61e4f68335..d8a18cd4b8 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -1113,11 +1113,11 @@ function vsse_intra8_neon, export=1
         // x3           ptrdiff_t stride
         // w4           int h
 
+        sub             w4, w4, #1                      // we need to make h-1 iterations
         ld1             {v0.8b}, [x1], x3
+        cmp             w4, #3
         movi            v16.4s, #0
 
-        sub             w4, w4, #1                      // we need to make h-1 iterations
-        cmp             w4, #3
         b.lt            2f
 
 1:
@@ -1127,13 +1127,13 @@ function vsse_intra8_neon, export=1
         ld1             {v2.8b}, [x1], x3
         uabd            v30.8b, v0.8b, v1.8b
         ld1             {v3.8b}, [x1], x3
-        umull           v29.8h, v30.8b, v30.8b
         uabd            v27.8b, v1.8b, v2.8b
-        uadalp          v16.4s, v29.8h
-        umull           v26.8h, v27.8b, v27.8b
+        umull           v29.8h, v30.8b, v30.8b
         uabd            v25.8b, v2.8b, v3.8b
-        uadalp          v16.4s, v26.8h
+        umull           v26.8h, v27.8b, v27.8b
+        uadalp          v16.4s, v29.8h
         umull           v24.8h, v25.8b, v25.8b
+        uadalp          v16.4s, v26.8h
         sub             w4, w4, #3
         uadalp          v16.4s, v24.8h
         cmp             w4, #3
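
For readers following the series, here is a scalar C sketch of what the three NEON routines above appear to compute, read off the formulas in the assembly comments (vsse8, vsse_intra8) and the register flow of nsse8_neon (score1 from sse8_neon, plus |score2| times the multiplier, with the last column excluded). The names ref_vsse_intra8, ref_vsse8 and ref_nsse8 are hypothetical helpers made up for illustration, not FFmpeg functions, and the sketch does not model the 16-bit accumulators or saturating instructions the assembly uses, so treat it as a reading aid under those assumptions rather than a bit-exact reference.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference helpers (not part of FFmpeg). */

/* Sum of squared vertical differences within one 8-pixel-wide block:
 * for each of the h-1 row pairs, (s[x] - s[x + stride])^2. */
int ref_vsse_intra8(const uint8_t *s, ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            int d = s[x] - s[x + stride];
            score += d * d;
        }
        s += stride;
    }
    return score;
}

/* Same idea, applied to the difference signal s1 - s2. */
int ref_vsse8(const uint8_t *s1, const uint8_t *s2, ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            int d = (s1[x] - s2[x]) - (s1[x + stride] - s2[x + stride]);
            score += d * d;
        }
        s1 += stride;
        s2 += stride;
    }
    return score;
}

/* score1: plain SSE over all h rows (what the call to sse8_neon supplies).
 * score2: difference of 2x2 gradient magnitudes over columns 0..6 and
 * h-1 row pairs (the assembly zeroes lane 7 before the final sum). */
int ref_nsse8(int multiplier, const uint8_t *s1, const uint8_t *s2,
              ptrdiff_t stride, int h)
{
    int score1 = 0, score2 = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            int d = s1[x] - s2[x];
            score1 += d * d;
        }
        if (y + 1 < h) {
            for (int x = 0; x < 7; x++) {
                score2 += abs(s1[x] - s1[x + stride] - s1[x + 1] + s1[x + stride + 1])
                        - abs(s2[x] - s2[x + stride] - s2[x + 1] + s2[x + stride + 1]);
            }
        }
        s1 += stride;
        s2 += stride;
    }
    return score1 + abs(score2) * multiplier;
}
```

A sketch like this can be handy when stepping through the unrolled loops by hand: feed a two-row block through the C version and compare against the value the assembly leaves in w0. The multiplier argument mirrors the wrapper in the patch, which passes c->avctx->nsse_weight when a context is present and 8 otherwise.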