From patchwork Wed Aug 21 14:55:52 2024
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 51104
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 21 Aug 2024 16:55:52 +0200
Message-Id: <20240821145555.235323-5-ramiro.polla@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240821145555.235323-1-ramiro.polla@gmail.com>
References: <20240821145555.235323-1-ramiro.polla@gmail.com>
Subject: [FFmpeg-devel] [PATCH v2 4/7] avcodec/aarch64/mpegvideoencdsp: add dotprod implementation for pix_norm1

                     A55              A76
pix_norm1_c:       484.3            235.2
pix_norm1_neon:    193.8 ( 2.50x)    44.7 ( 5.26x)
pix_norm1_dotprod:  91.8 ( 5.28x)    21.2 (11.09x)
---
 libavcodec/aarch64/mpegvideoencdsp_init.c | 10 ++++++++
 libavcodec/aarch64/mpegvideoencdsp_neon.S | 28 +++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/libavcodec/aarch64/mpegvideoencdsp_init.c b/libavcodec/aarch64/mpegvideoencdsp_init.c
index 7eb632ed1b..d0ce07e178 100644
--- a/libavcodec/aarch64/mpegvideoencdsp_init.c
+++ b/libavcodec/aarch64/mpegvideoencdsp_init.c
@@ -27,6 +27,10 @@
 int ff_pix_sum16_neon(const uint8_t *pix, int line_size);
 int ff_pix_norm1_neon(const uint8_t *pix, int line_size);
 
+#if HAVE_DOTPROD
+int ff_pix_norm1_neon_dotprod(const uint8_t *pix, int line_size);
+#endif
+
 av_cold void ff_mpegvideoencdsp_init_aarch64(MpegvideoEncDSPContext *c,
                                              AVCodecContext *avctx)
 {
@@ -36,4 +40,10 @@ av_cold void ff_mpegvideoencdsp_init_aarch64(MpegvideoEncDSPContext *c,
         c->pix_sum = ff_pix_sum16_neon;
         c->pix_norm1 = ff_pix_norm1_neon;
     }
+
+#if HAVE_DOTPROD
+    if (have_dotprod(cpu_flags)) {
+        c->pix_norm1 = ff_pix_norm1_neon_dotprod;
+    }
+#endif
 }
diff --git a/libavcodec/aarch64/mpegvideoencdsp_neon.S b/libavcodec/aarch64/mpegvideoencdsp_neon.S
index 6e7a9319ba..0dbafef87b 100644
--- a/libavcodec/aarch64/mpegvideoencdsp_neon.S
+++ b/libavcodec/aarch64/mpegvideoencdsp_neon.S
@@ -67,3 +67,31 @@ function ff_pix_norm1_neon, export=1
 
         ret
 endfunc
+
+#if HAVE_DOTPROD
+ENABLE_DOTPROD
+
+function ff_pix_norm1_neon_dotprod, export=1
+// x0 const uint8_t *pix
+// x1 int line_size
+
+        sxtw x1, w1
+        movi v0.16b, #0
+        mov w2, #16
+
+1:
+        ld1 {v1.16b}, [x0], x1
+        ld1 {v2.16b}, [x0], x1
+        udot v0.4s, v1.16b, v1.16b
+        subs w2, w2, #2
+        udot v0.4s, v2.16b, v2.16b
+        b.ne 1b
+
+        uaddlv d0, v0.4s
+        fmov w0, s0
+
+        ret
+endfunc
+
+DISABLE_DOTPROD
+#endif