From: Grzegorz Bernacki
To: ffmpeg-devel@ffmpeg.org
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, hum@semihalf.com, martin@martin.st, mw@semihalf.com, spop@amazon.com
Date: Mon, 3 Oct 2022 16:10:13 +0200
Message-Id: <20221003141020.3564715-1-gjb@semihalf.com>
X-Patchwork-Submitter: Grzegorz Bernacki
X-Patchwork-Id: 34830
Subject: [FFmpeg-devel] [PATCH v2 0/7] arm64 neon implementation for 8bits functions

Changes since v1:
- changed tabs to spaces
- modified branch instruction in vsse8
- applied Martin's patches with improved instruction scheduling

Grzegorz Bernacki (4):
  lavc/aarch64: Add neon implementation for pix_abs8 functions.
  lavc/aarch64: Provide neon implementation of nsse8
  lavc/aarch64: Provide optimized implementation of vsse8 for arm64.
  lavc/aarch64: Add neon implementation for vsse_intra8

Martin Storsjö (3):
  aarch64: me_cmp: Improve scheduling in ff_pix_abs8_y2_neon
  aarch64: me_cmp: Fix up the prologue of ff_pix_abs8_xy2_neon
  aarch64: me_cmp: Improve scheduling in vsse_intra8

 libavcodec/aarch64/me_cmp_init_aarch64.c |  33 ++
 libavcodec/aarch64/me_cmp_neon.S         | 414 +++++++++++++++++++++++
 2 files changed, 447 insertions(+)