From patchwork Tue Aug 16 12:20:11 2022
X-Patchwork-Submitter: Hubert Mazur
X-Patchwork-Id: 34789
From: Hubert Mazur
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 16 Aug 2022 14:20:11 +0200
Message-Id: <20220816122016.64929-1-hum@semihalf.com>
Subject: [FFmpeg-devel] [PATCH 0/5] Provide neon implementation for me_cmp functions
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, Hubert Mazur, martin@martin.st, mw@semihalf.com, spop@amazon.com

Add arm64 NEON implementations for functions from the motion estimation family.
All of them were tested and benchmarked using the checkasm tool. The rare
code paths, e.g. when filter_size % 4 != 0, were also tested. Instructions
were manually deinterleaved to reach the best performance.

Hubert Mazur (5):
  lavc/aarch64: Add neon implementation for sse16
  lavc/aarch64: Add neon implementation for sse4
  lavc/aarch64: Add neon implementation for pix_abs16_y2
  lavc/aarch64: Add neon implementation for sse8
  lavc/aarch64: Add neon implementation for pix_abs8

 libavcodec/aarch64/me_cmp_init_aarch64.c |  18 ++
 libavcodec/aarch64/me_cmp_neon.S         | 324 +++++++++++++++++++++++
 2 files changed, 342 insertions(+)
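
For readers unfamiliar with the sse family, here is a rough scalar sketch of
what such a function computes and why a tail path exists. It is not the code
from this series or from FFmpeg's C reference: the names sse_ref and
sse_unrolled4, the explicit width parameter and the simplified signature are
made up for illustration. It also assumes that the "% 4 != 0" case above
refers to a row count that is not a multiple of a 4-row unroll factor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: sum of squared differences over a
 * width x h block. Simplified signature, not FFmpeg's me_cmp prototype. */
static int sse_ref(const uint8_t *pix1, const uint8_t *pix2,
                   ptrdiff_t stride, int width, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < width; x++) {
            int d = pix1[x] - pix2[x];
            sum += d * d;
        }
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Same result, laid out like an unrolled kernel: the main loop consumes
 * four rows per iteration, the tail loop handles the h % 4 leftover rows
 * (the assumed "rare code path"). */
static int sse_unrolled4(const uint8_t *pix1, const uint8_t *pix2,
                         ptrdiff_t stride, int width, int h)
{
    int sum = 0;
    int y = 0;

    /* Main loop: runs only while at least four rows remain. */
    for (; y + 4 <= h; y += 4) {
        for (int r = 0; r < 4; r++) {
            const uint8_t *a = pix1 + (y + r) * stride;
            const uint8_t *b = pix2 + (y + r) * stride;
            for (int x = 0; x < width; x++) {
                int d = a[x] - b[x];
                sum += d * d;
            }
        }
    }

    /* Tail loop: zero to three remaining rows, one at a time. */
    for (; y < h; y++) {
        const uint8_t *a = pix1 + y * stride;
        const uint8_t *b = pix2 + y * stride;
        for (int x = 0; x < width; x++) {
            int d = a[x] - b[x];
            sum += d * d;
        }
    }
    return sum;
}
```

Comparing the two functions for heights such as 4 and 7 exercises both the
main loop and the tail, which is roughly the kind of check the cover letter
describes for the rare paths.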