From patchwork Thu Jun 29 17:57:14 2023
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 34943
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Cc: thomas.mundt@hr.de, John Cox
Date: Thu, 29 Jun 2023 17:57:14 +0000
Message-Id: <20230629175729.224383-1-jc@kynesim.co.uk>
Subject: [FFmpeg-devel] [PATCH 00/15] avfilter/vf_bwdif: Add aarch64 neon functions

Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy

John Cox (15):
  avfilter/vf_bwdif: Add outline for aarch neon functions
  avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
  avfilter/vf_bwdif: Export C filter_intra
  avfilter/vf_bwdif: Add neon for filter_intra
  tests/checkasm: Add test for vf_bwdif filter_intra
  avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
  avfilter/vf_bwdif: Export C filter_edge
  avfilter/vf_bwdif: Add neon for filter_edge
  tests/checkasm: Add test for vf_bwdif filter_edge
  avfilter/vf_bwdif: Export C filter_line
  avfilter/vf_bwdif: Add neon for filter_line
  avfilter/vf_bwdif: Add a filter_line3 method for optimisation
  avfilter/vf_bwdif: Add neon for filter_line3
  tests/checkasm: Add test for vf_bwdif filter_line3
  avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines

 libavfilter/aarch64/Makefile                |   2 +
 libavfilter/aarch64/vf_bwdif_init_aarch64.c | 125 ++++
 libavfilter/aarch64/vf_bwdif_neon.S         | 780 ++++++++++++++++++++
 libavfilter/bwdif.h                         |  20 +
 libavfilter/vf_bwdif.c                      |  70 +-
 tests/checkasm/vf_bwdif.c                   | 172 +++++
 6 files changed, 1154 insertions(+), 15 deletions(-)
 create mode 100644 libavfilter/aarch64/vf_bwdif_init_aarch64.c
 create mode 100644 libavfilter/aarch64/vf_bwdif_neon.S
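
For readers unfamiliar with the idea behind filter_line3, here is a rough C sketch of replacing two filter_line calls and a memcpy with a single three-row call. It is only an illustration under stated assumptions: toy_filter_line and toy_filter_line3 are hypothetical names, the per-line kernel is a plain vertical average rather than the bwdif filter, and the row order (filter, copy, filter) is assumed from the "2xfilter_line and a memcpy" wording above, not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a per-line kernel: rounded average of the
 * rows directly above and below. This is NOT the bwdif filter. */
static void toy_filter_line(uint8_t *dst, const uint8_t *cur, int w, int stride)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((cur[x - stride] + cur[x + stride] + 1) >> 1);
}

/* Hypothetical three-row helper (assumed order: filter, copy, filter).
 * One call produces three output rows, so an optimised version could
 * keep source rows in registers across the three steps. */
static void toy_filter_line3(uint8_t *dst, int d_stride,
                             const uint8_t *cur, int s_stride, int w)
{
    assert(w > 0);
    toy_filter_line(dst, cur, w, s_stride);                  /* row 0: interpolated */
    memcpy(dst + d_stride, cur + s_stride, (size_t)w);       /* row 1: copied as-is */
    toy_filter_line(dst + 2 * d_stride, cur + 2 * s_stride,  /* row 2: interpolated */
                    w, s_stride);
}
```

The caller must make sure one row above the first and one row below the third output row are readable in the source buffer, since the toy kernel reads both neighbours.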