From patchwork Mon Jul 3 19:04:03 2023
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 34948
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Cc: thomas.mundt@hr.de, John Cox, martin@martin.st
Date: Mon, 3 Jul 2023 19:04:03 +0000
Message-Id: <20230703190410.237473-1-jc@kynesim.co.uk>
Subject: [FFmpeg-devel] [PATCH v3 0/7] avfilter/vf_bwdif: Add aarch64 neon functions

Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy

Differences from v2:
coeffs moved into const segment
number of patches reduced

John Cox (7):
  tests/checkasm: Add test for vf_bwdif filter_intra
  avfilter/vf_bwdif: Add neon for filter_intra
  tests/checkasm: Add test for vf_bwdif filter_edge
  avfilter/vf_bwdif: Add neon for filter_edge
  avfilter/vf_bwdif: Add neon for filter_line Exports C filter_line
    needed for tail fixup of neon code
  avfilter/vf_bwdif: Add a filter_line3 method for optimisation
  avfilter/vf_bwdif: Add neon for filter_line3

 libavfilter/aarch64/Makefile                |   2 +
 libavfilter/aarch64/vf_bwdif_init_aarch64.c | 125 +++
 libavfilter/aarch64/vf_bwdif_neon.S         | 793 ++++++++++++++++++++
 libavfilter/bwdif.h                         |  20 +
 libavfilter/vf_bwdif.c                      |  70 +-
 tests/checkasm/vf_bwdif.c                   | 172 +++++
 6 files changed, 1167 insertions(+), 15 deletions(-)
 create mode 100644 libavfilter/aarch64/vf_bwdif_init_aarch64.c
 create mode 100644 libavfilter/aarch64/vf_bwdif_neon.S