From: John Cox
To: ffmpeg-devel@ffmpeg.org
Cc: thomas.mundt@hr.de, John Cox, martin@martin.st
Date: Tue, 4 Jul 2023 14:04:38 +0000
Subject: [FFmpeg-devel] [PATCH v4 0/7] avfilter/vf_bwdif: Add aarch64 neon functions

In addition to the NEON functions, this series adds a filter_line3 method. On aarch64 NEON it gives roughly a 30% speedup over two filter_line calls plus a memcpy.

Differences from v3:
  Removed a few lines of NEON code from filter_line that should have been dropped when it was copied from the filter_line3 code.

Sorry about sending two patch sets in quick succession, but I think I've applied all the requested changes and I didn't want this mistake in the final patch set. (The mistake was benign: it just wasted a few cycles.)
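To illustrate the shape of the filter_line3 optimisation, here is a minimal sketch of a helper that produces three consecutive output lines in one call: an interpolated line, a copied line that already exists in the current field, and a second interpolated line. This is the work the cover letter compares against (two filter_line calls plus a memcpy). It assumes that ordering of the three lines. The names toy_filter_line and toy_filter_line3, the simplified signature, and the two-tap rounded average used as the interpolator are hypothetical stand-ins, not bwdif's algorithm or FFmpeg's real prototypes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for bwdif's filter_line: rebuilds one missing
 * line as the rounded average of the lines above and below it.
 * The real filter is far more involved and also uses prev/next fields. */
static void toy_filter_line(uint8_t *dst, const uint8_t *above,
                            const uint8_t *below, int w)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((above[x] + below[x] + 1) >> 1);
}

/* Hypothetical filter_line3-style helper: one call emits three lines
 * instead of the caller issuing two filter calls plus a memcpy.
 *   dst row 0 = interpolated from src rows -1 and +1
 *   dst row 1 = copy of src row +1 (a line that already exists)
 *   dst row 2 = interpolated from src rows +1 and +3
 * src points at the source row matching dst row 0; strides are bytes. */
static void toy_filter_line3(uint8_t *dst, ptrdiff_t d_stride,
                             const uint8_t *src, ptrdiff_t s_stride, int w)
{
    toy_filter_line(dst, src - s_stride, src + s_stride, w);
    memcpy(dst + d_stride, src + s_stride, (size_t)w);
    toy_filter_line(dst + 2 * d_stride, src + s_stride,
                    src + 3 * s_stride, w);
}
```

Source row +1 feeds all three outputs. A fused SIMD version can therefore presumably load it once instead of three times, which is one plausible source of the reported speedup. That is an assumption here, not something the cover letter states.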
John Cox (7):
  tests/checkasm: Add test for vf_bwdif filter_intra
  avfilter/vf_bwdif: Add neon for filter_intra
  tests/checkasm: Add test for vf_bwdif filter_edge
  avfilter/vf_bwdif: Add neon for filter_edge
  avfilter/vf_bwdif: Add neon for filter_line
  avfilter/vf_bwdif: Add a filter_line3 method for optimisation
  avfilter/vf_bwdif: Add neon for filter_line3

 libavfilter/aarch64/Makefile                |   2 +
 libavfilter/aarch64/vf_bwdif_init_aarch64.c | 125 ++++
 libavfilter/aarch64/vf_bwdif_neon.S         | 788 ++++++++++++++++++++
 libavfilter/bwdif.h                         |  20 +
 libavfilter/vf_bwdif.c                      |  70 +-
 tests/checkasm/vf_bwdif.c                   | 172 +++++
 6 files changed, 1162 insertions(+), 15 deletions(-)
 create mode 100644 libavfilter/aarch64/vf_bwdif_init_aarch64.c
 create mode 100644 libavfilter/aarch64/vf_bwdif_neon.S
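The checkasm patches in this series test the new NEON functions against the C reference. The sketch below shows that underlying idea in plain C, without the checkasm framework: it runs a reference and a candidate implementation on the same pseudo-random input and compares the outputs byte for byte. The functions ref_avg_line, opt_avg_line and check_line_fn, the LCG, and the buffer sizes are all hypothetical. Real checkasm tests use the framework's own macros and random helpers, which are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*line_fn)(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w);

/* Hypothetical reference implementation: rounded average of two lines. */
static void ref_avg_line(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}

/* Hypothetical "optimised" version computing the same value another way:
 * (a | b) - ((a ^ b) >> 1) equals the round-up average without widening. */
static void opt_avg_line(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((a[x] | b[x]) - ((a[x] ^ b[x]) >> 1));
}

/* Fill a buffer with deterministic pseudo-random bytes (simple LCG). */
static void fill_random(uint8_t *buf, size_t n, uint32_t *state)
{
    for (size_t i = 0; i < n; i++) {
        *state = *state * 1664525u + 1013904223u;
        buf[i] = (uint8_t)(*state >> 24);
    }
}

/* Returns 0 if ref and opt agree for every width 1..max_w (capped at 256),
 * otherwise the first width at which they differ. Widths that are not a
 * multiple of the SIMD vector length exercise an optimised tail path. */
static int check_line_fn(line_fn ref, line_fn opt, int max_w, uint32_t seed)
{
    uint8_t a[256], b[256], out_ref[256], out_opt[256];
    uint32_t state = seed;

    if (max_w > 256)
        max_w = 256;
    for (int w = 1; w <= max_w; w++) {
        fill_random(a, sizeof(a), &state);
        fill_random(b, sizeof(b), &state);
        /* Same sentinel fill in both outputs so stray writes past w show up. */
        memset(out_ref, 0xAA, sizeof(out_ref));
        memset(out_opt, 0xAA, sizeof(out_opt));
        ref(out_ref, a, b, w);
        opt(out_opt, a, b, w);
        if (memcmp(out_ref, out_opt, sizeof(out_ref)) != 0)
            return w;
    }
    return 0;
}
```

Comparing the whole output buffer rather than only the first w bytes is a deliberate choice. It also catches an optimised routine that overwrites memory past the requested width.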