From patchwork Tue Jul 4 14:04:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 42424
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 4 Jul 2023 14:04:39 +0000
Message-Id: <20230704140445.240426-2-jc@kynesim.co.uk>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230704140445.240426-1-jc@kynesim.co.uk>
References: <20230704140445.240426-1-jc@kynesim.co.uk>
Subject: [FFmpeg-devel] [PATCH v4 1/7] tests/checkasm: Add test for vf_bwdif filter_intra
List-Id: FFmpeg development discussions and patches
Cc: thomas.mundt@hr.de, John Cox, martin@martin.st

Signed-off-by: John Cox
---
 tests/checkasm/vf_bwdif.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 46224bb575..034bbabb4c 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm/vf_bwdif.c
@@ -20,6 +20,7 @@
 #include "checkasm.h"
 #include "libavcodec/internal.h"
 #include "libavfilter/bwdif.h"
+#include "libavutil/mem_internal.h"

 #define WIDTH 256

@@ -81,4 +82,40 @@ void checkasm_check_vf_bwdif(void)
         BODY(uint16_t, 10);
         report("bwdif10");
     }
+
+    if (check_func(ctx_8.filter_intra, "bwdif8.intra")) {
+        LOCAL_ALIGNED_16(uint8_t, cur0, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, cur1, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, dst0, [WIDTH*3]);
+        LOCAL_ALIGNED_16(uint8_t, dst1, [WIDTH*3]);
+        const int stride = WIDTH;
+        const int mask = (1<<8)-1;
+
+        declare_func(void, void *dst1, void *cur1, int w, int prefs, int mrefs,
+                     int prefs3, int mrefs3, int parity, int clip_max);
+
+        randomize_buffers( cur0, cur1, mask, 11*WIDTH);
+        memset(dst0, 0xba, WIDTH * 3);
+        memset(dst1, 0xba, WIDTH * 3);
+
+        call_ref(dst0 + stride,
+                 cur0 + stride * 4, WIDTH,
+                 stride, -stride, stride * 3, -stride * 3,
+                 0, mask);
+        call_new(dst1 + stride,
+                 cur0 + stride * 4, WIDTH,
+                 stride, -stride, stride * 3, -stride * 3,
+                 0, mask);
+
+        if (memcmp(dst0, dst1, WIDTH*3)
+                || memcmp( cur0, cur1, WIDTH*11))
+            fail();
+
+        bench_new(dst1 + stride,
+                  cur0 + stride * 4, WIDTH,
+                  stride, -stride, stride * 3, -stride * 3,
+                  0, mask);
+
+        report("bwdif8.intra");
+    }
 }

From patchwork Tue Jul 4 14:04:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 42425
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 4 Jul 2023 14:04:40 +0000
Message-Id: <20230704140445.240426-3-jc@kynesim.co.uk>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230704140445.240426-1-jc@kynesim.co.uk>
References: <20230704140445.240426-1-jc@kynesim.co.uk>
Subject: [FFmpeg-devel] [PATCH v4 2/7] avfilter/vf_bwdif: Add neon for filter_intra
Cc: thomas.mundt@hr.de, John Cox, martin@martin.st

Adds an outline for aarch64 neon functions
Adds common macros and consts for aarch64 neon
Exports C filter_intra needed for tail fixup of neon code
Adds neon for filter_intra

Signed-off-by: John Cox
---
 libavfilter/aarch64/Makefile                |   2 +
 libavfilter/aarch64/vf_bwdif_init_aarch64.c |  56 ++++++++
 libavfilter/aarch64/vf_bwdif_neon.S         | 136 ++++++++++++++++++++
 libavfilter/bwdif.h                         |   4 +
 libavfilter/vf_bwdif.c                      |   8 +-
 5 files changed, 203 insertions(+), 3 deletions(-)
 create mode 100644 libavfilter/aarch64/vf_bwdif_init_aarch64.c
 create mode 100644 libavfilter/aarch64/vf_bwdif_neon.S

diff --git a/libavfilter/aarch64/Makefile b/libavfilter/aarch64/Makefile
index b58daa3a3f..b68209bc94 100644
--- a/libavfilter/aarch64/Makefile
+++ b/libavfilter/aarch64/Makefile
@@ -1,3 +1,5 @@
+OBJS-$(CONFIG_BWDIF_FILTER)                  += aarch64/vf_bwdif_init_aarch64.o
 OBJS-$(CONFIG_NLMEANS_FILTER)                += aarch64/vf_nlmeans_init.o

+NEON-OBJS-$(CONFIG_BWDIF_FILTER)             += aarch64/vf_bwdif_neon.o
 NEON-OBJS-$(CONFIG_NLMEANS_FILTER)           += aarch64/vf_nlmeans_neon.o
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
new file mode 100644
index 0000000000..3ffaa07ab3
--- /dev/null
+++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
@@ -0,0 +1,56 @@
+/*
+ * bwdif aarch64 NEON optimisations
+ *
+ * Copyright (c) 2023 John Cox
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/common.h"
+#include "libavfilter/bwdif.h"
+#include "libavutil/aarch64/cpu.h"
+
+void ff_bwdif_filter_intra_neon(void *dst1, void *cur1, int w, int prefs, int mrefs,
+                                int prefs3, int mrefs3, int parity, int clip_max);
+
+
+static void filter_intra_helper(void *dst1, void *cur1, int w, int prefs, int mrefs,
+                                int prefs3, int mrefs3, int parity, int clip_max)
+{
+    const int w0 = clip_max != 255 ? 0 : w & ~15;
+
+    ff_bwdif_filter_intra_neon(dst1, cur1, w0, prefs, mrefs, prefs3, mrefs3, parity, clip_max);
+
+    if (w0 < w)
+        ff_bwdif_filter_intra_c((char *)dst1 + w0, (char *)cur1 + w0,
+                                w - w0, prefs, mrefs, prefs3, mrefs3, parity, clip_max);
+}
+
+void
+ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth)
+{
+    const int cpu_flags = av_get_cpu_flags();
+
+    if (bit_depth != 8)
+        return;
+
+    if (!have_neon(cpu_flags))
+        return;
+
+    s->filter_intra = filter_intra_helper;
+}
+
diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S
new file mode 100644
index 0000000000..e288efbe6c
--- /dev/null
+++ b/libavfilter/aarch64/vf_bwdif_neon.S
@@ -0,0 +1,136 @@
+/*
+ * bwdif aarch64 NEON optimisations
+ *
+ * Copyright (c) 2023 John Cox
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+
+#include "libavutil/aarch64/asm.S"
+
+// Space taken on the stack by an int (32-bit)
+#ifdef __APPLE__
+.set    SP_INT, 4
+#else
+.set    SP_INT, 8
+#endif
+
+.macro SQSHRUNN b, s0, s1, s2, s3, n
+        sqshrun         \s0\().4h, \s0\().4s, #\n - 8
+        sqshrun2        \s0\().8h, \s1\().4s, #\n - 8
+        sqshrun         \s1\().4h, \s2\().4s, #\n - 8
+        sqshrun2        \s1\().8h, \s3\().4s, #\n - 8
+        uzp2            \b\().16b, \s0\().16b, \s1\().16b
+.endm
+
+.macro SMULL4K a0, a1, a2, a3, s0, s1, k
+        smull           \a0\().4s, \s0\().4h, \k
+        smull2          \a1\().4s, \s0\().8h, \k
+        smull           \a2\().4s, \s1\().4h, \k
+        smull2          \a3\().4s, \s1\().8h, \k
+.endm
+
+.macro UMULL4K a0, a1, a2, a3, s0, s1, k
+        umull           \a0\().4s, \s0\().4h, \k
+        umull2          \a1\().4s, \s0\().8h, \k
+        umull           \a2\().4s, \s1\().4h, \k
+        umull2          \a3\().4s, \s1\().8h, \k
+.endm
+
+.macro UMLAL4K a0, a1, a2, a3, s0, s1, k
+        umlal           \a0\().4s, \s0\().4h, \k
+        umlal2          \a1\().4s, \s0\().8h, \k
+        umlal           \a2\().4s, \s1\().4h, \k
+        umlal2          \a3\().4s, \s1\().8h, \k
+.endm
+
+.macro UMLSL4K a0, a1, a2, a3, s0, s1, k
+        umlsl           \a0\().4s, \s0\().4h, \k
+        umlsl2          \a1\().4s, \s0\().8h, \k
+        umlsl           \a2\().4s, \s1\().4h, \k
+        umlsl2          \a3\().4s, \s1\().8h, \k
+.endm
+
+.macro LDR_COEFFS d, t0
+        movrel          \t0, coeffs, 0
+        ld1             {\d\().8h}, [\t0]
+.endm
+
+// static const uint16_t coef_lf[2] = { 4309, 213 };
+// static const uint16_t coef_hf[3] = { 5570, 3801, 1016 };
+// static const uint16_t coef_sp[2] = { 5077, 981 };
+
+const coeffs, align=4   // align 4 means align on 2^4 boundary
+        .hword          4309 * 4, 213 * 4               // lf[0]*4 = v0.h[0]
+        .hword          5570, 3801, 1016, -3801         // hf[0] = v0.h[2], -hf[1] = v0.h[5]
+        .hword          5077, 981                       // sp[0] = v0.h[6]
+endconst
+
+// ============================================================================
+//
+// void ff_bwdif_filter_intra_neon(
+//      void *dst1,     // x0
+//      void *cur1,     // x1
+//      int w,          // w2
+//      int prefs,      // w3
+//      int mrefs,      // w4
+//      int prefs3,     // w5
+//      int mrefs3,     // w6
+//      int parity,     // w7       unused
+//      int clip_max)   // [sp, #0] unused
+
+function ff_bwdif_filter_intra_neon, export=1
+        cmp             w2, #0
+        ble             99f
+
+        LDR_COEFFS      v0, x17
+
+//    for (x = 0; x < w; x++) {
+10:
+
+//        interpol = (coef_sp[0] * (cur[mrefs] + cur[prefs]) - coef_sp[1] * (cur[mrefs3] + cur[prefs3])) >> 13;
+        ldr             q31, [x1, w4, sxtw]
+        ldr             q30, [x1, w3, sxtw]
+        ldr             q29, [x1, w6, sxtw]
+        ldr             q28, [x1, w5, sxtw]
+
+        uaddl           v20.8h, v31.8b, v30.8b
+        uaddl2          v21.8h, v31.16b, v30.16b
+
+        UMULL4K         v2, v3, v4, v5, v20, v21, v0.h[6]
+
+        uaddl           v20.8h, v29.8b, v28.8b
+        uaddl2          v21.8h, v29.16b, v28.16b
+
+        UMLSL4K         v2, v3, v4, v5, v20, v21, v0.h[7]
+
+//        dst[0] = av_clip(interpol, 0, clip_max);
+        SQSHRUNN        v2, v2, v3, v4, v5, 13
+        str             q2, [x0], #16
+
+//        dst++;
+//        cur++;
+//    }
+
+        subs            w2, w2, #16
+        add             x1, x1, #16
+        bgt             10b
+
+99:
+        ret
+endfunc
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index 5749345f78..ae6f6ce223 100644
--- a/libavfilter/bwdif.h
+++ b/libavfilter/bwdif.h
@@ -39,5 +39,9 @@ typedef struct BWDIFContext {

 void ff_bwdif_init_filter_line(BWDIFContext *bwdif, int bit_depth);
 void ff_bwdif_init_x86(BWDIFContext *bwdif, int bit_depth);
+void ff_bwdif_init_aarch64(BWDIFContext *bwdif, int bit_depth);
+
+void ff_bwdif_filter_intra_c(void *dst1, void *cur1, int w, int prefs, int mrefs,
+                             int prefs3, int mrefs3, int parity, int clip_max);

 #endif /* AVFILTER_BWDIF_H */
diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c
index e278cf1217..035fc58670 100644
--- a/libavfilter/vf_bwdif.c
+++ b/libavfilter/vf_bwdif.c
@@ -122,8 +122,8 @@ typedef struct ThreadData {
         next2++; \
     }

-static void filter_intra(void *dst1, void *cur1, int w, int prefs, int mrefs,
-                         int prefs3, int mrefs3, int parity, int clip_max)
+void ff_bwdif_filter_intra_c(void *dst1, void *cur1, int w, int prefs, int mrefs,
+                             int prefs3, int mrefs3, int parity, int clip_max)
 {
     uint8_t *dst = dst1;
     uint8_t *cur = cur1;
@@ -362,13 +362,15 @@ av_cold void ff_bwdif_init_filter_line(BWDIFContext *s, int bit_depth)
         s->filter_line  = filter_line_c_16bit;
         s->filter_edge  = filter_edge_16bit;
     } else {
-        s->filter_intra = filter_intra;
+        s->filter_intra = ff_bwdif_filter_intra_c;
         s->filter_line  = filter_line_c;
         s->filter_edge  = filter_edge;
     }

 #if ARCH_X86
     ff_bwdif_init_x86(s, bit_depth);
+#elif ARCH_AARCH64
+    ff_bwdif_init_aarch64(s, bit_depth);
 #endif
 }

From patchwork Tue Jul 4 14:04:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 42426
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 4 Jul 2023 14:04:41 +0000
Message-Id: <20230704140445.240426-4-jc@kynesim.co.uk>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230704140445.240426-1-jc@kynesim.co.uk>
References: <20230704140445.240426-1-jc@kynesim.co.uk>
Subject: [FFmpeg-devel] [PATCH v4 3/7] tests/checkasm: Add test for vf_bwdif filter_edge
Cc: thomas.mundt@hr.de, John Cox, martin@martin.st

Signed-off-by: John Cox
---
 tests/checkasm/vf_bwdif.c | 54 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 034bbabb4c..5fdba09fdc 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm/vf_bwdif.c
@@ -83,6 +83,60 @@ void checkasm_check_vf_bwdif(void)
         report("bwdif10");
     }

+    {
+        LOCAL_ALIGNED_16(uint8_t, prev0, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, prev1, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, next0, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, next1, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, cur0, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, cur1, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, dst0, [WIDTH*3]);
+        LOCAL_ALIGNED_16(uint8_t, dst1, [WIDTH*3]);
+        const int stride = WIDTH;
+        const int mask = (1<<8)-1;
+        int spat;
+        int parity;
+
+        for (spat = 0; spat != 2; ++spat) {
+            for (parity = 0; parity != 2; ++parity) {
+                if (check_func(ctx_8.filter_edge, "bwdif8.edge.s%d.p%d", spat, parity)) {
+
+                    declare_func(void, void *dst1, void *prev1, void *cur1, void *next1,
+                                 int w, int prefs, int mrefs, int prefs2, int mrefs2,
+                                 int parity, int clip_max, int spat);
+
+                    randomize_buffers(prev0, prev1, mask, 11*WIDTH);
+                    randomize_buffers(next0, next1, mask, 11*WIDTH);
+                    randomize_buffers( cur0, cur1, mask, 11*WIDTH);
+                    memset(dst0, 0xba, WIDTH * 3);
+                    memset(dst1, 0xba, WIDTH * 3);
+
+                    call_ref(dst0 + stride,
+                             prev0 + stride * 4, cur0 + stride * 4, next0 + stride * 4, WIDTH,
+                             stride, -stride, stride * 2, -stride * 2,
+                             parity, mask, spat);
+                    call_new(dst1 + stride,
+                             prev1 + stride * 4, cur1 + stride * 4, next1 + stride * 4, WIDTH,
+                             stride, -stride, stride * 2, -stride * 2,
+                             parity, mask, spat);
+
+                    if (memcmp(dst0, dst1, WIDTH*3)
+                            || memcmp(prev0, prev1, WIDTH*11)
+                            || memcmp(next0, next1, WIDTH*11)
+                            || memcmp( cur0, cur1, WIDTH*11))
+                        fail();
+
+                    bench_new(dst1 + stride,
+                              prev1 + stride * 4, cur1 + stride * 4, next1 + stride * 4, WIDTH,
+                              stride, -stride, stride * 2, -stride * 2,
+                              parity, mask, spat);
+                }
+            }
+        }
+
+        report("bwdif8.edge");
+    }
+
     if (check_func(ctx_8.filter_intra, "bwdif8.intra")) {
         LOCAL_ALIGNED_16(uint8_t, cur0, [11*WIDTH]);
         LOCAL_ALIGNED_16(uint8_t, cur1, [11*WIDTH]);

From patchwork Tue Jul 4 14:04:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 42427
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 4 Jul 2023 14:04:42 +0000
Message-Id: <20230704140445.240426-5-jc@kynesim.co.uk>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230704140445.240426-1-jc@kynesim.co.uk>
References: <20230704140445.240426-1-jc@kynesim.co.uk>
Subject: [FFmpeg-devel] [PATCH v4 4/7] avfilter/vf_bwdif: Add neon for filter_edge
Cc: thomas.mundt@hr.de, John Cox, martin@martin.st

Adds clip and spatial macros for aarch64 neon
Exports C
filter_edge needed for tail fixup of neon code Adds neon for filter_edge Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 +++ libavfilter/aarch64/vf_bwdif_neon.S | 177 ++++++++++++++++++++ libavfilter/bwdif.h | 4 + libavfilter/vf_bwdif.c | 8 +- 4 files changed, 205 insertions(+), 4 deletions(-) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c index 3ffaa07ab3..e75cf2f204 100644 --- a/libavfilter/aarch64/vf_bwdif_init_aarch64.c +++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c @@ -24,10 +24,29 @@ #include "libavfilter/bwdif.h" #include "libavutil/aarch64/cpu.h" +void ff_bwdif_filter_edge_neon(void *dst1, void *prev1, void *cur1, void *next1, + int w, int prefs, int mrefs, int prefs2, int mrefs2, + int parity, int clip_max, int spat); + void ff_bwdif_filter_intra_neon(void *dst1, void *cur1, int w, int prefs, int mrefs, int prefs3, int mrefs3, int parity, int clip_max); +static void filter_edge_helper(void *dst1, void *prev1, void *cur1, void *next1, + int w, int prefs, int mrefs, int prefs2, int mrefs2, + int parity, int clip_max, int spat) +{ + const int w0 = clip_max != 255 ? 
0 : w & ~15; + + ff_bwdif_filter_edge_neon(dst1, prev1, cur1, next1, w0, prefs, mrefs, prefs2, mrefs2, + parity, clip_max, spat); + + if (w0 < w) + ff_bwdif_filter_edge_c((char *)dst1 + w0, (char *)prev1 + w0, (char *)cur1 + w0, (char *)next1 + w0, + w - w0, prefs, mrefs, prefs2, mrefs2, + parity, clip_max, spat); +} + static void filter_intra_helper(void *dst1, void *cur1, int w, int prefs, int mrefs, int prefs3, int mrefs3, int parity, int clip_max) { @@ -52,5 +71,6 @@ ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth) return; s->filter_intra = filter_intra_helper; + s->filter_edge = filter_edge_helper; } diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S index e288efbe6c..389302b813 100644 --- a/libavfilter/aarch64/vf_bwdif_neon.S +++ b/libavfilter/aarch64/vf_bwdif_neon.S @@ -66,6 +66,79 @@ umlsl2 \a3\().4s, \s1\().8h, \k .endm +// int b = m2s1 - m1; +// int f = p2s1 - p1; +// int dc = c0s1 - m1; +// int de = c0s1 - p1; +// int sp_max = FFMIN(p1 - c0s1, m1 - c0s1); +// sp_max = FFMIN(sp_max, FFMAX(-b,-f)); +// int sp_min = FFMIN(c0s1 - p1, c0s1 - m1); +// sp_min = FFMIN(sp_min, FFMAX(b,f)); +// diff = diff == 0 ? 
0 : FFMAX3(diff, sp_min, sp_max); +.macro SPAT_CHECK diff, m2s1, m1, c0s1, p1, p2s1, t0, t1, t2, t3 + uqsub \t0\().16b, \p1\().16b, \c0s1\().16b + uqsub \t2\().16b, \m1\().16b, \c0s1\().16b + umin \t2\().16b, \t0\().16b, \t2\().16b + + uqsub \t1\().16b, \m1\().16b, \m2s1\().16b + uqsub \t3\().16b, \p1\().16b, \p2s1\().16b + umax \t3\().16b, \t3\().16b, \t1\().16b + umin \t3\().16b, \t3\().16b, \t2\().16b + + uqsub \t0\().16b, \c0s1\().16b, \p1\().16b + uqsub \t2\().16b, \c0s1\().16b, \m1\().16b + umin \t2\().16b, \t0\().16b, \t2\().16b + + uqsub \t1\().16b, \m2s1\().16b, \m1\().16b + uqsub \t0\().16b, \p2s1\().16b, \p1\().16b + umax \t0\().16b, \t0\().16b, \t1\().16b + umin \t2\().16b, \t2\().16b, \t0\().16b + + cmeq \t1\().16b, \diff\().16b, #0 + umax \diff\().16b, \diff\().16b, \t3\().16b + umax \diff\().16b, \diff\().16b, \t2\().16b + bic \diff\().16b, \diff\().16b, \t1\().16b +.endm + +// i0 = s0; +// if (i0 > d0 + diff0) +// i0 = d0 + diff0; +// else if (i0 < d0 - diff0) +// i0 = d0 - diff0; +// +// i0 = s0 is safe +.macro DIFF_CLIP i0, s0, d0, diff, t0, t1 + uqadd \t0\().16b, \d0\().16b, \diff\().16b + uqsub \t1\().16b, \d0\().16b, \diff\().16b + umin \i0\().16b, \s0\().16b, \t0\().16b + umax \i0\().16b, \i0\().16b, \t1\().16b +.endm + +// i0 = FFABS(m1 - p1) > td0 ? i1 : i2; +// DIFF_CLIP +// +// i0 = i1 is safe +.macro INTERPOL i0, i1, i2, m1, d0, p1, td0, diff, t0, t1, t2 + uabd \t0\().16b, \m1\().16b, \p1\().16b + cmhi \t0\().16b, \t0\().16b, \td0\().16b + bsl \t0\().16b, \i1\().16b, \i2\().16b + DIFF_CLIP \i0, \t0, \d0, \diff, \t1, \t2 +.endm + +.macro PUSH_VREGS + stp d8, d9, [sp, #-64]! 
+ stp d10, d11, [sp, #16] + stp d12, d13, [sp, #32] + stp d14, d15, [sp, #48] +.endm + +.macro POP_VREGS + ldp d14, d15, [sp, #48] + ldp d12, d13, [sp, #32] + ldp d10, d11, [sp, #16] + ldp d8, d9, [sp], #64 +.endm + .macro LDR_COEFFS d, t0 movrel \t0, coeffs, 0 ld1 {\d\().8h}, [\t0] @@ -81,6 +154,110 @@ const coeffs, align=4 // align 4 means align on 2^4 boundry .hword 5077, 981 // sp[0] = v0.h[6] endconst +// ============================================================================ +// +// void ff_bwdif_filter_edge_neon( +// void *dst1, // x0 +// void *prev1, // x1 +// void *cur1, // x2 +// void *next1, // x3 +// int w, // w4 +// int prefs, // w5 +// int mrefs, // w6 +// int prefs2, // w7 +// int mrefs2, // [sp, #0] +// int parity, // [sp, #SP_INT] +// int clip_max, // [sp, #SP_INT*2] unused +// int spat); // [sp, #SP_INT*3] + +function ff_bwdif_filter_edge_neon, export=1 + // Sanity check w + cmp w4, #0 + ble 99f + +// #define prev2 cur +// const uint8_t * restrict next2 = parity ? 
prev : next; + + ldr w8, [sp, #0] // mrefs2 + + ldr w17, [sp, #SP_INT] // parity + ldr w16, [sp, #SP_INT*3] // spat + cmp w17, #0 + csel x17, x1, x3, ne + +// for (x = 0; x < w; x++) { + +10: +// int m1 = cur[mrefs]; +// int d = (prev2[0] + next2[0]) >> 1; +// int p1 = cur[prefs]; +// int temporal_diff0 = FFABS(prev2[0] - next2[0]); +// int temporal_diff1 =(FFABS(prev[mrefs] - m1) + FFABS(prev[prefs] - p1)) >> 1; +// int temporal_diff2 =(FFABS(next[mrefs] - m1) + FFABS(next[prefs] - p1)) >> 1; +// int diff = FFMAX3(temporal_diff0 >> 1, temporal_diff1, temporal_diff2); + ldr q31, [x2] + ldr q21, [x17] + uhadd v16.16b, v31.16b, v21.16b // d0 = v16 + uabd v17.16b, v31.16b, v21.16b // td0 = v17 + ldr q24, [x2, w6, sxtw] // m1 = v24 + ldr q22, [x2, w5, sxtw] // p1 = v22 + + ldr q0, [x1, w6, sxtw] // prev[mrefs] + ldr q2, [x1, w5, sxtw] // prev[prefs] + ldr q1, [x3, w6, sxtw] // next[mrefs] + ldr q3, [x3, w5, sxtw] // next[prefs] + + ushr v29.16b, v17.16b, #1 + + uabd v31.16b, v0.16b, v24.16b + uabd v30.16b, v2.16b, v22.16b + uhadd v0.16b, v31.16b, v30.16b // td1 = q0 + + uabd v31.16b, v1.16b, v24.16b + uabd v30.16b, v3.16b, v22.16b + uhadd v1.16b, v31.16b, v30.16b // td2 = q1 + + umax v0.16b, v0.16b, v29.16b + umax v0.16b, v0.16b, v1.16b // diff = v0 + +// if (spat) { +// SPAT_CHECK() +// } +// i0 = (m1 + p1) >> 1; + cbz w16, 1f + + ldr q31, [x2, w8, sxtw] + ldr q18, [x17, w8, sxtw] + ldr q30, [x2, w7, sxtw] + ldr q19, [x17, w7, sxtw] + uhadd v18.16b, v18.16b, v31.16b + uhadd v19.16b, v19.16b, v30.16b + + SPAT_CHECK v0, v18, v24, v16, v22, v19, v31, v30, v29, v28 + +1: + uhadd v2.16b, v22.16b, v24.16b + + // i0 = v2, s0 = v2, d0 = v16, diff = v0, t0 = v31, t1 = v30 + DIFF_CLIP v2, v2, v16, v0, v31, v30 + +// dst[0] = av_clip(interpol, 0, clip_max); + str q2, [x0], #16 + +// dst++; +// cur++; +// } + subs w4, w4, #16 + add x1, x1, #16 + add x2, x2, #16 + add x3, x3, #16 + add x17, x17, #16 + bgt 10b + +99: + ret +endfunc + // 
============================================================================ // // void ff_bwdif_filter_intra_neon( diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index ae6f6ce223..ae1616d366 100644 --- a/libavfilter/bwdif.h +++ b/libavfilter/bwdif.h @@ -41,6 +41,10 @@ void ff_bwdif_init_filter_line(BWDIFContext *bwdif, int bit_depth); void ff_bwdif_init_x86(BWDIFContext *bwdif, int bit_depth); void ff_bwdif_init_aarch64(BWDIFContext *bwdif, int bit_depth); +void ff_bwdif_filter_edge_c(void *dst1, void *prev1, void *cur1, void *next1, + int w, int prefs, int mrefs, int prefs2, int mrefs2, + int parity, int clip_max, int spat); + void ff_bwdif_filter_intra_c(void *dst1, void *cur1, int w, int prefs, int mrefs, int prefs3, int mrefs3, int parity, int clip_max); diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c index 035fc58670..bec83111b4 100644 --- a/libavfilter/vf_bwdif.c +++ b/libavfilter/vf_bwdif.c @@ -150,9 +150,9 @@ static void filter_line_c(void *dst1, void *prev1, void *cur1, void *next1, FILTER2() } -static void filter_edge(void *dst1, void *prev1, void *cur1, void *next1, - int w, int prefs, int mrefs, int prefs2, int mrefs2, - int parity, int clip_max, int spat) +void ff_bwdif_filter_edge_c(void *dst1, void *prev1, void *cur1, void *next1, + int w, int prefs, int mrefs, int prefs2, int mrefs2, + int parity, int clip_max, int spat) { uint8_t *dst = dst1; uint8_t *prev = prev1; @@ -364,7 +364,7 @@ av_cold void ff_bwdif_init_filter_line(BWDIFContext *s, int bit_depth) } else { s->filter_intra = ff_bwdif_filter_intra_c; s->filter_line = filter_line_c; - s->filter_edge = filter_edge; + s->filter_edge = ff_bwdif_filter_edge_c; } #if ARCH_X86 From patchwork Tue Jul 4 14:04:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Cox X-Patchwork-Id: 42428 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:3b1e:b0:12b:9ae3:586d with SMTP id c30csp5126117pzh; 
Tue, 4 Jul 2023 07:06:42 -0700 (PDT) X-Google-Smtp-Source: APBJJlEaIvAmhcMuIfF6mrahpU7jElK3keMFq+zeWMozEPa+/fy6QS/+W++4rjWtqWil/n8+jsi4 X-Received: by 2002:aa7:d3d4:0:b0:51d:e20c:59e4 with SMTP id o20-20020aa7d3d4000000b0051de20c59e4mr10078428edr.29.1688479602664; Tue, 04 Jul 2023 07:06:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1688479602; cv=none; d=google.com; s=arc-20160816; b=rMK4h8YS+J7t5HK0E3eUZ+kv9Ed/TJPbDIHj4Ebx8XhbEXhVeIue1b6bpsYLtTCBRt WNVBNxWJam4uBE0ufsraQVsvj3w/EdSpnSoB2c99qQpNUB6vkcZLEi8XpBnCz5lhfLhD K0L2/4ZaAUXZCG7c/vkI/TkgqOe980Uhnwc3FHN+xxWVngDMFAGhMwMKwF5EkwIYhTmq ELDJ/g1DHzdU4Lwvy0FD6Xzzh1v8UfMAwyzq9EESnpK67ammmbzP5+UJpzTave5KbAaY +8s530/Gn7h3/2/Y0w2j+ehNxQ4OF875K4PTFV1PssnDpK0TSTGSqrpeyyrOwd8LbWmj ZNew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=Umnm4e6zTf1GVPMU2DyqRuSMNf4i+wXcQEQU898AolU=; fh=2QQVLAqz5Dgp0O7PTQ7hb1i3rOEvtuxkp5BnHStC38U=; b=ebY6Mco9dnHXsMgTG40GDnvyPMc7oN1/umrInSWnB8NQM6QkS0vL3TmDpvxtNKiEbJ KL0MDFYr9TXsgVPFO7G4EGS84UbBruVEHcNuEVzLwdwtWub0EpgQSoNLCj8W0nfdbTSW D31uBUW5HGhylzZIZKYam/TnLvgIOcQoIXqfp4y9Ma8QwCJvHQX1BmaSZ/TKCf+YwQMz 7PETeQXr9RCJj1hrvraAEIzPXLy8Y3eZi+B7p3t0HHLVCpsTh8+etdMIveSxhFwmmM8A awK2+UZt2S2RUEXfCI/mYHLq11azdN+5qyeHG/peWETmv00wiHUC+JvVpTK/9QBH1LYF sOTQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@kynesim-co-uk.20221208.gappssmtp.com header.s=20221208 header.b=chYLz59I; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id be14-20020a0564021a2e00b0051df5573fcasi4867893edb.429.2023.07.04.07.06.42; Tue, 04 Jul 2023 07:06:42 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@kynesim-co-uk.20221208.gappssmtp.com header.s=20221208 header.b=chYLz59I; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 098F968C60C; Tue, 4 Jul 2023 17:05:42 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wm1-f41.google.com (mail-wm1-f41.google.com [209.85.128.41]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 1A0D768C5D8 for ; Tue, 4 Jul 2023 17:05:32 +0300 (EEST) Received: by mail-wm1-f41.google.com with SMTP id 5b1f17b1804b1-3fbc0981733so55585485e9.3 for ; Tue, 04 Jul 2023 07:05:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kynesim-co-uk.20221208.gappssmtp.com; s=20221208; t=1688479531; x=1691071531; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GKz5/eI6Pkg3rpIkDa5lwGzO1IFqZrGDu9w50PdQHek=; b=chYLz59IJJcpOgcCXIAWbNZh4jridmuLW4zs+/oLNmFGmaIgO2Fl87K6JD3yiydvxm wa6+yZauHX0vfY4Kb93AvruIXARgZ17Fitd5nnIhEhdtJCsyJaJ3cK+ZiF7sJwX+bUFs DpTHtKgoFaTLN4k817Hx6lp19p7tXaUTZmkProDh3mhjd3FCftmNbnrDGZFSvnM/YkER FUud4/0u6Z1/mPWNxjJMRAEnNbPDr7c72Pg2ccwkP2HsYblCkpwyq8gp1tmLA3L/zHml gKLP6oN9IsHqMUP//dbWvxZS1+jh52MqcoIIPkEHVucKMsJFAkgwBKgvEilRaIRIbDuo 0Eqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688479531; x=1691071531; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GKz5/eI6Pkg3rpIkDa5lwGzO1IFqZrGDu9w50PdQHek=; b=LblxswAY4FGn0x76eG93DHTdmz/YxVjZbm6vzN4XPw8AhMB7nI6PWHg0JMVukuO5c+ 8OW2d0c8DucTc7qyoFhX3ud8807zDuMVlVaToVxbTLuUb4GFDu6wdZY8A0spXVCGVgJ7 TU3prHBiyhoj59SRYSPibrEBdk4+fEUdWYqROmj+zxbLEP4o6E29QzDktwu5r+mWhv8/ qFh373QvSrjF745BsLvuCKSrsgu5EPNOXFMD5BVLjXM+82NtYTFBeSXzmJtfC3FvxlzD Yx7YgkowhXiPrAIO6ziAGNouOJkNtK3dKyPPYxhp2+LCfo/76q6MSXHD6fpUrVbXmXyw os5A== X-Gm-Message-State: AC+VfDw2n9mV3mmFQphfplikw0xZutHB+RHHLB71yDXEm2KCGs1+SaHD fraN5xBJThm3UNAc+DDtmtKG7oPLGGaveM8Z7cM= X-Received: by 2002:a7b:c04a:0:b0:3fb:403d:90c5 with SMTP id u10-20020a7bc04a000000b003fb403d90c5mr10223685wmc.39.1688479531443; Tue, 04 Jul 2023 07:05:31 -0700 (PDT) Received: from sucnaath.outer.uphall.net (cpc1-cmbg20-2-0-cust759.5-4.cable.virginm.net. [86.21.218.248]) by smtp.gmail.com with ESMTPSA id m23-20020a7bca57000000b003fbc30825fbsm13585970wml.39.2023.07.04.07.05.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 04 Jul 2023 07:05:31 -0700 (PDT) From: John Cox To: ffmpeg-devel@ffmpeg.org Date: Tue, 4 Jul 2023 14:04:43 +0000 Message-Id: <20230704140445.240426-6-jc@kynesim.co.uk> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230704140445.240426-1-jc@kynesim.co.uk> References: <20230704140445.240426-1-jc@kynesim.co.uk> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v4 5/7] avfilter/vf_bwdif: Add neon for filter_line X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: thomas.mundt@hr.de, John Cox , martin@martin.st Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 6GRL4OJqpzbm Exports C filter_line needed for tail fixup of neon code 
Adds neon for filter_line Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++ libavfilter/aarch64/vf_bwdif_neon.S | 203 ++++++++++++++++++++ libavfilter/bwdif.h | 5 + libavfilter/vf_bwdif.c | 10 +- 4 files changed, 234 insertions(+), 5 deletions(-) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c index e75cf2f204..21e67884ab 100644 --- a/libavfilter/aarch64/vf_bwdif_init_aarch64.c +++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c @@ -31,6 +31,26 @@ void ff_bwdif_filter_edge_neon(void *dst1, void *prev1, void *cur1, void *next1, void ff_bwdif_filter_intra_neon(void *dst1, void *cur1, int w, int prefs, int mrefs, int prefs3, int mrefs3, int parity, int clip_max); +void ff_bwdif_filter_line_neon(void *dst1, void *prev1, void *cur1, void *next1, + int w, int prefs, int mrefs, int prefs2, int mrefs2, + int prefs3, int mrefs3, int prefs4, int mrefs4, + int parity, int clip_max); + + +static void filter_line_helper(void *dst1, void *prev1, void *cur1, void *next1, + int w, int prefs, int mrefs, int prefs2, int mrefs2, + int prefs3, int mrefs3, int prefs4, int mrefs4, + int parity, int clip_max) +{ + const int w0 = clip_max != 255 ? 
0 : w & ~15; + + ff_bwdif_filter_line_neon(dst1, prev1, cur1, next1, + w0, prefs, mrefs, prefs2, mrefs2, prefs3, mrefs3, prefs4, mrefs4, parity, clip_max); + + if (w0 < w) + ff_bwdif_filter_line_c((char *)dst1 + w0, (char *)prev1 + w0, (char *)cur1 + w0, (char *)next1 + w0, + w - w0, prefs, mrefs, prefs2, mrefs2, prefs3, mrefs3, prefs4, mrefs4, parity, clip_max); +} static void filter_edge_helper(void *dst1, void *prev1, void *cur1, void *next1, int w, int prefs, int mrefs, int prefs2, int mrefs2, @@ -71,6 +91,7 @@ ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth) return; s->filter_intra = filter_intra_helper; + s->filter_line = filter_line_helper; s->filter_edge = filter_edge_helper; } diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S index 389302b813..f185e94e3c 100644 --- a/libavfilter/aarch64/vf_bwdif_neon.S +++ b/libavfilter/aarch64/vf_bwdif_neon.S @@ -154,6 +154,209 @@ const coeffs, align=4 // align 4 means align on 2^4 boundry .hword 5077, 981 // sp[0] = v0.h[6] endconst +// =========================================================================== +// +// void filter_line( +// void *dst1, // x0 +// void *prev1, // x1 +// void *cur1, // x2 +// void *next1, // x3 +// int w, // w4 +// int prefs, // w5 +// int mrefs, // w6 +// int prefs2, // w7 +// int mrefs2, // [sp, #0] +// int prefs3, // [sp, #SP_INT] +// int mrefs3, // [sp, #SP_INT*2] +// int prefs4, // [sp, #SP_INT*3] +// int mrefs4, // [sp, #SP_INT*4] +// int parity, // [sp, #SP_INT*5] +// int clip_max) // [sp, #SP_INT*6] + +function ff_bwdif_filter_line_neon, export=1 + // Sanity check w + cmp w4, #0 + ble 99f + + // Rearrange regs to be the same as line3 for ease of debug! 
+ mov w10, w4 // w10 = loop count + mov w9, w6 // w9 = mref + mov w12, w7 // w12 = pref2 + mov w11, w5 // w11 = pref + ldr w8, [sp, #0] // w8 = mref2 + ldr w7, [sp, #SP_INT*2] // w7 = mref3 + ldr w6, [sp, #SP_INT*4] // w6 = mref4 + ldr w13, [sp, #SP_INT] // w13 = pref3 + ldr w14, [sp, #SP_INT*3] // w14 = pref4 + + mov x4, x3 + mov x3, x2 + mov x2, x1 + + LDR_COEFFS v0, x17 + +// #define prev2 cur +// const uint8_t * restrict next2 = parity ? prev : next; + ldr w17, [sp, #SP_INT*5] // parity + cmp w17, #0 + csel x17, x2, x4, ne + + PUSH_VREGS + +// for (x = 0; x < w; x++) { +// int diff0, diff2; +// int d0, d2; +// int temporal_diff0, temporal_diff2; +// +// int i1, i2; +// int j1, j2; +// int p6, p5, p4, p3, p2, p1, c0, m1, m2, m3, m4; + +10: +// c0 = prev2[0] + next2[0]; // c0 = v20, v21 +// d0 = c0 >> 1; // d0 = v10 +// temporal_diff0 = FFABS(prev2[0] - next2[0]); // td0 = v11 + ldr q31, [x3] + ldr q21, [x17] + uhadd v10.16b, v31.16b, v21.16b + uabd v11.16b, v31.16b, v21.16b + uaddl v20.8h, v21.8b, v31.8b + uaddl2 v21.8h, v21.16b, v31.16b + + ldr q31, [x3, w6, sxtw] + ldr q23, [x17, w6, sxtw] + +// i1 = coef_hf[0] * c0; // i1 = v2-v5 + UMULL4K v2, v3, v4, v5, v20, v21, v0.h[2] + + ldr q30, [x3, w14, sxtw] + ldr q25, [x17, w14, sxtw] + +// m4 = prev2[mrefs4] + next2[mrefs4]; // m4 = v22,v23 + uaddl v22.8h, v23.8b, v31.8b + uaddl2 v23.8h, v23.16b, v31.16b + +// p4 = prev2[prefs4] + next2[prefs4]; // p4 = v24,v25, (p4 >> 1) = v12 + uhadd v12.16b, v25.16b, v30.16b + uaddl v24.8h, v25.8b, v30.8b + uaddl2 v25.8h, v25.16b, v30.16b + +// m3 = cur[mrefs3]; // m3 = v20 + ldr q20, [x3, w7, sxtw] + +// p3 = cur[prefs3]; // p3 = v21 + ldr q21, [x3, w13, sxtw] + +// i1 += coef_hf[2] * (m4 + p4); // (-m4:v22,v23) (-p4:v24,v25) + add v22.8h, v22.8h, v24.8h + add v23.8h, v23.8h, v25.8h + UMLAL4K v2, v3, v4, v5, v22, v23, v0.h[4] + + ldr q29, [x3, w8, sxtw] + ldr q23, [x17, w8, sxtw] + +// i1 -= coef_lf[1] * 4 * (m3 + p3); // - + uaddl v30.8h, v20.8b, v21.8b + uaddl2 v31.8h, 
v20.16b, v21.16b + + UMLSL4K v2, v3, v4, v5, v30, v31, v0.h[1] + + ldr q31, [x3, w12, sxtw] + ldr q27, [x17, w12, sxtw] + +// m2 = prev2[mrefs2] + next2[mrefs2]; // m2 = v22,v23, (m2 >> 1) = v13 + uhadd v13.16b, v23.16b, v29.16b + uaddl v22.8h, v23.8b, v29.8b + uaddl2 v23.8h, v23.16b, v29.16b + +// m1 = cur[mrefs]; // m1 = v24 + ldr q24, [x3, w9, sxtw] + +// p2 = prev2[prefs2] + next2[prefs2]; // p2 = v26, v27 +// temporal_diff2 = FFABS(prev2[prefs2] - next2[prefs2]); // td2 = v14 +// d2 = p2 >> 1; // d2 = v15 + uabd v14.16b, v31.16b, v27.16b + uhadd v15.16b, v31.16b, v27.16b + uaddl v26.8h, v27.8b, v31.8b + uaddl2 v27.8h, v27.16b, v31.16b + +// i1 -= coef_hf[1] * (m2 + p2); // (-m2:v22,v23*) (-p2:v26*,v27*) + add v22.8h, v22.8h, v26.8h + add v23.8h, v23.8h, v27.8h + UMLSL4K v2, v3, v4, v5, v22, v23, v0.h[3] + +// p1 = cur[prefs]; // p1 = v22 + ldr q22, [x3, w11, sxtw] + +// i2 = (coef_sp[0] * (m1 + p1) - coef_sp[1] * (m3 + p3)) >> 13; // (-m3:v20*) i2=v17 + uaddl v18.8h, v22.8b, v24.8b + uaddl2 v19.8h, v22.16b, v24.16b + UMULL4K v28, v29, v30, v31, v18, v19, v0.h[6] + + uaddl v18.8h, v20.8b, v21.8b + uaddl2 v19.8h, v20.16b, v21.16b + UMLSL4K v28, v29, v30, v31, v18, v19, v0.h[7] + + SQSHRUNN v17, v28, v29, v30, v31, 13 + +// i1 += coef_lf[0] * 4 * (m1 + p1); // p1 = v22, m1 = v24 + uaddl v26.8h, v24.8b, v22.8b + uaddl2 v27.8h, v24.16b, v22.16b + UMLAL4K v2, v3, v4, v5, v26, v27, v0.h[0] + + ldr q31, [x2, w9, sxtw] + ldr q29, [x4, w9, sxtw] + + ldr q30, [x2, w11, sxtw] + ldr q28, [x4, w11, sxtw] + +// i1 >>= 15; // i1 = v2, -v3, -v4*, -v5* + SQSHRUNN v2, v2, v3, v4, v5, 15 + +// { +// int t1 =(FFABS(prev[mrefs] - m1) + FFABS(prev[prefs] - p1)) >> 1; +// int t2 =(FFABS(next[mrefs] - m1) + FFABS(next[prefs] - p1)) >> 1; + uabd v30.16b, v22.16b, v30.16b + uabd v31.16b, v24.16b, v31.16b + uabd v28.16b, v22.16b, v28.16b + uabd v29.16b, v24.16b, v29.16b + uhadd v31.16b, v31.16b, v30.16b + uhadd v29.16b, v29.16b, v28.16b + +// diff0 = FFMAX3(temporal_diff0 >> 1, t1, t2); 
// diff0=v18 + ushr v18.16b, v11.16b, #1 + umax v18.16b, v18.16b, v31.16b + umax v18.16b, v18.16b, v29.16b + + // diff0 = v18, (m2 >> 1) = v13, m1 = v24, d0 = v10, p1 = v22, d2 = v15 + SPAT_CHECK v18, v13, v24, v10, v22, v15, v31, v30, v29, v28 + + // i1 = v2, i2 = v17, m1 = v24, d0 = v10, p1 = v22, td2 = v11, diff2 = v18 + INTERPOL v2, v2, v17, v24, v10, v22, v11, v18, v31, v30, v29 + +// dst[0] = av_clip_uint8(interpol); + str q2, [x0], #16 +// } +// +// dst++; +// cur++; +// prev++; +// prev2++; +// next++; +// } + + subs w10, w10, #16 + add x2, x2, #16 + add x3, x3, #16 + add x4, x4, #16 + add x17, x17, #16 + bgt 10b + + POP_VREGS +99: + ret +endfunc + // ============================================================================ // // void ff_bwdif_filter_edge_neon( diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index ae1616d366..cce99953f3 100644 --- a/libavfilter/bwdif.h +++ b/libavfilter/bwdif.h @@ -48,4 +48,9 @@ void ff_bwdif_filter_edge_c(void *dst1, void *prev1, void *cur1, void *next1, void ff_bwdif_filter_intra_c(void *dst1, void *cur1, int w, int prefs, int mrefs, int prefs3, int mrefs3, int parity, int clip_max); +void ff_bwdif_filter_line_c(void *dst1, void *prev1, void *cur1, void *next1, + int w, int prefs, int mrefs, int prefs2, int mrefs2, + int prefs3, int mrefs3, int prefs4, int mrefs4, + int parity, int clip_max); + #endif /* AVFILTER_BWDIF_H */ diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c index bec83111b4..26349da1fd 100644 --- a/libavfilter/vf_bwdif.c +++ b/libavfilter/vf_bwdif.c @@ -132,10 +132,10 @@ void ff_bwdif_filter_intra_c(void *dst1, void *cur1, int w, int prefs, int mrefs FILTER_INTRA() } -static void filter_line_c(void *dst1, void *prev1, void *cur1, void *next1, - int w, int prefs, int mrefs, int prefs2, int mrefs2, - int prefs3, int mrefs3, int prefs4, int mrefs4, - int parity, int clip_max) +void ff_bwdif_filter_line_c(void *dst1, void *prev1, void *cur1, void *next1, + int w, int prefs, int mrefs, int 
prefs2, int mrefs2, + int prefs3, int mrefs3, int prefs4, int mrefs4, + int parity, int clip_max) { uint8_t *dst = dst1; uint8_t *prev = prev1; @@ -363,7 +363,7 @@ av_cold void ff_bwdif_init_filter_line(BWDIFContext *s, int bit_depth) s->filter_edge = filter_edge_16bit; } else { s->filter_intra = ff_bwdif_filter_intra_c; - s->filter_line = filter_line_c; + s->filter_line = ff_bwdif_filter_line_c; s->filter_edge = ff_bwdif_filter_edge_c; } From patchwork Tue Jul 4 14:04:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Cox X-Patchwork-Id: 42429 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:3b1e:b0:12b:9ae3:586d with SMTP id c30csp5126298pzh; Tue, 4 Jul 2023 07:06:54 -0700 (PDT) X-Google-Smtp-Source: APBJJlGxaC0d6lsUureXp/wRiUJTJ/mkXgS+mNufx27/RxHPUIls+QV17PPYYSQXK2oXSbAqdDY3 X-Received: by 2002:a17:906:a112:b0:988:71c8:9f3a with SMTP id t18-20020a170906a11200b0098871c89f3amr11742715ejy.16.1688479613942; Tue, 04 Jul 2023 07:06:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1688479613; cv=none; d=google.com; s=arc-20160816; b=LXJjCHf/M8ah0sKkjzzvtLzJcq/Eaef6qhoIVha6tF3MSqXfEeY50l3oMzObIgIsy8 b1kaFL5dpyB+nNB7JDIlrYo56dAHJqvKtdzEDqAr16xzQC2wMizIwVNAv6yh/SyeiGOU O1ru6leFWXd839N4lGOdURmYS7nS2fSEJo33xHcxNWLUa30M0qiChVjZuiEBdmUpqfRB 8PWo09CLpdQLVXiU00YZb9bYxUOmN5DG2rq2KGBpVDWi5513+XKHcIkRLHfgRx/cLxyR +7NBmM2GCfXx06Rl6aMHXB4j4xONfTg4C/iajO5uTwphAMK5pnc9RNsccldSBhXNHUKa Rrcg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=KuC+g7Q4gtIKbHE9s36pEYQFhuZUF79rHO2rGBTCv70=; fh=2QQVLAqz5Dgp0O7PTQ7hb1i3rOEvtuxkp5BnHStC38U=; b=Jr+oi/tbJxM3JOZATAgBC+6rHt15I1qrNEZHq8BOANdieRrdsSxcJgK5riT+gYbVTs 
zuMXSB/3MqOlMKgIiiWBZqCIp89olTrOLq6Uga4xuuceoMo6PM/2rOYb2ry2lC+T4zz5 TmISBb7aHnqlkth0Lwf+4UiPa5rOB+gCTocTqv0quipQbSPYs1fVEOCu9aAgYMXHSrIv nxc6p8xBY7RZUUYQiKvM/Z6Np4UVtFTv/0ZXmTgYyiVZ0/xd1cfGDcxMtWVBwMBClKvd GNpxahpCB/CFMZ2YKkv91s8xCS7U3//3eLh+11ZSc11/80GOOlVmTNFfXUfEwKp2o4hB e0jg== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@kynesim-co-uk.20221208.gappssmtp.com header.s=20221208 header.b=sEN8IAsY; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id gg18-20020a170906e29200b0098d7390816asi14821105ejb.756.2023.07.04.07.06.53; Tue, 04 Jul 2023 07:06:53 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@kynesim-co-uk.20221208.gappssmtp.com header.s=20221208 header.b=sEN8IAsY; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 17E9D68C5E2; Tue, 4 Jul 2023 17:05:43 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wr1-f43.google.com (mail-wr1-f43.google.com [209.85.221.43]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id AA9AE68C54B for ; Tue, 4 Jul 2023 17:05:32 +0300 (EEST) Received: by mail-wr1-f43.google.com with SMTP id ffacd0b85a97d-31297125334so4862728f8f.0 for ; Tue, 04 Jul 2023 07:05:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kynesim-co-uk.20221208.gappssmtp.com; s=20221208; t=1688479532; x=1691071532; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=H687VJhKJFwkKLWCcK28jqX8tG6UW/1CdnXuKPDp1d0=; b=sEN8IAsYAXDocbwhsQjY0JOL05piPRdRMeDqsXHkV1bziXSjsHQJnQXDF+oCk+ygLZ 197aJwQkt+bBl+IGHwVS1U2uYRiq5nc8ag8sCFCTpk+J0BMe7P/6wzyFZzoU+SKDbWG9 EHySOg5tMTIbmx30TyBABofNONWn5cvrMJn1kQWPA2zDTrZTsy/Gj7Oz3xa9XjVOgyLt pz7tWnGMdwMoUNL/+qFl8k0Py1kUc0ukjGyI6/tVkHgg7fTsXJG2wnsoi+I7pi35D+21 +5sw/CmusKsBLD+rrJmXKmNmPrTdCiHpviW+7riLqNlu8xV8LLGkj/9GzlYADrKnQ0H7 1fLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688479532; x=1691071532; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=H687VJhKJFwkKLWCcK28jqX8tG6UW/1CdnXuKPDp1d0=; b=HfImC4LSR4QHYM3M/M6oQAkpbQhgGI1zCiTPq6vme595KcIpTkAIU1PIDgeO4+LmKI /pksQUn2zA70im8g89q9hlYDzgOsXVy85CK8c77cCbaIYNiPfbtrqvG4c5Da1PoDIRjM CQdYSFo9zR5mtY1mC9OJvHl+ncE35ZZqVDCJ1gq1zMn76KD2ebA4Vw4rvGhEyAY+1adG XpAy72Z6FkIyEYiGZ9HcCPrPPGxzMJCVdYGH8XWdOvnD4lLAf3LJuctbLy3ZTAPqezHL ceS255AGPpNV3trwEbS7bekutRNCWL35cy9eTpFK+ocf/dgVcl6fJytzqyxLw1Ifz2PF PUTw== X-Gm-Message-State: AC+VfDygsSaq6bT2RS4DnPvFgbb1c3e/KefbxkXKl83MGcXyHbkETDZe 4P0bdbu4GBEWfwb+09qnzjg20BBWopPGjqS1M2s= X-Received: by 2002:a5d:6308:0:b0:30f:c050:88dd with SMTP id i8-20020a5d6308000000b0030fc05088ddmr16077824wru.8.1688479532029; Tue, 04 Jul 2023 07:05:32 -0700 (PDT) Received: from sucnaath.outer.uphall.net (cpc1-cmbg20-2-0-cust759.5-4.cable.virginm.net. 
[86.21.218.248]) by smtp.gmail.com with ESMTPSA id m23-20020a7bca57000000b003fbc30825fbsm13585970wml.39.2023.07.04.07.05.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 04 Jul 2023 07:05:31 -0700 (PDT)
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 4 Jul 2023 14:04:44 +0000
Message-Id: <20230704140445.240426-7-jc@kynesim.co.uk>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230704140445.240426-1-jc@kynesim.co.uk>
References: <20230704140445.240426-1-jc@kynesim.co.uk>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v4 6/7] avfilter/vf_bwdif: Add a filter_line3 method for optimisation
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: thomas.mundt@hr.de, John Cox , martin@martin.st
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: W/vHBCaoAkND

Add an optional filter_line3 to the available optimisations.

filter_line3 is equivalent to filter_line, memcpy, filter_line.

filter_line shares quite a number of loads and some calculations with
its next iteration, and testing shows that, using aarch64 neon,
filter_line3's performance is 30% better than two filter_lines and a
memcpy.

Add a test for vf_bwdif filter_line3 to checkasm.

Round job start lines down to a multiple of 4. This means that, if
filter_line3 exists, filter_line is no longer sometimes called once at
the end of a slice depending on thread count. The final slice may do up
to 3 extra lines, but filter_edge is faster than filter_line so it is
unlikely to create any noticeable thread load variation.
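As an illustration of the job start rounding described above, here is a minimal standalone sketch. job_start() uses the same expression as the function added to vf_bwdif.c in this patch; slice_lines() and print_slices() are hypothetical helpers that exist only for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Same expression as job_start() in this patch: every slice except the
 * last one starts and ends on a multiple of 4 lines. */
static int job_start(int jobnr, int nb_jobs, int h)
{
    return jobnr >= nb_jobs ? h : ((h * jobnr) / nb_jobs) & ~3;
}

/* Hypothetical helper (not in the patch): number of lines in slice jobnr. */
static int slice_lines(int jobnr, int nb_jobs, int h)
{
    assert(nb_jobs > 0 && jobnr >= 0 && jobnr < nb_jobs);
    return job_start(jobnr + 1, nb_jobs, h) - job_start(jobnr, nb_jobs, h);
}

/* Hypothetical helper (not in the patch): print the slice layout. */
static void print_slices(int nb_jobs, int h)
{
    for (int jobnr = 0; jobnr < nb_jobs; jobnr++)
        printf("job %d: lines [%d, %d), %d lines\n", jobnr,
               job_start(jobnr, nb_jobs, h),
               job_start(jobnr + 1, nb_jobs, h),
               slice_lines(jobnr, nb_jobs, h));
}
```

With h = 1082 and 4 jobs the slices start at lines 0, 268, 540 and 808. The unrounded starts would have been 0, 270, 541 and 811, so the final slice grows from 271 to 274 lines, which is the "up to 3 extra lines" case mentioned above.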
Signed-off-by: John Cox
---
 libavfilter/bwdif.h       |  7 ++++
 libavfilter/vf_bwdif.c    | 44 +++++++++++++++++++--
 tests/checkasm/vf_bwdif.c | 81 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+), 3 deletions(-)

diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index cce99953f3..496cec72ef 100644
--- a/libavfilter/bwdif.h
+++ b/libavfilter/bwdif.h
@@ -35,6 +35,9 @@ typedef struct BWDIFContext {
     void (*filter_edge)(void *dst, void *prev, void *cur, void *next, int w,
                         int prefs, int mrefs, int prefs2, int mrefs2,
                         int parity, int clip_max, int spat);
+    void (*filter_line3)(void *dst, int dstride,
+                         const void *prev, const void *cur, const void *next, int prefs,
+                         int w, int parity, int clip_max);
 } BWDIFContext;

 void ff_bwdif_init_filter_line(BWDIFContext *bwdif, int bit_depth);
@@ -53,4 +56,8 @@ void ff_bwdif_filter_line_c(void *dst1, void *prev1, void *cur1, void *next1,
                             int prefs3, int mrefs3, int prefs4, int mrefs4,
                             int parity, int clip_max);

+void ff_bwdif_filter_line3_c(void * dst1, int d_stride,
+                             const void * prev1, const void * cur1, const void * next1, int s_stride,
+                             int w, int parity, int clip_max);
+
 #endif /* AVFILTER_BWDIF_H */
diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c
index 26349da1fd..6701208efe 100644
--- a/libavfilter/vf_bwdif.c
+++ b/libavfilter/vf_bwdif.c
@@ -150,6 +150,31 @@ void ff_bwdif_filter_line_c(void *dst1, void *prev1, void *cur1, void *next1,
     FILTER2()
 }

+#define NEXT_LINE()\
+    dst += d_stride; \
+    prev += prefs; \
+    cur += prefs; \
+    next += prefs;
+
+void ff_bwdif_filter_line3_c(void * dst1, int d_stride,
+                             const void * prev1, const void * cur1, const void * next1, int s_stride,
+                             int w, int parity, int clip_max)
+{
+    const int prefs = s_stride;
+    uint8_t * dst = dst1;
+    const uint8_t * prev = prev1;
+    const uint8_t * cur = cur1;
+    const uint8_t * next = next1;
+
+    ff_bwdif_filter_line_c(dst, (void*)prev, (void*)cur, (void*)next, w,
+                           prefs, -prefs, prefs * 2, - prefs * 2, prefs * 3, -prefs * 3, prefs * 4, -prefs * 4, parity, clip_max);
+    NEXT_LINE();
+    memcpy(dst, cur, w);
+    NEXT_LINE();
+    ff_bwdif_filter_line_c(dst, (void*)prev, (void*)cur, (void*)next, w,
+                           prefs, -prefs, prefs * 2, - prefs * 2, prefs * 3, -prefs * 3, prefs * 4, -prefs * 4, parity, clip_max);
+}
+
 void ff_bwdif_filter_edge_c(void *dst1, void *prev1, void *cur1, void *next1,
                             int w, int prefs, int mrefs, int prefs2, int mrefs2,
                             int parity, int clip_max, int spat)
@@ -212,6 +237,13 @@ static void filter_edge_16bit(void *dst1, void *prev1, void *cur1, void *next1,
     FILTER2()
 }

+// Round job start line down to multiple of 4 so that if filter_line3 exists
+// and the frame is a multiple of 4 high then filter_line will never be called
+static inline int job_start(const int jobnr, const int nb_jobs, const int h)
+{
+    return jobnr >= nb_jobs ? h : ((h * jobnr) / nb_jobs) & ~3;
+}
+
 static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
 {
     BWDIFContext *s = ctx->priv;
@@ -221,8 +253,8 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
     int clip_max = (1 << (yadif->csp->comp[td->plane].depth)) - 1;
     int df = (yadif->csp->comp[td->plane].depth + 7) / 8;
     int refs = linesize / df;
-    int slice_start = (td->h * jobnr ) / nb_jobs;
-    int slice_end = (td->h * (jobnr+1)) / nb_jobs;
+    int slice_start = job_start(jobnr, nb_jobs, td->h);
+    int slice_end = job_start(jobnr + 1, nb_jobs, td->h);
     int y;

     for (y = slice_start; y < slice_end; y++) {
@@ -244,6 +276,11 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
                            refs << 1, -(refs << 1),
                            td->parity ^ td->tff, clip_max,
                            (y < 2) || ((y + 3) > td->h) ? 0 : 1);
+        } else if (s->filter_line3 && y + 2 < slice_end && y + 6 < td->h) {
+            s->filter_line3(dst, td->frame->linesize[td->plane],
+                            prev, cur, next, linesize, td->w,
+                            td->parity ^ td->tff, clip_max);
+            y += 2;
         } else {
             s->filter_line(dst, prev, cur, next, td->w,
                            refs, -refs, refs << 1, -(refs << 1),
@@ -280,7 +317,7 @@ static void filter(AVFilterContext *ctx, AVFrame *dstpic,
         td.plane = i;

         ff_filter_execute(ctx, filter_slice, &td, NULL,
-                          FFMIN(h, ff_filter_get_nb_threads(ctx)));
+                          FFMIN((h+3)/4, ff_filter_get_nb_threads(ctx)));
     }
     if (yadif->current_field == YADIF_FIELD_END) {
         yadif->current_field = YADIF_FIELD_NORMAL;
@@ -357,6 +394,7 @@ static int config_props(AVFilterLink *link)

 av_cold void ff_bwdif_init_filter_line(BWDIFContext *s, int bit_depth)
 {
+    s->filter_line3 = 0;
     if (bit_depth > 8) {
         s->filter_intra = filter_intra_16bit;
         s->filter_line = filter_line_c_16bit;
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 5fdba09fdc..3399cacdf7 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm/vf_bwdif.c
@@ -28,6 +28,10 @@
     for (size_t i = 0; i < count; i++) \
         buf0[i] = buf1[i] = rnd() & mask

+#define randomize_overflow_check(buf0, buf1, mask, count) \
+    for (size_t i = 0; i < count; i++) \
+        buf0[i] = buf1[i] = (rnd() & 1) != 0 ? mask : 0;
+
 #define BODY(type, depth) \
     do { \
         type prev0[9*WIDTH], prev1[9*WIDTH]; \
@@ -83,6 +87,83 @@ void checkasm_check_vf_bwdif(void)
         report("bwdif10");
     }

+    if (!ctx_8.filter_line3)
+        ctx_8.filter_line3 = ff_bwdif_filter_line3_c;
+
+    {
+        LOCAL_ALIGNED_16(uint8_t, prev0, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, prev1, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, next0, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, next1, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, cur0, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, cur1, [11*WIDTH]);
+        LOCAL_ALIGNED_16(uint8_t, dst0, [WIDTH*3]);
+        LOCAL_ALIGNED_16(uint8_t, dst1, [WIDTH*3]);
+        const int stride = WIDTH;
+        const int mask = (1<<8)-1;
+        int parity;
+
+        for (parity = 0; parity != 2; ++parity) {
+            if (check_func(ctx_8.filter_line3, "bwdif8.line3.rnd.p%d", parity)) {
+
+                declare_func(void, void * dst1, int d_stride,
+                                   const void * prev1, const void * cur1, const void * next1, int prefs,
+                                   int w, int parity, int clip_max);
+
+                randomize_buffers(prev0, prev1, mask, 11*WIDTH);
+                randomize_buffers(next0, next1, mask, 11*WIDTH);
+                randomize_buffers( cur0, cur1, mask, 11*WIDTH);
+
+                call_ref(dst0, stride,
+                         prev0 + stride * 4, cur0 + stride * 4, next0 + stride * 4, stride,
+                         WIDTH, parity, mask);
+                call_new(dst1, stride,
+                         prev1 + stride * 4, cur1 + stride * 4, next1 + stride * 4, stride,
+                         WIDTH, parity, mask);
+
+                if (memcmp(dst0, dst1, WIDTH*3)
+                        || memcmp(prev0, prev1, WIDTH*11)
+                        || memcmp(next0, next1, WIDTH*11)
+                        || memcmp( cur0, cur1, WIDTH*11))
+                    fail();
+
+                bench_new(dst1, stride,
+                          prev1 + stride * 4, cur1 + stride * 4, next1 + stride * 4, stride,
+                          WIDTH, parity, mask);
+            }
+        }
+
+        // Use just 0s and ~0s to try to provoke bad cropping or overflow
+        // Parity makes no difference to this test so just test 0
+        if (check_func(ctx_8.filter_line3, "bwdif8.line3.overflow")) {
+
+            declare_func(void, void * dst1, int d_stride,
+                               const void * prev1, const void * cur1, const void * next1, int prefs,
+                               int w, int parity, int clip_max);
+
+            randomize_overflow_check(prev0, prev1, mask, 11*WIDTH);
+            randomize_overflow_check(next0, next1, mask, 11*WIDTH);
+            randomize_overflow_check( cur0, cur1, mask, 11*WIDTH);
+
+            call_ref(dst0, stride,
+                     prev0 + stride * 4, cur0 + stride * 4, next0 + stride * 4, stride,
+                     WIDTH, 0, mask);
+            call_new(dst1, stride,
+                     prev1 + stride * 4, cur1 + stride * 4, next1 + stride * 4, stride,
+                     WIDTH, 0, mask);
+
+            if (memcmp(dst0, dst1, WIDTH*3)
+                    || memcmp(prev0, prev1, WIDTH*11)
+                    || memcmp(next0, next1, WIDTH*11)
+                    || memcmp( cur0, cur1, WIDTH*11))
+                fail();
+
+            // No point to benching
+        }
+
+        report("bwdif8.line3");
+    }
+
     {
         LOCAL_ALIGNED_16(uint8_t, prev0, [11*WIDTH]);
         LOCAL_ALIGNED_16(uint8_t, prev1, [11*WIDTH]);

From patchwork Tue Jul 4 14:04:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 42430
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a05:6a20:3b1e:b0:12b:9ae3:586d with SMTP id c30csp5126465pzh; Tue, 4 Jul 2023 07:07:05 -0700 (PDT)
X-Google-Smtp-Source: APBJJlHGAYif5uTJnzBe5LczYQGgiVbZudnORr/PKhnRur/LWCQKs8hWka4ruzjSOqYKpW/56sZf
X-Received: by 2002:a05:6512:707:b0:4f9:7aee:8dc5 with SMTP id b7-20020a056512070700b004f97aee8dc5mr8029681lfs.19.1688479624844; Tue, 04 Jul 2023 07:07:04 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1688479624; cv=none; d=google.com; s=arc-20160816; b=ICHeU8rl9dAk96wuKrsP3EV/fJjGLb6QphstNoVxSYqIGNm5EdQlm9LHoVuNcXU3VW TkPWNujmDn12lWxKl8u4HP+bw+2otBa6Ymob1L9nAmNoMiTVCYqSW/kVbJ+r5cFKTFdl NBYqr3ZTjXknaTfZmaCSvu2SM29+Tfp657F5PKaQAFLQ/uiBgQajXou27tInDbBHFLBk SzH5YEeWSgyaBRVFE4f9g7MOHBlnvrfpA71vC23UFny8xfE6KU5Mx3+1zPzfBAIk/V1E Vg6QSe/lL+SxeExZMlhAEBkx8wd6d1lXDg2IwWCrxaB7U7cXL7o/FW9iBfOmgtUQsrye M+7w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe
:list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=2ntkGVEzcJkHJRApAqYXVgpUs7uUP36LNzZtOARdC88=; fh=2QQVLAqz5Dgp0O7PTQ7hb1i3rOEvtuxkp5BnHStC38U=; b=XKHWix7+3RDLMo0UQhInxNlX5U6huP7JwaOFaP09mS/ApENPwMLzfUj8N92SACH4EH 7vJMWkv6QRrKqF3uaz8uiFV9qoPgEfl1/IaMDoFWKGwC9kj8At8Bi4p4084+1jO7Gbre vDVSTnIAX59PP9ESWx4Eajym8pqvKhKG5TAtX291Q2boK5GmKnpc4A6PQDccBpPB9wLi 6c4J9WNLppK71exLwcbzf0soRAfkmSxUJIkSkUMCL3Nu6kx7cwUzHXuucYUveJtpDLLY JgoFaMqHtf63I0L+TRmUwZGVsPs0RjuH6VIjPysM/0EAhkw6e1jAL09qN1bY7RGZ6xOZ lOSA== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@kynesim-co-uk.20221208.gappssmtp.com header.s=20221208 header.b=SrShPI5O; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id d25-20020a17090694d900b00992d0de8761si5788023ejy.909.2023.07.04.07.07.04; Tue, 04 Jul 2023 07:07:04 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@kynesim-co-uk.20221208.gappssmtp.com header.s=20221208 header.b=SrShPI5O; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 18B6568C616; Tue, 4 Jul 2023 17:05:44 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wm1-f49.google.com (mail-wm1-f49.google.com [209.85.128.49]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 6B2B968C5E4 for ; Tue, 4 Jul 2023 17:05:33 
+0300 (EEST) Received: by mail-wm1-f49.google.com with SMTP id 5b1f17b1804b1-3fbdfda88f4so10519485e9.1 for ; Tue, 04 Jul 2023 07:05:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kynesim-co-uk.20221208.gappssmtp.com; s=20221208; t=1688479533; x=1691071533; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=p9HRI4Q0OBbdCnP3Os88Ucm+m8x+yYsLaCNkNM57z08=; b=SrShPI5OLHvsROGPYIzPHepKiKeYnd0mqBiP+Z4QDhpyytNe1HJq6Hc1LFD+uiTyWi H3+TPW+Ic2iFrUngttA8WkJJUZoDLsxTmcapkHi0oB45Wq0madZFu4bnzWkPucBut0Eq zx+ENYzOHVGLOigTR8PnCxWhjF90eBMnHmJn+DFNQERoWLuoXwtmxmc7G4ggkYAQr6nr F3i0Gb12AkjVuIv60IObgLKeISudIDd/KSfGVyp6FGkEBE57Bz8lVa4n04JG4DNAEXR/ HNyNBrMNVRlC295vvIyAWvrKJ1EkTBx0BOAlX88/9oHCAq2YfUl5+Qcrg8LuwyzkzgR8 CQBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688479533; x=1691071533; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=p9HRI4Q0OBbdCnP3Os88Ucm+m8x+yYsLaCNkNM57z08=; b=l5Wh7aG+zCI/RGcxuZowCv1I3brr9oW9GozVaG7bjeA3f7Wjc/OmoMEYwF+E4tOW0A FDkk5vSi0lu06eZGtB5ndSkSDNzeY6dI7vYiFPK7XdOr7X+ByGoZOfiYChJ3J0ac+jT+ 9hQflPOW6fiBn8jfkol32wWUuJelPIMTpF2ED6nW5dnTZtqTPCstpoeS5y5O/5s5ZFuJ vNDudrHw5Kj4aXgcrF/8eW8LQ3Zr/s67/WG23okLje7uhPd010iXuXXOGLdbahB6d4yV yc1RDwAsc881DdKo9S/dcLSBkn+p2PjkUGK/ZZB15zwN93zkF61XWiQ9x0n4MKsbEvv2 pRvw== X-Gm-Message-State: AC+VfDxW6LSG53huHOVX436eYy7UNUDxZtXN2VqQEVBu/1YHT7ty9xtH rY4BKTt6gDg6aU2KussMVYwJNC8VMAQB1RMMIAM= X-Received: by 2002:a05:600c:28b:b0:3fb:be82:d0e8 with SMTP id 11-20020a05600c028b00b003fbbe82d0e8mr10418930wmk.34.1688479532516; Tue, 04 Jul 2023 07:05:32 -0700 (PDT) Received: from sucnaath.outer.uphall.net (cpc1-cmbg20-2-0-cust759.5-4.cable.virginm.net. 
[86.21.218.248]) by smtp.gmail.com with ESMTPSA id m23-20020a7bca57000000b003fbc30825fbsm13585970wml.39.2023.07.04.07.05.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 04 Jul 2023 07:05:32 -0700 (PDT)
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 4 Jul 2023 14:04:45 +0000
Message-Id: <20230704140445.240426-8-jc@kynesim.co.uk>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230704140445.240426-1-jc@kynesim.co.uk>
References: <20230704140445.240426-1-jc@kynesim.co.uk>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v4 7/7] avfilter/vf_bwdif: Add neon for filter_line3
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: thomas.mundt@hr.de, John Cox , martin@martin.st
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: fJlEvGbNB30j

Signed-off-by: John Cox
---
 libavfilter/aarch64/vf_bwdif_init_aarch64.c |  28 ++
 libavfilter/aarch64/vf_bwdif_neon.S         | 272 ++++++++++++++++++++
 2 files changed, 300 insertions(+)

diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
index 21e67884ab..f52bc4b9b4 100644
--- a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
+++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
@@ -36,6 +36,33 @@ void ff_bwdif_filter_line_neon(void *dst1, void *prev1, void *cur1, void *next1,
                                int prefs3, int mrefs3, int prefs4, int mrefs4,
                                int parity, int clip_max);

+void ff_bwdif_filter_line3_neon(void * dst1, int d_stride,
+                                const void * prev1, const void * cur1, const void * next1, int s_stride,
+                                int w, int parity, int clip_max);
+
+
+static void filter_line3_helper(void * dst1, int d_stride,
+                                const void * prev1, const void * cur1, const void * next1, int s_stride,
+                                int w, int parity, int clip_max)
+{
+    // Asm works on 16 byte chunks
+    // If w is a multiple of 16 then all is good - if not then if width rounded
+    // up to nearest 16 will fit in both src & dst strides then allow the asm
+    // to write over the padding bytes as that is almost certainly faster than
+    // having to invoke the C version to clean up the tail.
+    const int w1 = FFALIGN(w, 16);
+    const int w0 = clip_max != 255 ? 0 :
+                   d_stride <= w1 && s_stride <= w1 ? w : w & ~15;
+
+    ff_bwdif_filter_line3_neon(dst1, d_stride,
+                               prev1, cur1, next1, s_stride,
+                               w0, parity, clip_max);
+
+    if (w0 < w)
+        ff_bwdif_filter_line3_c((char *)dst1 + w0, d_stride,
+                                (const char *)prev1 + w0, (const char *)cur1 + w0, (const char *)next1 + w0, s_stride,
+                                w - w0, parity, clip_max);
+}

 static void filter_line_helper(void *dst1, void *prev1, void *cur1, void *next1,
                                int w, int prefs, int mrefs, int prefs2, int mrefs2,
@@ -93,5 +120,6 @@ ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth)
     s->filter_intra = filter_intra_helper;
     s->filter_line = filter_line_helper;
     s->filter_edge = filter_edge_helper;
+    s->filter_line3 = filter_line3_helper;
 }

diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S
index f185e94e3c..ae9aab20cd 100644
--- a/libavfilter/aarch64/vf_bwdif_neon.S
+++ b/libavfilter/aarch64/vf_bwdif_neon.S
@@ -154,6 +154,278 @@ const coeffs, align=4 // align 4 means align on 2^4 boundry
         .hword          5077, 981 // sp[0] = v0.h[6]
 endconst

+// ===========================================================================
+//
+// void ff_bwdif_filter_line3_neon(
+//      void * dst1,            // x0
+//      int d_stride,           // w1
+//      const void * prev1,     // x2
+//      const void * cur1,      // x3
+//      const void * next1,     // x4
+//      int s_stride,           // w5
+//      int w,                  // w6
+//      int parity,             // w7
+//      int clip_max);          // [sp, #0] (Ignored)
+
+function ff_bwdif_filter_line3_neon, export=1
+        // Sanity check w
+        cmp             w6, #0
+        ble             99f
+
+        LDR_COEFFS      v0, x17
+
+//      #define prev2 cur
+//      const uint8_t * restrict next2 = parity ? prev : next;
+        cmp             w7, #0
+        csel            x17, x2, x4, ne
+
+        // We want all the V registers - save all the ones we must
+        PUSH_VREGS
+
+        // Some rearrangement of initial values for nice layout of refs in regs
+        mov             w10, w6                 // w10 = loop count
+        neg             w9, w5                  // w9 = mref
+        lsl             w8, w9, #1              // w8 = mref2
+        add             w7, w9, w9, LSL #1      // w7 = mref3
+        lsl             w6, w9, #2              // w6 = mref4
+        mov             w11, w5                 // w11 = pref
+        lsl             w12, w5, #1             // w12 = pref2
+        add             w13, w5, w5, LSL #1     // w13 = pref3
+        lsl             w14, w5, #2             // w14 = pref4
+        add             w15, w5, w5, LSL #2     // w15 = pref5
+        add             w16, w14, w12           // w16 = pref6
+
+        lsl             w5, w1, #1              // w5 = d_stride * 2
+
+//      for (x = 0; x < w; x++) {
+//          int diff0, diff2;
+//          int d0, d2;
+//          int temporal_diff0, temporal_diff2;
+//
+//          int i1, i2;
+//          int j1, j2;
+//          int p6, p5, p4, p3, p2, p1, c0, m1, m2, m3, m4;
+
+10:
+//          c0 = prev2[0] + next2[0]; // c0 = v20, v21
+//          d0 = c0 >> 1; // d0 = v10
+//          temporal_diff0 = FFABS(prev2[0] - next2[0]); // td0 = v11
+        ldr             q31, [x3]
+        ldr             q21, [x17]
+        uhadd           v10.16b, v31.16b, v21.16b
+        uabd            v11.16b, v31.16b, v21.16b
+        uaddl           v20.8h, v21.8b, v31.8b
+        uaddl2          v21.8h, v21.16b, v31.16b
+
+        ldr             q31, [x3, w6, sxtw]
+        ldr             q23, [x17, w6, sxtw]
+
+//          i1 = coef_hf[0] * c0; // i1 = v2-v5
+        UMULL4K         v2, v3, v4, v5, v20, v21, v0.h[2]
+
+        ldr             q30, [x3, w14, sxtw]
+        ldr             q25, [x17, w14, sxtw]
+
+//          m4 = prev2[mrefs4] + next2[mrefs4]; // m4 = v22,v23
+        uaddl           v22.8h, v23.8b, v31.8b
+        uaddl2          v23.8h, v23.16b, v31.16b
+
+//          p4 = prev2[prefs4] + next2[prefs4]; // p4 = v24,v25, (p4 >> 1) = v12
+        uhadd           v12.16b, v25.16b, v30.16b
+        uaddl           v24.8h, v25.8b, v30.8b
+        uaddl2          v25.8h, v25.16b, v30.16b
+
+//          j1 = -coef_hf[1] * (c0 + p4); // j1 = v6-v9 (-c0:v20,v21)
+        add             v20.8h, v20.8h, v24.8h
+        add             v21.8h, v21.8h, v25.8h
+        SMULL4K         v6, v7, v8, v9, v20, v21, v0.h[5]
+
+//          m3 = cur[mrefs3]; // m3 = v20
+        ldr             q20, [x3, w7, sxtw]
+
+//          p3 = cur[prefs3]; // p3 = v21
+        ldr             q21, [x3, w13, sxtw]
+
+//          i1 += coef_hf[2] * (m4 + p4); // (-m4:v22,v23) (-p4:v24,v25)
+        add             v22.8h, v22.8h, v24.8h
+        add             v23.8h, v23.8h, v25.8h
+        UMLAL4K         v2, v3, v4, v5, v22, v23, v0.h[4]
+
+        ldr             q29, [x3, w8, sxtw]
+        ldr             q23, [x17, w8, sxtw]
+
+//          i1 -= coef_lf[1] * 4 * (m3 + p3); // -
+        uaddl           v30.8h, v20.8b, v21.8b
+        uaddl2          v31.8h, v20.16b, v21.16b
+
+        ldr             q28, [x3, w16, sxtw]
+        ldr             q25, [x17, w16, sxtw]
+
+        UMLSL4K         v2, v3, v4, v5, v30, v31, v0.h[1]
+
+//          m2 = prev2[mrefs2] + next2[mrefs2]; // m2 = v22,v23, (m2 >> 1) = v13
+        uhadd           v13.16b, v23.16b, v29.16b
+        uaddl           v22.8h, v23.8b, v29.8b
+        uaddl2          v23.8h, v23.16b, v29.16b
+
+        ldr             q31, [x3, w12, sxtw]
+        ldr             q27, [x17, w12, sxtw]
+
+//          p6 = prev2[prefs6] + next2[prefs6]; // p6 = v24,v25
+        uaddl           v24.8h, v25.8b, v28.8b
+        uaddl2          v25.8h, v25.16b, v28.16b
+
+//          j1 += coef_hf[2] * (m2 + p6); // (-p6:v24,v25)
+        add             v24.8h, v24.8h, v22.8h
+        add             v25.8h, v25.8h, v23.8h
+        UMLAL4K         v6, v7, v8, v9, v24, v25, v0.h[4]
+
+//          m1 = cur[mrefs]; // m1 = v24
+        ldr             q24, [x3, w9, sxtw]
+
+//          p5 = cur[prefs5]; // p5 = v25
+        ldr             q25, [x3, w15, sxtw]
+
+//          p2 = prev2[prefs2] + next2[prefs2]; // p2 = v26, v27
+//          temporal_diff2 = FFABS(prev2[prefs2] - next2[prefs2]); // td2 = v14
+//          d2 = p2 >> 1; // d2 = v15
+        uabd            v14.16b, v31.16b, v27.16b
+        uhadd           v15.16b, v31.16b, v27.16b
+        uaddl           v26.8h, v27.8b, v31.8b
+        uaddl2          v27.8h, v27.16b, v31.16b
+
+//          j1 += coef_hf[0] * p2; // -
+        UMLAL4K         v6, v7, v8, v9, v26, v27, v0.h[2]
+
+//          i1 -= coef_hf[1] * (m2 + p2); // (-m2:v22,v23*) (-p2:v26*,v27*)
+        add             v22.8h, v22.8h, v26.8h
+        add             v23.8h, v23.8h, v27.8h
+        UMLSL4K         v2, v3, v4, v5, v22, v23, v0.h[3]
+
+//          p1 = cur[prefs]; // p1 = v22
+        ldr             q22, [x3, w11, sxtw]
+
+//          j1 -= coef_lf[1] * 4 * (m1 + p5); // -
+        uaddl           v26.8h, v24.8b, v25.8b
+        uaddl2          v27.8h, v24.16b, v25.16b
+        UMLSL4K         v6, v7, v8, v9, v26, v27, v0.h[1]
+
+//          j2 = (coef_sp[0] * (p1 + p3) - coef_sp[1] * (m1 + p5)) >> 13; // (-p5:v25*) j2=v16
+        uaddl           v18.8h, v22.8b, v21.8b
+        uaddl2          v19.8h, v22.16b, v21.16b
+        UMULL4K         v28, v29, v30, v31, v18, v19, v0.h[6]
+
+        uaddl           v18.8h, v24.8b, v25.8b
+        uaddl2          v19.8h, v24.16b, v25.16b
+        UMLSL4K         v28, v29, v30, v31, v18, v19, v0.h[7]
+
+        SQSHRUNN        v16, v28, v29, v30, v31, 13
+
+//          i2 = (coef_sp[0] * (m1 + p1) - coef_sp[1] * (m3 + p3)) >> 13; // (-m3:v20*) i2=v17
+        uaddl           v18.8h, v22.8b, v24.8b
+        uaddl2          v19.8h, v22.16b, v24.16b
+        UMULL4K         v28, v29, v30, v31, v18, v19, v0.h[6]
+
+        uaddl           v18.8h, v20.8b, v21.8b
+        uaddl2          v19.8h, v20.16b, v21.16b
+        UMLSL4K         v28, v29, v30, v31, v18, v19, v0.h[7]
+
+        SQSHRUNN        v17, v28, v29, v30, v31, 13
+
+//          i1 += coef_lf[0] * 4 * (m1 + p1); // p1 = v22, m1 = v24
+        uaddl           v26.8h, v24.8b, v22.8b
+        uaddl2          v27.8h, v24.16b, v22.16b
+        UMLAL4K         v2, v3, v4, v5, v26, v27, v0.h[0]
+
+        ldr             q31, [x2, w9, sxtw]
+        ldr             q29, [x4, w9, sxtw]
+
+//          j1 += coef_lf[0] * 4 * (p1 + p3); // p1 = v22, p3 = v21
+        uaddl           v26.8h, v21.8b, v22.8b
+        uaddl2          v27.8h, v21.16b, v22.16b
+        UMLAL4K         v6, v7, v8, v9, v26, v27, v0.h[0]
+
+        ldr             q30, [x2, w11, sxtw]
+        ldr             q28, [x4, w11, sxtw]
+
+//          i1 >>= 15; // i1 = v2, -v3, -v4*, -v5*
+        SQSHRUNN        v2, v2, v3, v4, v5, 15
+
+//          j1 >>= 15; // j1 = v3, -v6*, -v7*, -v8*, -v9*
+        SQSHRUNN        v3, v6, v7, v8, v9, 15
+
+//          {
+//              int t1 =(FFABS(prev[mrefs] - m1) + FFABS(prev[prefs] - p1)) >> 1;
+//              int t2 =(FFABS(next[mrefs] - m1) + FFABS(next[prefs] - p1)) >> 1;
+        uabd            v30.16b, v22.16b, v30.16b
+        uabd            v31.16b, v24.16b, v31.16b
+        uabd            v28.16b, v22.16b, v28.16b
+        uabd            v29.16b, v24.16b, v29.16b
+        uhadd           v31.16b, v31.16b, v30.16b
+        uhadd           v29.16b, v29.16b, v28.16b
+
+        ldr             q27, [x2, w13, sxtw]
+        ldr             q26, [x4, w13, sxtw]
+
+//              diff0 = FFMAX3(temporal_diff0 >> 1, t1, t2); // diff0=v18
+        ushr            v18.16b, v11.16b, #1
+        umax            v18.16b, v18.16b, v31.16b
+        umax            v18.16b, v18.16b, v29.16b
+//          } // v28, v30 preserved for next block
+//          { // tdiff2 = v14
+//              int t1 =(FFABS(prev[prefs] - p1) + FFABS(prev[prefs3] - p3)) >> 1;
+//              int t2 =(FFABS(next[prefs] - p1) + FFABS(next[prefs3] - p3)) >> 1;
+        uabd            v31.16b, v21.16b, v27.16b
+        uabd            v29.16b, v21.16b, v26.16b
+        uhadd           v31.16b, v31.16b, v30.16b
+        uhadd           v29.16b, v29.16b, v28.16b
+
+//              diff2 = FFMAX3(temporal_diff2 >> 1, t1, t2); // diff2=v19
+        ushr            v19.16b, v14.16b, #1
+        umax            v19.16b, v19.16b, v31.16b
+        umax            v19.16b, v19.16b, v29.16b
+//          }
+
+        // diff0 = v18, (m2 >> 1) = v13, m1 = v24, d0 = v10, p1 = v22, d2 = v15
+        SPAT_CHECK      v18, v13, v24, v10, v22, v15, v31, v30, v29, v28
+
+        // diff2 = v19, d0 = v10, p1 = v22, d2 = v15, p3 = v21, (p4 >> 1) = v12
+        SPAT_CHECK      v19, v10, v22, v15, v21, v12, v31, v30, v29, v28
+
+        // j1 = v3, j2 = v16, p1 = v22, d2 = v15, p3 = v21, td2 = v14, diff2 = v19
+        INTERPOL        v3, v3, v16, v22, v15, v21, v14, v19, v31, v30, v29
+
+//          dst[d_stride * 2] = av_clip_uint8(interpol);
+        str             q3, [x0, w5, sxtw]
+
+//          dst[d_stride] = p1;
+        str             q22, [x0, w1, sxtw]
+
+        // i1 = v2, i2 = v17, m1 = v24, d0 = v10, p1 = v22, td0 = v11, diff0 = v18
+        INTERPOL        v2, v2, v17, v24, v10, v22, v11, v18, v31, v30, v29
+
+//          dst[0] = av_clip_uint8(interpol);
+        str             q2, [x0], #16
+//      }
+//
+//      dst++;
+//      cur++;
+//      prev++;
+//      prev2++;
+//      next++;
+//  }
+        subs            w10, w10, #16
+        add             x2, x2, #16
+        add             x3, x3, #16
+        add             x4, x4, #16
+        add             x17, x17, #16
+        bgt             10b
+
+        POP_VREGS
+99:
+        ret
+endfunc
+
 // ===========================================================================
 //
 // void filter_line(