From patchwork Tue Jul 4 14:04:40 2023
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 42425
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Cc: thomas.mundt@hr.de, John Cox, martin@martin.st
Date: Tue, 4 Jul 2023 14:04:40 +0000
Message-Id: <20230704140445.240426-3-jc@kynesim.co.uk>
In-Reply-To: <20230704140445.240426-1-jc@kynesim.co.uk>
References: <20230704140445.240426-1-jc@kynesim.co.uk>
Subject: [FFmpeg-devel] [PATCH v4 2/7] avfilter/vf_bwdif: Add neon for filter_intra

Adds an outline for aarch neon functions
Adds common macros and consts for aarch64 neon
Exports C filter_intra needed for tail fixup of neon code
Adds neon for filter_intra

Signed-off-by: John Cox
---
 libavfilter/aarch64/Makefile                |   2 +
 libavfilter/aarch64/vf_bwdif_init_aarch64.c |  56 ++++++++
 libavfilter/aarch64/vf_bwdif_neon.S         | 136 ++++++++++++++++++++
 libavfilter/bwdif.h                         |   4 +
 libavfilter/vf_bwdif.c                      |   8 +-
 5 files changed, 203 insertions(+), 3 deletions(-)
 create mode 100644 libavfilter/aarch64/vf_bwdif_init_aarch64.c
 create mode 100644 libavfilter/aarch64/vf_bwdif_neon.S

diff --git a/libavfilter/aarch64/Makefile b/libavfilter/aarch64/Makefile
index b58daa3a3f..b68209bc94 100644
--- a/libavfilter/aarch64/Makefile
+++ b/libavfilter/aarch64/Makefile
@@ -1,3 +1,5 @@
+OBJS-$(CONFIG_BWDIF_FILTER)                  += aarch64/vf_bwdif_init_aarch64.o
 OBJS-$(CONFIG_NLMEANS_FILTER)                += aarch64/vf_nlmeans_init.o
 
+NEON-OBJS-$(CONFIG_BWDIF_FILTER)             += aarch64/vf_bwdif_neon.o
 NEON-OBJS-$(CONFIG_NLMEANS_FILTER)           += aarch64/vf_nlmeans_neon.o
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
new file mode 100644
index 0000000000..3ffaa07ab3
--- /dev/null
+++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
@@ -0,0 +1,56 @@
+/*
+ * bwdif aarch64 NEON optimisations
+ *
+ * Copyright (c) 2023 John Cox
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/common.h"
+#include "libavfilter/bwdif.h"
+#include "libavutil/aarch64/cpu.h"
+
+void ff_bwdif_filter_intra_neon(void *dst1, void *cur1, int w, int prefs, int mrefs,
+                                int prefs3, int mrefs3, int parity, int clip_max);
+
+
+static void filter_intra_helper(void *dst1, void *cur1, int w, int prefs, int mrefs,
+                                int prefs3, int mrefs3, int parity, int clip_max)
+{
+    const int w0 = clip_max != 255 ? 0 : w & ~15;
+
+    ff_bwdif_filter_intra_neon(dst1, cur1, w0, prefs, mrefs, prefs3, mrefs3, parity, clip_max);
+
+    if (w0 < w)
+        ff_bwdif_filter_intra_c((char *)dst1 + w0, (char *)cur1 + w0,
+                                w - w0, prefs, mrefs, prefs3, mrefs3, parity, clip_max);
+}
+
+void
+ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth)
+{
+    const int cpu_flags = av_get_cpu_flags();
+
+    if (bit_depth != 8)
+        return;
+
+    if (!have_neon(cpu_flags))
+        return;
+
+    s->filter_intra = filter_intra_helper;
+}
+
diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S
new file mode 100644
index 0000000000..e288efbe6c
--- /dev/null
+++ b/libavfilter/aarch64/vf_bwdif_neon.S
@@ -0,0 +1,136 @@
+/*
+ * bwdif aarch64 NEON optimisations
+ *
+ * Copyright (c) 2023 John Cox
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+
+#include "libavutil/aarch64/asm.S"
+
+// Space taken on the stack by an int (32-bit)
+#ifdef __APPLE__
+.set    SP_INT, 4
+#else
+.set    SP_INT, 8
+#endif
+
+.macro SQSHRUNN b, s0, s1, s2, s3, n
+        sqshrun         \s0\().4h, \s0\().4s, #\n - 8
+        sqshrun2        \s0\().8h, \s1\().4s, #\n - 8
+        sqshrun         \s1\().4h, \s2\().4s, #\n - 8
+        sqshrun2        \s1\().8h, \s3\().4s, #\n - 8
+        uzp2            \b\().16b, \s0\().16b, \s1\().16b
+.endm
+
+.macro SMULL4K a0, a1, a2, a3, s0, s1, k
+        smull           \a0\().4s, \s0\().4h, \k
+        smull2          \a1\().4s, \s0\().8h, \k
+        smull           \a2\().4s, \s1\().4h, \k
+        smull2          \a3\().4s, \s1\().8h, \k
+.endm
+
+.macro UMULL4K a0, a1, a2, a3, s0, s1, k
+        umull           \a0\().4s, \s0\().4h, \k
+        umull2          \a1\().4s, \s0\().8h, \k
+        umull           \a2\().4s, \s1\().4h, \k
+        umull2          \a3\().4s, \s1\().8h, \k
+.endm
+
+.macro UMLAL4K a0, a1, a2, a3, s0, s1, k
+        umlal           \a0\().4s, \s0\().4h, \k
+        umlal2          \a1\().4s, \s0\().8h, \k
+        umlal           \a2\().4s, \s1\().4h, \k
+        umlal2          \a3\().4s, \s1\().8h, \k
+.endm
+
+.macro UMLSL4K a0, a1, a2, a3, s0, s1, k
+        umlsl           \a0\().4s, \s0\().4h, \k
+        umlsl2          \a1\().4s, \s0\().8h, \k
+        umlsl           \a2\().4s, \s1\().4h, \k
+        umlsl2          \a3\().4s, \s1\().8h, \k
+.endm
+
+.macro LDR_COEFFS d, t0
+        movrel          \t0, coeffs, 0
+        ld1             {\d\().8h}, [\t0]
+.endm
+
+// static const uint16_t coef_lf[2] = { 4309, 213 };
+// static const uint16_t coef_hf[3] = { 5570, 3801, 1016 };
+// static const uint16_t coef_sp[2] = { 5077, 981 };
+
+const coeffs, align=4   // align 4 means align on 2^4 boundary
+        .hword          4309 * 4, 213 * 4               // lf[0]*4 = v0.h[0]
+        .hword          5570, 3801, 1016, -3801         // hf[0] = v0.h[2], -hf[1] = v0.h[5]
+        .hword          5077, 981                       // sp[0] = v0.h[6]
+endconst
+
+// ============================================================================
+//
+// void ff_bwdif_filter_intra_neon(
+//      void *dst1,     // x0
+//      void *cur1,     // x1
+//      int w,          // w2
+//      int prefs,      // w3
+//      int mrefs,      // w4
+//      int prefs3,     // w5
+//      int mrefs3,     // w6
+//      int parity,     // w7       unused
+//      int clip_max)   // [sp, #0] unused
+
+function ff_bwdif_filter_intra_neon, export=1
+        cmp             w2, #0
+        ble             99f
+
+        LDR_COEFFS      v0, x17
+
+//      for (x = 0; x < w; x++) {
+10:
+
+//          interpol = (coef_sp[0] * (cur[mrefs] + cur[prefs]) - coef_sp[1] * (cur[mrefs3] + cur[prefs3])) >> 13;
+        ldr             q31, [x1, w4, sxtw]
+        ldr             q30, [x1, w3, sxtw]
+        ldr             q29, [x1, w6, sxtw]
+        ldr             q28, [x1, w5, sxtw]
+
+        uaddl           v20.8h, v31.8b, v30.8b
+        uaddl2          v21.8h, v31.16b, v30.16b
+
+        UMULL4K         v2, v3, v4, v5, v20, v21, v0.h[6]
+
+        uaddl           v20.8h, v29.8b, v28.8b
+        uaddl2          v21.8h, v29.16b, v28.16b
+
+        UMLSL4K         v2, v3, v4, v5, v20, v21, v0.h[7]
+
+//          dst[0] = av_clip(interpol, 0, clip_max);
+        SQSHRUNN        v2, v2, v3, v4, v5, 13
+        str             q2, [x0], #16
+
+//          dst++;
+//          cur++;
+//      }
+
+        subs            w2, w2, #16
+        add             x1, x1, #16
+        bgt             10b
+
+99:
+        ret
+endfunc
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index 5749345f78..ae6f6ce223 100644
--- a/libavfilter/bwdif.h
+++ b/libavfilter/bwdif.h
@@ -39,5 +39,9 @@ typedef struct BWDIFContext {
 
 void ff_bwdif_init_filter_line(BWDIFContext *bwdif, int bit_depth);
 void ff_bwdif_init_x86(BWDIFContext *bwdif, int bit_depth);
+void ff_bwdif_init_aarch64(BWDIFContext *bwdif, int bit_depth);
+
+void ff_bwdif_filter_intra_c(void *dst1, void *cur1, int w, int prefs, int mrefs,
+                             int prefs3, int mrefs3, int parity, int clip_max);
 
 #endif /* AVFILTER_BWDIF_H */
diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c
index e278cf1217..035fc58670 100644
--- a/libavfilter/vf_bwdif.c
+++ b/libavfilter/vf_bwdif.c
@@ -122,8 +122,8 @@ typedef struct ThreadData {
         next2++; \
     }
 
-static void filter_intra(void *dst1, void *cur1, int w, int prefs, int mrefs,
-                         int prefs3, int mrefs3, int parity, int clip_max)
+void ff_bwdif_filter_intra_c(void *dst1, void *cur1, int w, int prefs, int mrefs,
+                             int prefs3, int mrefs3, int parity, int clip_max)
 {
     uint8_t *dst = dst1;
     uint8_t *cur = cur1;
@@ -362,13 +362,15 @@ av_cold void ff_bwdif_init_filter_line(BWDIFContext *s, int bit_depth)
         s->filter_line  = filter_line_c_16bit;
         s->filter_edge  = filter_edge_16bit;
     } else {
-        s->filter_intra = filter_intra;
+        s->filter_intra = ff_bwdif_filter_intra_c;
         s->filter_line  = filter_line_c;
         s->filter_edge  = filter_edge;
     }
 
 #if ARCH_X86
     ff_bwdif_init_x86(s, bit_depth);
+#elif ARCH_AARCH64
+    ff_bwdif_init_aarch64(s, bit_depth);
 #endif
 }
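
Note for readers checking the NEON path against the scalar one: the sketch below is not part of the patch. It restates, in plain C, the 8-bit interpolation quoted in the comments of vf_bwdif_neon.S and the width split performed by filter_intra_helper. The names intra_ref and neon_width are hypothetical and exist only for this illustration. The negative case is clamped before the shift so the sketch does not rely on how a compiler shifts negative values. The result should match what the sqshrun/uzp2 sequence produces for 8-bit data, but that is an assumption to verify, not a claim about the author's test method.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the comments of vf_bwdif_neon.S. */
static const int coef_sp[2] = { 5077, 981 };

/* Hypothetical helper: the width handed to the NEON routine.
 * Mirrors "clip_max != 255 ? 0 : w & ~15" in filter_intra_helper:
 * whole 16-pixel blocks for 8-bit data, nothing otherwise. */
static int neon_width(int w, int clip_max)
{
    return clip_max != 255 ? 0 : w & ~15;
}

/* Hypothetical scalar reference for the 8-bit intra interpolation.
 * prefs/mrefs/prefs3/mrefs3 are byte offsets to the lines one and
 * three rows below and above the current one. */
static void intra_ref(uint8_t *dst, const uint8_t *cur, int w,
                      int prefs, int mrefs, int prefs3, int mrefs3)
{
    assert(w >= 0);
    for (int x = 0; x < w; x++) {
        int sum = coef_sp[0] * (cur[x + mrefs]  + cur[x + prefs]) -
                  coef_sp[1] * (cur[x + mrefs3] + cur[x + prefs3]);
        /* Clamp to [0, 255] after the >> 13 scaling. */
        int interpol = sum < 0 ? 0 : sum >> 13;
        dst[x] = interpol > 255 ? 255 : (uint8_t)interpol;
    }
}
```

With w = 37 and clip_max = 255, neon_width gives 32, so the assembly would handle two 16-pixel blocks and ff_bwdif_filter_intra_c the remaining 5 pixels. With any other clip_max the NEON routine is called with a width of 0 and returns at once. A flat field passes through intra_ref unchanged, since coef_sp[0] - coef_sp[1] is 4096 and each coefficient multiplies a sum of two pixels, giving a total gain of 8192, which the shift by 13 removes.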