From patchwork Tue Jul 4 14:04:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 42427
Received: from sucnaath.outer.uphall.net (cpc1-cmbg20-2-0-cust759.5-4.cable.virginm.net.
 [86.21.218.248]) by smtp.gmail.com with ESMTPSA id m23-20020a7bca57000000b003fbc30825fbsm13585970wml.39.2023.07.04.07.05.30
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 04 Jul 2023 07:05:30 -0700 (PDT)
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 4 Jul 2023 14:04:42 +0000
Message-Id: <20230704140445.240426-5-jc@kynesim.co.uk>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230704140445.240426-1-jc@kynesim.co.uk>
References: <20230704140445.240426-1-jc@kynesim.co.uk>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v4 4/7] avfilter/vf_bwdif: Add neon for filter_edge
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: thomas.mundt@hr.de, John Cox, martin@martin.st
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: KAPzdCODP071

Adds clip and spatial macros for aarch64 neon
Exports C filter_edge needed for tail fixup of neon code
Adds neon for filter_edge

Signed-off-by: John Cox
---
 libavfilter/aarch64/vf_bwdif_init_aarch64.c |  20 +++
 libavfilter/aarch64/vf_bwdif_neon.S         | 177 ++++++++++++++++++++
 libavfilter/bwdif.h                         |   4 +
 libavfilter/vf_bwdif.c                      |   8 +-
 4 files changed, 205 insertions(+), 4 deletions(-)

diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
index 3ffaa07ab3..e75cf2f204 100644
--- a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
+++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
@@ -24,10 +24,29 @@
 #include "libavfilter/bwdif.h"
 #include "libavutil/aarch64/cpu.h"
 
+void ff_bwdif_filter_edge_neon(void *dst1, void *prev1, void *cur1, void *next1,
+                               int w, int prefs, int mrefs, int prefs2, int mrefs2,
+                               int parity, int clip_max, int spat);
+
 void ff_bwdif_filter_intra_neon(void *dst1, void *cur1, int w, int prefs, int mrefs,
                                 int prefs3, int mrefs3, int parity, int clip_max);
 
 
+static void filter_edge_helper(void *dst1, void *prev1, void *cur1, void *next1,
+                               int w, int prefs, int mrefs, int prefs2, int mrefs2,
+                               int parity, int clip_max, int spat)
+{
+    const int w0 = clip_max != 255 ? 0 : w & ~15;
+
+    ff_bwdif_filter_edge_neon(dst1, prev1, cur1, next1, w0, prefs, mrefs, prefs2, mrefs2,
+                              parity, clip_max, spat);
+
+    if (w0 < w)
+        ff_bwdif_filter_edge_c((char *)dst1 + w0, (char *)prev1 + w0, (char *)cur1 + w0, (char *)next1 + w0,
+                               w - w0, prefs, mrefs, prefs2, mrefs2,
+                               parity, clip_max, spat);
+}
+
 static void filter_intra_helper(void *dst1, void *cur1, int w, int prefs, int mrefs,
                                 int prefs3, int mrefs3, int parity, int clip_max)
 {
@@ -52,5 +71,6 @@ ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth)
         return;
 
     s->filter_intra = filter_intra_helper;
+    s->filter_edge  = filter_edge_helper;
 }
 
diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S
index e288efbe6c..389302b813 100644
--- a/libavfilter/aarch64/vf_bwdif_neon.S
+++ b/libavfilter/aarch64/vf_bwdif_neon.S
@@ -66,6 +66,79 @@
         umlsl2          \a3\().4s, \s1\().8h, \k
 .endm
 
+// int b = m2s1 - m1;
+// int f = p2s1 - p1;
+// int dc = c0s1 - m1;
+// int de = c0s1 - p1;
+// int sp_max = FFMIN(p1 - c0s1, m1 - c0s1);
+// sp_max = FFMIN(sp_max, FFMAX(-b,-f));
+// int sp_min = FFMIN(c0s1 - p1, c0s1 - m1);
+// sp_min = FFMIN(sp_min, FFMAX(b,f));
+// diff = diff == 0 ? 0 : FFMAX3(diff, sp_min, sp_max);
+.macro SPAT_CHECK diff, m2s1, m1, c0s1, p1, p2s1, t0, t1, t2, t3
+        uqsub           \t0\().16b, \p1\().16b, \c0s1\().16b
+        uqsub           \t2\().16b, \m1\().16b, \c0s1\().16b
+        umin            \t2\().16b, \t0\().16b, \t2\().16b
+
+        uqsub           \t1\().16b, \m1\().16b, \m2s1\().16b
+        uqsub           \t3\().16b, \p1\().16b, \p2s1\().16b
+        umax            \t3\().16b, \t3\().16b, \t1\().16b
+        umin            \t3\().16b, \t3\().16b, \t2\().16b
+
+        uqsub           \t0\().16b, \c0s1\().16b, \p1\().16b
+        uqsub           \t2\().16b, \c0s1\().16b, \m1\().16b
+        umin            \t2\().16b, \t0\().16b, \t2\().16b
+
+        uqsub           \t1\().16b, \m2s1\().16b, \m1\().16b
+        uqsub           \t0\().16b, \p2s1\().16b, \p1\().16b
+        umax            \t0\().16b, \t0\().16b, \t1\().16b
+        umin            \t2\().16b, \t2\().16b, \t0\().16b
+
+        cmeq            \t1\().16b, \diff\().16b, #0
+        umax            \diff\().16b, \diff\().16b, \t3\().16b
+        umax            \diff\().16b, \diff\().16b, \t2\().16b
+        bic             \diff\().16b, \diff\().16b, \t1\().16b
+.endm
+
+// i0 = s0;
+// if (i0 > d0 + diff0)
+//     i0 = d0 + diff0;
+// else if (i0 < d0 - diff0)
+//     i0 = d0 - diff0;
+//
+// i0 = s0 is safe
+.macro DIFF_CLIP i0, s0, d0, diff, t0, t1
+        uqadd           \t0\().16b, \d0\().16b, \diff\().16b
+        uqsub           \t1\().16b, \d0\().16b, \diff\().16b
+        umin            \i0\().16b, \s0\().16b, \t0\().16b
+        umax            \i0\().16b, \i0\().16b, \t1\().16b
+.endm
+
+// i0 = FFABS(m1 - p1) > td0 ? i1 : i2;
+// DIFF_CLIP
+//
+// i0 = i1 is safe
+.macro INTERPOL i0, i1, i2, m1, d0, p1, td0, diff, t0, t1, t2
+        uabd            \t0\().16b, \m1\().16b, \p1\().16b
+        cmhi            \t0\().16b, \t0\().16b, \td0\().16b
+        bsl             \t0\().16b, \i1\().16b, \i2\().16b
+        DIFF_CLIP       \i0, \t0, \d0, \diff, \t1, \t2
+.endm
+
+.macro PUSH_VREGS
+        stp             d8,  d9,  [sp, #-64]!
+        stp             d10, d11, [sp, #16]
+        stp             d12, d13, [sp, #32]
+        stp             d14, d15, [sp, #48]
+.endm
+
+.macro POP_VREGS
+        ldp             d14, d15, [sp, #48]
+        ldp             d12, d13, [sp, #32]
+        ldp             d10, d11, [sp, #16]
+        ldp             d8,  d9,  [sp], #64
+.endm
+
 .macro LDR_COEFFS d, t0
         movrel          \t0, coeffs, 0
         ld1             {\d\().8h}, [\t0]
@@ -81,6 +154,110 @@ const coeffs, align=4 // align 4 means align on 2^4 boundry
         .hword          5077, 981       // sp[0] = v0.h[6]
 endconst
 
+// ============================================================================
+//
+// void ff_bwdif_filter_edge_neon(
+//      void *dst1,     // x0
+//      void *prev1,    // x1
+//      void *cur1,     // x2
+//      void *next1,    // x3
+//      int w,          // w4
+//      int prefs,      // w5
+//      int mrefs,      // w6
+//      int prefs2,     // w7
+//      int mrefs2,     // [sp, #0]
+//      int parity,     // [sp, #SP_INT]
+//      int clip_max,   // [sp, #SP_INT*2]  unused
+//      int spat);      // [sp, #SP_INT*3]
+
+function ff_bwdif_filter_edge_neon, export=1
+        // Sanity check w
+        cmp             w4, #0
+        ble             99f
+
+//    #define prev2 cur
+//    const uint8_t * restrict next2 = parity ? prev : next;
+
+        ldr             w8,  [sp, #0]           // mrefs2
+
+        ldr             w17, [sp, #SP_INT]      // parity
+        ldr             w16, [sp, #SP_INT*3]    // spat
+        cmp             w17, #0
+        csel            x17, x1, x3, ne
+
+//    for (x = 0; x < w; x++) {
+
+10:
+//        int m1 = cur[mrefs];
+//        int d = (prev2[0] + next2[0]) >> 1;
+//        int p1 = cur[prefs];
+//        int temporal_diff0 = FFABS(prev2[0] - next2[0]);
+//        int temporal_diff1 =(FFABS(prev[mrefs] - m1) + FFABS(prev[prefs] - p1)) >> 1;
+//        int temporal_diff2 =(FFABS(next[mrefs] - m1) + FFABS(next[prefs] - p1)) >> 1;
+//        int diff = FFMAX3(temporal_diff0 >> 1, temporal_diff1, temporal_diff2);
+        ldr             q31, [x2]
+        ldr             q21, [x17]
+        uhadd           v16.16b, v31.16b, v21.16b       // d0  = v16
+        uabd            v17.16b, v31.16b, v21.16b       // td0 = v17
+        ldr             q24, [x2, w6, sxtw]             // m1  = v24
+        ldr             q22, [x2, w5, sxtw]             // p1  = v22
+
+        ldr             q0,  [x1, w6, sxtw]             // prev[mrefs]
+        ldr             q2,  [x1, w5, sxtw]             // prev[prefs]
+        ldr             q1,  [x3, w6, sxtw]             // next[mrefs]
+        ldr             q3,  [x3, w5, sxtw]             // next[prefs]
+
+        ushr            v29.16b, v17.16b, #1
+
+        uabd            v31.16b, v0.16b, v24.16b
+        uabd            v30.16b, v2.16b, v22.16b
+        uhadd           v0.16b, v31.16b, v30.16b        // td1 = q0
+
+        uabd            v31.16b, v1.16b, v24.16b
+        uabd            v30.16b, v3.16b, v22.16b
+        uhadd           v1.16b, v31.16b, v30.16b        // td2 = q1
+
+        umax            v0.16b, v0.16b, v29.16b
+        umax            v0.16b, v0.16b, v1.16b          // diff = v0
+
+//        if (spat) {
+//            SPAT_CHECK()
+//        }
+//        i0 = (m1 + p1) >> 1;
+        cbz             w16, 1f
+
+        ldr             q31, [x2, w8, sxtw]
+        ldr             q18, [x17, w8, sxtw]
+        ldr             q30, [x2, w7, sxtw]
+        ldr             q19, [x17, w7, sxtw]
+        uhadd           v18.16b, v18.16b, v31.16b
+        uhadd           v19.16b, v19.16b, v30.16b
+
+        SPAT_CHECK      v0, v18, v24, v16, v22, v19, v31, v30, v29, v28
+
+1:
+        uhadd           v2.16b, v22.16b, v24.16b
+
+        // i0 = v2, s0 = v2, d0 = v16, diff = v0, t0 = v31, t1 = v30
+        DIFF_CLIP       v2, v2, v16, v0, v31, v30
+
+//        dst[0] = av_clip(interpol, 0, clip_max);
+        str             q2, [x0], #16
+
+//        dst++;
+//        cur++;
+//    }
+        subs            w4, w4, #16
+        add             x1, x1, #16
+        add             x2, x2, #16
+        add             x3, x3, #16
+        add             x17, x17, #16
+        bgt             10b
+
+99:
+        ret
+endfunc
+
 // ============================================================================
 //
 // void ff_bwdif_filter_intra_neon(
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index ae6f6ce223..ae1616d366 100644
--- a/libavfilter/bwdif.h
+++ b/libavfilter/bwdif.h
@@ -41,6 +41,10 @@ void ff_bwdif_init_filter_line(BWDIFContext *bwdif, int bit_depth);
 void ff_bwdif_init_x86(BWDIFContext *bwdif, int bit_depth);
 void ff_bwdif_init_aarch64(BWDIFContext *bwdif, int bit_depth);
 
+void ff_bwdif_filter_edge_c(void *dst1, void *prev1, void *cur1, void *next1,
+                            int w, int prefs, int mrefs, int prefs2, int mrefs2,
+                            int parity, int clip_max, int spat);
+
 void ff_bwdif_filter_intra_c(void *dst1, void *cur1, int w, int prefs, int mrefs,
                              int prefs3, int mrefs3, int parity, int clip_max);
 
diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c
index 035fc58670..bec83111b4 100644
--- a/libavfilter/vf_bwdif.c
+++ b/libavfilter/vf_bwdif.c
@@ -150,9 +150,9 @@ static void filter_line_c(void *dst1, void *prev1, void *cur1, void *next1,
     FILTER2()
 }
 
-static void filter_edge(void *dst1, void *prev1, void *cur1, void *next1,
-                        int w, int prefs, int mrefs, int prefs2, int mrefs2,
-                        int parity, int clip_max, int spat)
+void ff_bwdif_filter_edge_c(void *dst1, void *prev1, void *cur1, void *next1,
+                            int w, int prefs, int mrefs, int prefs2, int mrefs2,
+                            int parity, int clip_max, int spat)
 {
     uint8_t *dst = dst1;
     uint8_t *prev = prev1;
@@ -364,7 +364,7 @@ av_cold void ff_bwdif_init_filter_line(BWDIFContext *s, int bit_depth)
     } else {
         s->filter_intra = ff_bwdif_filter_intra_c;
         s->filter_line = filter_line_c;
-        s->filter_edge = filter_edge;
+        s->filter_edge = ff_bwdif_filter_edge_c;
     }
 
 #if ARCH_X86