From patchwork Thu Jun 29 17:57:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 42317
From: John Cox
To: ffmpeg-devel@ffmpeg.org
Cc: thomas.mundt@hr.de, John Cox
Date: Thu, 29 Jun 2023 17:57:27 +0000
Message-Id: <20230629175729.224383-14-jc@kynesim.co.uk>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230629175729.224383-1-jc@kynesim.co.uk>
References: <20230629175729.224383-1-jc@kynesim.co.uk>
Subject: [FFmpeg-devel] [PATCH 13/15] avfilter/vf_bwdif: Add neon for filter_line3

Signed-off-by: John Cox
---
 libavfilter/aarch64/vf_bwdif_init_aarch64.c |  28 ++
 libavfilter/aarch64/vf_bwdif_neon.S         | 278 ++++++++++++++++++++
 2 files changed, 306 insertions(+)

diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
index 21e67884ab..f52bc4b9b4 100644
--- a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
+++ b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
@@ -36,6 +36,33 @@ void ff_bwdif_filter_line_neon(void *dst1, void *prev1, void *cur1, void *next1,
                                int prefs3, int mrefs3, int prefs4, int mrefs4,
                                int parity, int clip_max);
 
+void ff_bwdif_filter_line3_neon(void * dst1, int d_stride,
+                                const void * prev1, const void * cur1, const void * next1, int s_stride,
+                                int w, int parity, int clip_max);
+
+
+static void filter_line3_helper(void * dst1, int d_stride,
+                                const void * prev1, const void * cur1, const void * next1, int s_stride,
+                                int w, int parity, int clip_max)
+{
+    // Asm works on 16 byte chunks
+    // If w is a multiple of 16 then all is good - if not then if width rounded
+    // up to nearest 16 will fit in both src & dst strides then allow the asm
+    // to write over the padding bytes as that is almost certainly faster than
+    // having to invoke the C version to clean up the tail.
+    const int w1 = FFALIGN(w, 16);
+    const int w0 = clip_max != 255 ? 0 :
+                   w1 <= d_stride && w1 <= s_stride ? w : w & ~15;
+
+    ff_bwdif_filter_line3_neon(dst1, d_stride,
+                               prev1, cur1, next1, s_stride,
+                               w0, parity, clip_max);
+
+    if (w0 < w)
+        ff_bwdif_filter_line3_c((char *)dst1 + w0, d_stride,
+                                (const char *)prev1 + w0, (const char *)cur1 + w0, (const char *)next1 + w0, s_stride,
+                                w - w0, parity, clip_max);
+}
 
 static void filter_line_helper(void *dst1, void *prev1, void *cur1, void *next1,
                                int w, int prefs, int mrefs, int prefs2, int mrefs2,
@@ -93,5 +120,6 @@ ff_bwdif_init_aarch64(BWDIFContext *s, int bit_depth)
     s->filter_intra = filter_intra_helper;
     s->filter_line  = filter_line_helper;
     s->filter_edge  = filter_edge_helper;
+    s->filter_line3 = filter_line3_helper;
 }
 
diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S
index 675e97d966..bcffbe5793 100644
--- a/libavfilter/aarch64/vf_bwdif_neon.S
+++ b/libavfilter/aarch64/vf_bwdif_neon.S
@@ -128,6 +128,284 @@ coeffs:
         .hword          5570, 3801, 1016, -3801         // hf[0] = v0.h[2], -hf[1] = v0.h[5]
         .hword          5077, 981                       // sp[0] = v0.h[6]
 
+// ===========================================================================
+//
+// void ff_bwdif_filter_line3_neon(
+//         void * dst1,                // x0
+//         int d_stride,               // w1
+//         const void * prev1,         // x2
+//         const void * cur1,          // x3
+//         const void * next1,         // x4
+//         int s_stride,               // w5
+//         int w,                      // w6
+//         int parity,                 // w7
+//         int clip_max);              // [sp, #0] (Ignored)
+
+function ff_bwdif_filter_line3_neon, export=1
+        // Sanity check w
+        cmp             w6, #0
+        ble             99f
+
+//    #define prev2 cur
+//    const uint8_t * restrict next2 = parity ? prev : next;
+        cmp             w7, #0
+        csel            x17, x2, x4, ne
+
+        // We want all the V registers - save all the ones we must
+        stp             d14, d15, [sp, #-64]!
+        stp             d8,  d9,  [sp, #48]
+        stp             d10, d11, [sp, #32]
+        stp             d12, d13, [sp, #16]
+
+        ldr             q0, coeffs
+
+        // Some rearrangement of initial values for nice layout of refs in regs
+        mov             w10, w6                         // w10 = loop count
+        neg             w9,  w5                         // w9  = mref
+        lsl             w8,  w9,  #1                    // w8  = mref2
+        add             w7,  w9,  w9, LSL #1            // w7  = mref3
+        lsl             w6,  w9,  #2                    // w6  = mref4
+        mov             w11, w5                         // w11 = pref
+        lsl             w12, w5,  #1                    // w12 = pref2
+        add             w13, w5,  w5, LSL #1            // w13 = pref3
+        lsl             w14, w5,  #2                    // w14 = pref4
+        add             w15, w5,  w5, LSL #2            // w15 = pref5
+        add             w16, w14, w12                   // w16 = pref6
+
+        lsl             w5,  w1,  #1                    // w5 = d_stride * 2
+
+//         for (x = 0; x < w; x++) {
+//             int diff0, diff2;
+//             int d0, d2;
+//             int temporal_diff0, temporal_diff2;
+//
+//             int i1, i2;
+//             int j1, j2;
+//             int p6, p5, p4, p3, p2, p1, c0, m1, m2, m3, m4;
+
+10:
+//             c0 = prev2[0] + next2[0];                // c0 = v20, v21
+//             d0 = c0 >> 1;                            // d0 = v10
+//             temporal_diff0 = FFABS(prev2[0] - next2[0]); // td0 = v11
+        ldr             q31, [x3]
+        ldr             q21, [x17]
+        uhadd           v10.16b, v31.16b, v21.16b
+        uabd            v11.16b, v31.16b, v21.16b
+        uaddl           v20.8h,  v21.8b,  v31.8b
+        uaddl2          v21.8h,  v21.16b, v31.16b
+
+        ldr             q31, [x3, w6, SXTW]
+        ldr             q23, [x17, w6, SXTW]
+
+//             i1 = coef_hf[0] * c0;                    // i1 = v2-v5
+        UMULL4K         v2, v3, v4, v5, v20, v21, v0.h[2]
+
+        ldr             q30, [x3, w14, SXTW]
+        ldr             q25, [x17, w14, SXTW]
+
+//             m4 = prev2[mrefs4] + next2[mrefs4];      // m4 = v22,v23
+        uaddl           v22.8h,  v23.8b,  v31.8b
+        uaddl2          v23.8h,  v23.16b, v31.16b
+
+//             p4 = prev2[prefs4] + next2[prefs4];      // p4 = v24,v25, (p4 >> 1) = v12
+        uhadd           v12.16b, v25.16b, v30.16b
+        uaddl           v24.8h,  v25.8b,  v30.8b
+        uaddl2          v25.8h,  v25.16b, v30.16b
+
+//             j1 = -coef_hf[1] * (c0 + p4);            // j1 = v6-v9  (-c0:v20,v21)
+        add             v20.8h,  v20.8h,  v24.8h
+        add             v21.8h,  v21.8h,  v25.8h
+        SMULL4K         v6, v7, v8, v9, v20, v21, v0.h[5]
+
+//             m3 = cur[mrefs3];                        // m3 = v20
+        ldr             q20, [x3, w7, SXTW]
+
+//             p3 = cur[prefs3];                        // p3 = v21
+        ldr             q21, [x3, w13, SXTW]
+
+//             i1 += coef_hf[2] * (m4 + p4);            // (-m4:v22,v23) (-p4:v24,v25)
+        add             v22.8h,  v22.8h,  v24.8h
+        add             v23.8h,  v23.8h,  v25.8h
+        UMLAL4K         v2, v3, v4, v5, v22, v23, v0.h[4]
+
+        ldr             q29, [x3, w8, SXTW]
+        ldr             q23, [x17, w8, SXTW]
+
+//             i1 -= coef_lf[1] * 4 * (m3 + p3);        // -
+        uaddl           v30.8h,  v20.8b,  v21.8b
+        uaddl2          v31.8h,  v20.16b, v21.16b
+
+        ldr             q28, [x3, w16, SXTW]
+        ldr             q25, [x17, w16, SXTW]
+
+        UMLSL4K         v2, v3, v4, v5, v30, v31, v0.h[1]
+
+//             m2 = prev2[mrefs2] + next2[mrefs2];      // m2 = v22,v23, (m2 >> 1) = v13
+        uhadd           v13.16b, v23.16b, v29.16b
+        uaddl           v22.8h,  v23.8b,  v29.8b
+        uaddl2          v23.8h,  v23.16b, v29.16b
+
+        ldr             q31, [x3, w12, SXTW]
+        ldr             q27, [x17, w12, SXTW]
+
+//             p6 = prev2[prefs6] + next2[prefs6];      // p6 = v24,v25
+        uaddl           v24.8h,  v25.8b,  v28.8b
+        uaddl2          v25.8h,  v25.16b, v28.16b
+
+//             j1 += coef_hf[2] * (m2 + p6);            // (-p6:v24,v25)
+        add             v24.8h,  v24.8h,  v22.8h
+        add             v25.8h,  v25.8h,  v23.8h
+        UMLAL4K         v6, v7, v8, v9, v24, v25, v0.h[4]
+
+//             m1 = cur[mrefs];                         // m1 = v24
+        ldr             q24, [x3, w9, SXTW]
+
+//             p5 = cur[prefs5];                        // p5 = v25
+        ldr             q25, [x3, w15, SXTW]
+
+//             p2 = prev2[prefs2] + next2[prefs2];      // p2 = v26, v27
+//             temporal_diff2 = FFABS(prev2[prefs2] - next2[prefs2]); // td2 = v14
+//             d2 = p2 >> 1;                            // d2 = v15
+        uabd            v14.16b, v31.16b, v27.16b
+        uhadd           v15.16b, v31.16b, v27.16b
+        uaddl           v26.8h,  v27.8b,  v31.8b
+        uaddl2          v27.8h,  v27.16b, v31.16b
+
+//             j1 += coef_hf[0] * p2;                   // -
+        UMLAL4K         v6, v7, v8, v9, v26, v27, v0.h[2]
+
+//             i1 -= coef_hf[1] * (m2 + p2);            // (-m2:v22,v23*) (-p2:v26*,v27*)
+        add             v22.8h,  v22.8h,  v26.8h
+        add             v23.8h,  v23.8h,  v27.8h
+        UMLSL4K         v2, v3, v4, v5, v22, v23, v0.h[3]
+
+//             p1 = cur[prefs];                         // p1 = v22
+        ldr             q22, [x3, w11, SXTW]
+
+//             j1 -= coef_lf[1] * 4 * (m1 + p5);        // -
+        uaddl           v26.8h,  v24.8b,  v25.8b
+        uaddl2          v27.8h,  v24.16b, v25.16b
+        UMLSL4K         v6, v7, v8, v9, v26, v27, v0.h[1]
+
+//             j2 = (coef_sp[0] * (p1 + p3) - coef_sp[1] * (m1 + p5)) >> 13; // (-p5:v25*) j2=v16
+        uaddl           v18.8h,  v22.8b,  v21.8b
+        uaddl2          v19.8h,  v22.16b, v21.16b
+        UMULL4K         v28, v29, v30, v31, v18, v19, v0.h[6]
+
+        uaddl           v18.8h,  v24.8b,  v25.8b
+        uaddl2          v19.8h,  v24.16b, v25.16b
+        UMLSL4K         v28, v29, v30, v31, v18, v19, v0.h[7]
+
+        SQSHRUNN        v16, v28, v29, v30, v31, 13
+
+//             i2 = (coef_sp[0] * (m1 + p1) - coef_sp[1] * (m3 + p3)) >> 13; // (-m3:v20*) i2=v17
+        uaddl           v18.8h,  v22.8b,  v24.8b
+        uaddl2          v19.8h,  v22.16b, v24.16b
+        UMULL4K         v28, v29, v30, v31, v18, v19, v0.h[6]
+
+        uaddl           v18.8h,  v20.8b,  v21.8b
+        uaddl2          v19.8h,  v20.16b, v21.16b
+        UMLSL4K         v28, v29, v30, v31, v18, v19, v0.h[7]
+
+        SQSHRUNN        v17, v28, v29, v30, v31, 13
+
+//             i1 += coef_lf[0] * 4 * (m1 + p1);        // p1 = v22, m1 = v24
+        uaddl           v26.8h,  v24.8b,  v22.8b
+        uaddl2          v27.8h,  v24.16b, v22.16b
+        UMLAL4K         v2, v3, v4, v5, v26, v27, v0.h[0]
+
+        ldr             q31, [x2, w9, SXTW]
+        ldr             q29, [x4, w9, SXTW]
+
+//             j1 += coef_lf[0] * 4 * (p1 + p3);        // p1 = v22, p3 = v21
+        uaddl           v26.8h,  v21.8b,  v22.8b
+        uaddl2          v27.8h,  v21.16b, v22.16b
+        UMLAL4K         v6, v7, v8, v9, v26, v27, v0.h[0]
+
+        ldr             q30, [x2, w11, SXTW]
+        ldr             q28, [x4, w11, SXTW]
+
+//             i1 >>= 15;                               // i1 = v2, -v3, -v4*, -v5*
+        SQSHRUNN        v2, v2, v3, v4, v5, 15
+
+//             j1 >>= 15;                               // j1 = v3, -v6*, -v7*, -v8*, -v9*
+        SQSHRUNN        v3, v6, v7, v8, v9, 15
+
+//             {
+//                 int t1 =(FFABS(prev[mrefs] - m1) + FFABS(prev[prefs] - p1)) >> 1;
+//                 int t2 =(FFABS(next[mrefs] - m1) + FFABS(next[prefs] - p1)) >> 1;
+        uabd            v30.16b, v22.16b, v30.16b
+        uabd            v31.16b, v24.16b, v31.16b
+        uabd            v28.16b, v22.16b, v28.16b
+        uabd            v29.16b, v24.16b, v29.16b
+        uhadd           v31.16b, v31.16b, v30.16b
+        uhadd           v29.16b, v29.16b, v28.16b
+
+        ldr             q27, [x2, w13, SXTW]
+        ldr             q26, [x4, w13, SXTW]
+
+//                 diff0 = FFMAX3(temporal_diff0 >> 1, t1, t2); // diff0=v18
+        ushr            v18.16b, v11.16b, #1
+        umax            v18.16b, v18.16b, v31.16b
+        umax            v18.16b, v18.16b, v29.16b
+//             }                                        // v28, v30 preserved for next block
+//             {                                        // tdiff2 = v14
+//                 int t1 =(FFABS(prev[prefs] - p1) + FFABS(prev[prefs3] - p3)) >> 1;
+//                 int t2 =(FFABS(next[prefs] - p1) + FFABS(next[prefs3] - p3)) >> 1;
+        uabd            v31.16b, v21.16b, v27.16b
+        uabd            v29.16b, v21.16b, v26.16b
+        uhadd           v31.16b, v31.16b, v30.16b
+        uhadd           v29.16b, v29.16b, v28.16b
+
+//                 diff2 = FFMAX3(temporal_diff2 >> 1, t1, t2); // diff2=v19
+        ushr            v19.16b, v14.16b, #1
+        umax            v19.16b, v19.16b, v31.16b
+        umax            v19.16b, v19.16b, v29.16b
+//             }
+
+        // diff0 = v18, (m2 >> 1) = v13, m1 = v24, d0 = v10, p1 = v22, d2 = v15
+        SPAT_CHECK      v18, v13, v24, v10, v22, v15, v31, v30, v29, v28
+
+        // diff2 = v19, d0 = v10, p1 = v22, d2 = v15, p3 = v21, (p4 >> 1) = v12
+        SPAT_CHECK      v19, v10, v22, v15, v21, v12, v31, v30, v29, v28
+
+        // j1 = v3, j2 = v16, p1 = v22, d2 = v15, p3 = v21, td2 = v14, diff2 = v19
+        INTERPOL        v3, v3, v16, v22, v15, v21, v14, v19, v31, v30, v29
+
+//                 dst[d_stride * 2] = av_clip_uint8(interpol);
+        str             q3, [x0, w5, SXTW]
+
+//             dst[d_stride] = p1;
+        str             q22, [x0, w1, SXTW]
+
+        // i1 = v2, i2 = v17, m1 = v24, d0 = v10, p1 = v22, td0 = v11, diff0 = v18
+        INTERPOL        v2, v2, v17, v24, v10, v22, v11, v18, v31, v30, v29
+
+//                 dst[0] = av_clip_uint8(interpol);
+        str             q2, [x0], #16
+//             }
+//
+//             dst++;
+//             cur++;
+//             prev++;
+//             prev2++;
+//             next++;
+//         }
+        subs            w10, w10, #16
+        add             x2,  x2,  #16
+        add             x3,  x3,  #16
+        add             x4,  x4,  #16
+        add             x17, x17, #16
+        bgt             10b
+
+        ldp             d12, d13, [sp, #16]
+        ldp             d10, d11, [sp, #32]
+        ldp             d8,  d9,  [sp, #48]
+        ldp             d14, d15, [sp], #64
+99:
+        ret
+endfunc
+
 // ===========================================================================
 //
 // void filter_line(
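
Note on the width split: the comment in filter_line3_helper describes how a line is divided between the 16-byte asm loop and the C tail. The rule as that comment words it can be sketched in isolation as below. This is only an illustration and not part of the patch: align_up and neon_width are hypothetical names, strides are assumed to be positive byte counts, and the asm path is assumed to be 8-bit only (clip_max == 255), as the helper implies.

```c
#include <assert.h>

/* Round x up to a multiple of a (a must be a power of two).
 * Hypothetical stand-in for FFALIGN. */
static int align_up(int x, int a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Hypothetical helper: how many leading bytes of a line of width w are
 * given to the 16-byte asm loop. The rest (w - result) would go to the
 * C fallback. Assumes positive strides. */
static int neon_width(int w, int d_stride, int s_stride, int clip_max)
{
    const int w1 = align_up(w, 16);

    assert(w >= 0);

    /* Assumption: the asm handles 8-bit samples only. */
    if (clip_max != 255)
        return 0;

    /* The rounded-up width fits inside both strides, so the asm may run
     * over the padding bytes and no C tail is needed. */
    if (w1 <= d_stride && w1 <= s_stride)
        return w;

    /* Otherwise only whole 16-byte chunks go to the asm. */
    return w & ~15;
}
```

With these definitions, neon_width(100, 128, 128, 255) gives 100 (no C tail), neon_width(100, 104, 128, 255) gives 96 (a 4-byte C tail), and any clip_max other than 255 gives 0, so the whole line goes to the C version.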