From patchwork Wed Sep 28 09:13:33 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 38397
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 28 Sep 2022 12:13:33 +0300
Message-Id: <20220928091334.7838-1-martin@martin.st>
X-Mailer: git-send-email 2.25.1
Subject: [FFmpeg-devel] [PATCH 1/2] aarch64: me_cmp: Avoid redundant loads in ff_pix_abs16_y2_neon
Cc: Grzegorz Bernacki, Jonathan Swinney, Hubert Mazur, Martin Storsjö

This avoids one redundant load per row; pix3 from the previous
iteration can be used as pix2 in the next
one.

Before:           Cortex A53    A72    A73
pix_abs_0_2_neon:      138.0   59.7   48.0
After:
pix_abs_0_2_neon:      109.7   50.2   39.5

Signed-off-by: Martin Storsjö
---
 libavcodec/aarch64/me_cmp_neon.S | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index 11af4849f9..832a7cb22d 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -326,9 +326,9 @@ function ff_pix_abs16_y2_neon, export=1
         // w4           int h
 
         // initialize buffers
+        ld1             {v1.16b}, [x2], x3              // Load pix2
         movi            v29.8h, #0                      // clear the accumulator
         movi            v28.8h, #0                      // clear the accumulator
-        add             x5, x2, x3                      // pix2 + stride
         cmp             w4, #4
         b.lt            2f
 
@@ -339,29 +339,25 @@ function ff_pix_abs16_y2_neon, export=1
         // avg2(a, b) = (((a) + (b) + 1) >> 1)
         // abs(x) = (x < 0 ? (-x) : (x))
 
-        ld1             {v1.16b}, [x2], x3              // Load pix2 for first iteration
-        ld1             {v2.16b}, [x5], x3              // Load pix3 for first iteration
+        ld1             {v2.16b}, [x2], x3              // Load pix3 for first iteration
         ld1             {v0.16b}, [x1], x3              // Load pix1 for first iteration
         urhadd          v30.16b, v1.16b, v2.16b         // Rounding halving add, first iteration
-        ld1             {v4.16b}, [x2], x3              // Load pix2 for second iteration
-        ld1             {v5.16b}, [x5], x3              // Load pix3 for second iteartion
+        ld1             {v5.16b}, [x2], x3              // Load pix3 for second iteration
         uabal           v29.8h, v0.8b, v30.8b           // Absolute difference of lower half, first iteration
         uabal2          v28.8h, v0.16b, v30.16b         // Absolute difference of upper half, first iteration
         ld1             {v3.16b}, [x1], x3              // Load pix1 for second iteration
-        urhadd          v27.16b, v4.16b, v5.16b         // Rounding halving add, second iteration
-        ld1             {v7.16b}, [x2], x3              // Load pix2 for third iteration
-        ld1             {v20.16b}, [x5], x3             // Load pix3 for third iteration
+        urhadd          v27.16b, v2.16b, v5.16b         // Rounding halving add, second iteration
+        ld1             {v20.16b}, [x2], x3             // Load pix3 for third iteration
         uabal           v29.8h, v3.8b, v27.8b           // Absolute difference of lower half for second iteration
         uabal2          v28.8h, v3.16b, v27.16b         // Absolute difference of upper half for second iteration
         ld1             {v6.16b}, [x1], x3              // Load pix1 for third iteration
-        urhadd          v26.16b, v7.16b, v20.16b        // Rounding halving add, third iteration
-        ld1             {v22.16b}, [x2], x3             // Load pix2 for fourth iteration
-        ld1             {v23.16b}, [x5], x3             // Load pix3 for fourth iteration
+        urhadd          v26.16b, v5.16b, v20.16b        // Rounding halving add, third iteration
+        ld1             {v1.16b}, [x2], x3              // Load pix3 for fourth iteration
         uabal           v29.8h, v6.8b, v26.8b           // Absolute difference of lower half for third iteration
         uabal2          v28.8h, v6.16b, v26.16b         // Absolute difference of upper half for third iteration
         ld1             {v21.16b}, [x1], x3             // Load pix1 for fourth iteration
         sub             w4, w4, #4                      // h-= 4
-        urhadd          v25.16b, v22.16b, v23.16b       // Rounding halving add
+        urhadd          v25.16b, v20.16b, v1.16b        // Rounding halving add
         cmp             w4, #4
         uabal           v29.8h, v21.8b, v25.8b          // Absolute difference of lower half for fourth iteration
         uabal2          v28.8h, v21.16b, v25.16b        // Absolute difference of upper half for fourth iteration
@@ -372,11 +368,11 @@ function ff_pix_abs16_y2_neon, export=1
 
 // iterate by one
 2:
-        ld1             {v1.16b}, [x2], x3              // Load pix2
-        ld1             {v2.16b}, [x5], x3              // Load pix3
+        ld1             {v2.16b}, [x2], x3              // Load pix3
         subs            w4, w4, #1
         ld1             {v0.16b}, [x1], x3              // Load pix1
         urhadd          v30.16b, v1.16b, v2.16b         // Rounding halving add
+        mov             v1.16b, v2.16b                  // Shift pix3->pix2
         uabal           v29.8h, v30.8b, v0.8b
         uabal2          v28.8h, v30.16b, v0.16b
 

From patchwork Wed Sep 28 09:13:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 38398
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 28 Sep 2022 12:13:34 +0300
Message-Id: <20220928091334.7838-2-martin@martin.st>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220928091334.7838-1-martin@martin.st>
References: <20220928091334.7838-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 2/2] aarch64: me_cmp: Avoid using the non-unrolled codepath for the minimum unroll size
Cc: Grzegorz Bernacki, Jonathan Swinney, Hubert Mazur, Martin Storsjö

Signed-off-by: Martin Storsjö
---
 libavcodec/aarch64/me_cmp_neon.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index 832a7cb22d..c710358ab7 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -471,7 +471,7 @@ function sse8_neon, export=1
         movi            v21.4s, #0
         movi            v20.4s, #0
         cmp             w4, #4
-        b.le            2f
+        b.lt            2f
 
 // make 4 iterations at once
 1:
@@ -534,7 +534,7 @@ function sse4_neon, export=1
 
         movi            v16.4s, #0                      // clear the result accumulator
         cmp             w4, #4
-        b.le            2f
+        b.lt            2f
 
 // make 4 iterations at once
 1:
@@ -663,7 +663,7 @@ function vsse16_neon, export=1
         cmp             w4, #3                          // check if we can make 3 iterations at once
         usubl           v31.8h, v0.8b, v1.8b            // Signed difference of pix1[0] - pix2[0], first iteration
         usubl2          v30.8h, v0.16b, v1.16b          // Signed difference of pix1[0] - pix2[0], first iteration
-        b.le            2f
+        b.lt            2f
 
 1:
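
For readers less familiar with the NEON code, the two changes in this series can be sketched in scalar C. This is only an illustration under stated assumptions, not FFmpeg's C reference implementation: the names pix_abs16_y2_naive, pix_abs16_y2_reuse, pix2_row_loads, unrolled_with_ble and unrolled_with_blt are hypothetical, and the 16-byte memcpy calls merely stand in for the ld1 row loads. The first pair of functions shows the row reuse from patch 1/2: the lower row ("pix3") of one iteration is kept and becomes the upper row ("pix2") of the next, so pix2 is read h + 1 times instead of 2 * h times. The last two helpers show the branch condition from patch 2/2: after "cmp w4, #N", b.le skips the unrolled loop when h == N, while b.lt lets h == N use it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Rounding average, matching the avg2() comment in the assembly. */
static int avg2(int a, int b) { return (a + b + 1) >> 1; }

/* Sum of |pix1[x] - avg2(upper[x], lower[x])| over one 16-pixel row. */
static int row_sad(const uint8_t *pix1, const uint8_t *upper,
                   const uint8_t *lower)
{
    int sum = 0;
    for (int x = 0; x < 16; x++)
        sum += abs(pix1[x] - avg2(upper[x], lower[x]));
    return sum;
}

/* Hypothetical "before" shape: both pix2 rows are loaded on every iteration. */
int pix_abs16_y2_naive(const uint8_t *pix1, const uint8_t *pix2,
                       ptrdiff_t stride, int h, int *pix2_row_loads)
{
    uint8_t upper[16], lower[16];
    int sum = 0;

    assert(pix2_row_loads != NULL);
    *pix2_row_loads = 0;
    for (int y = 0; y < h; y++) {
        memcpy(upper, pix2, 16);            /* "pix2" row */
        memcpy(lower, pix2 + stride, 16);   /* "pix3" row */
        *pix2_row_loads += 2;
        sum += row_sad(pix1, upper, lower);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Hypothetical "after" shape: one new row per iteration, previous row reused. */
int pix_abs16_y2_reuse(const uint8_t *pix1, const uint8_t *pix2,
                       ptrdiff_t stride, int h, int *pix2_row_loads)
{
    uint8_t upper[16], lower[16];
    int sum = 0;

    assert(pix2_row_loads != NULL);
    memcpy(upper, pix2, 16);                /* initial load, hoisted out of the loop */
    pix2 += stride;
    *pix2_row_loads = 1;
    for (int y = 0; y < h; y++) {
        memcpy(lower, pix2, 16);            /* the only new pix2 row this iteration */
        *pix2_row_loads += 1;
        sum += row_sad(pix1, upper, lower);
        memcpy(upper, lower, 16);           /* shift pix3 -> pix2 for the next row */
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Branch to the one-row loop taken on h <= unroll: h == unroll is NOT unrolled. */
int unrolled_with_ble(int h, int unroll) { return !(h <= unroll); }

/* Branch to the one-row loop taken on h < unroll: h == unroll IS unrolled. */
int unrolled_with_blt(int h, int unroll) { return !(h < unroll); }
```

In the unrolled assembly loop the explicit copy is unnecessary because the register roles simply rotate between iterations; only the one-row tail loop needs the extra mov.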