From patchwork Fri May 15 09:10:37 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 19694
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 15 May 2020 12:10:37 +0300
Message-Id: <20200515091038.16743-1-martin@martin.st>
X-Mailer: git-send-email 2.17.1
Subject: [FFmpeg-devel] [PATCH 1/2] checkasm: sw_rgb: Add a test for
 interleaveBytes

---
This depends on "checkasm: Add functions for printing pixel buffers".

The existing x86 implementations of interleaveBytes seem to slow down
significantly for unaligned copies (GCC 7.5, Sandy Bridge):

interleave_bytes_c:                36251.6
interleave_bytes_mmx:              10038.8
interleave_bytes_mmxext:           58450.3
interleave_bytes_sse2:             57746.3

For the properly aligned case, it behaves better:

interleave_bytes_aligned_c:        36109.8
interleave_bytes_aligned_mmx:       6033.8
interleave_bytes_aligned_mmxext:    6473.1
interleave_bytes_aligned_sse2:      6163.1

But Clang (in Xcode 11.3, run on Kaby Lake) seems to beat all the asm
implementations, in its (autovectorized?) C version:

interleave_bytes_c:                 9893.0
interleave_bytes_mmx:              23153.5
interleave_bytes_mmxext:           43693.8
interleave_bytes_sse2:             55894.8
interleave_bytes_aligned_c:         3456.0
interleave_bytes_aligned_mmx:       5780.0
interleave_bytes_aligned_mmxext:    4913.8
interleave_bytes_aligned_sse2:      4154.3
---
 tests/checkasm/sw_rgb.c | 53 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/tests/checkasm/sw_rgb.c b/tests/checkasm/sw_rgb.c
index 000420d8f7..41c486a2d7 100644
--- a/tests/checkasm/sw_rgb.c
+++ b/tests/checkasm/sw_rgb.c
@@ -111,6 +111,56 @@ static void check_uyvy_to_422p(void)
     }
 }
 
+static void check_interleave_bytes(void)
+{
+    LOCAL_ALIGNED_16(uint8_t, src0_buf, [MAX_STRIDE*MAX_HEIGHT+1]);
+    LOCAL_ALIGNED_16(uint8_t, src1_buf, [MAX_STRIDE*MAX_HEIGHT+1]);
+    LOCAL_ALIGNED_16(uint8_t, dst0_buf, [2*MAX_STRIDE*MAX_HEIGHT+2]);
+    LOCAL_ALIGNED_16(uint8_t, dst1_buf, [2*MAX_STRIDE*MAX_HEIGHT+2]);
+    // Intentionally using unaligned buffers, as this function doesn't have
+    // any alignment requirements.
+    uint8_t *src0 = src0_buf + 1;
+    uint8_t *src1 = src1_buf + 1;
+    uint8_t *dst0 = dst0_buf + 2;
+    uint8_t *dst1 = dst1_buf + 2;
+
+    declare_func_emms(AV_CPU_FLAG_MMX, void, const uint8_t *, const uint8_t *,
+                      uint8_t *, int, int, int, int, int);
+
+    randomize_buffers(src0, MAX_STRIDE * MAX_HEIGHT);
+    randomize_buffers(src1, MAX_STRIDE * MAX_HEIGHT);
+
+    if (check_func(interleaveBytes, "interleave_bytes")) {
+        for (int i = 0; i <= 16; i++) {
+            // Try all widths [1,16], and try one random width.
+
+            int w = i > 0 ? i : (1 + (rnd() % (MAX_STRIDE-2)));
+            int h = 1 + (rnd() % (MAX_HEIGHT-2));
+
+            memset(dst0, 0, 2 * MAX_STRIDE * MAX_HEIGHT);
+            memset(dst1, 0, 2 * MAX_STRIDE * MAX_HEIGHT);
+
+            call_ref(src0, src1, dst0, w, h,
+                     MAX_STRIDE, MAX_STRIDE, 2*MAX_STRIDE);
+            call_new(src0, src1, dst1, w, h,
+                     MAX_STRIDE, MAX_STRIDE, 2*MAX_STRIDE);
+            // Check a one pixel-pair edge around the destination area,
+            // to catch overwrites past the end.
+            checkasm_check(uint8_t, dst0, 2*MAX_STRIDE, dst1, 2*MAX_STRIDE,
+                           2 * w + 2, h + 1, "dst");
+        }
+
+        bench_new(src0, src1, dst1, 127, MAX_HEIGHT,
+                  MAX_STRIDE, MAX_STRIDE, 2*MAX_STRIDE);
+    }
+    if (check_func(interleaveBytes, "interleave_bytes_aligned")) {
+        // Bench the function in a more typical case, with aligned
+        // buffers and widths.
+        bench_new(src0_buf, src1_buf, dst1_buf, 128, MAX_HEIGHT,
+                  MAX_STRIDE, MAX_STRIDE, 2*MAX_STRIDE);
+    }
+}
+
 void checkasm_check_sw_rgb(void)
 {
     ff_sws_rgb2rgb_init();
@@ -132,4 +182,7 @@ void checkasm_check_sw_rgb(void)
 
     check_uyvy_to_422p();
     report("uyvytoyuv422");
+
+    check_interleave_bytes();
+    report("interleave_bytes");
 }

From patchwork Fri May 15 09:10:38 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 19695
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 15 May 2020 12:10:38 +0300
Message-Id: <20200515091038.16743-2-martin@martin.st>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200515091038.16743-1-martin@martin.st>
References: <20200515091038.16743-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 2/2] swscale: aarch64: Add a NEON
 implementation of interleaveBytes

This allows speeding up format conversions from yuv420 to nv12.

                          Cortex A53      A72      A73
interleave_bytes_c:              86077.5  51433.0  66972.0
interleave_bytes_neon:           19701.7  23019.2  15859.2
interleave_bytes_aligned_c:      86603.0  52017.2  67484.2
interleave_bytes_aligned_neon:    9061.0   7623.0   6309.0
---
 libswscale/aarch64/Makefile       |  4 +-
 libswscale/aarch64/rgb2rgb.c      | 41 ++++++++++++++++
 libswscale/aarch64/rgb2rgb_neon.S | 79 +++++++++++++++++++++++++++++++
 libswscale/rgb2rgb.c              |  2 +
 libswscale/rgb2rgb.h              |  1 +
 5 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 libswscale/aarch64/rgb2rgb.c
 create mode 100644 libswscale/aarch64/rgb2rgb_neon.S

diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
index 64a3fe208d..da1d909561 100644
--- a/libswscale/aarch64/Makefile
+++ b/libswscale/aarch64/Makefile
@@ -1,6 +1,8 @@
-OBJS        += aarch64/swscale.o                \
+OBJS        += aarch64/rgb2rgb.o                \
+               aarch64/swscale.o                \
                aarch64/swscale_unscaled.o       \
 
 NEON-OBJS   += aarch64/hscale.o                 \
                aarch64/output.o                 \
+               aarch64/rgb2rgb_neon.o           \
                aarch64/yuv2rgb_neon.o           \
diff --git a/libswscale/aarch64/rgb2rgb.c b/libswscale/aarch64/rgb2rgb.c
new file mode 100644
index 0000000000..a9bf6ff9e0
--- /dev/null
+++ b/libswscale/aarch64/rgb2rgb.c
@@ -0,0 +1,41 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdint.h>
+
+#include "config.h"
+#include "libavutil/attributes.h"
+#include "libavutil/aarch64/cpu.h"
+#include "libavutil/cpu.h"
+#include "libavutil/bswap.h"
+#include "libswscale/rgb2rgb.h"
+#include "libswscale/swscale.h"
+#include "libswscale/swscale_internal.h"
+
+void ff_interleave_bytes_neon(const uint8_t *src1, const uint8_t *src2,
+                              uint8_t *dest, int width, int height,
+                              int src1Stride, int src2Stride, int dstStride);
+
+av_cold void rgb2rgb_init_aarch64(void)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (have_neon(cpu_flags)) {
+        interleaveBytes = ff_interleave_bytes_neon;
+    }
+}
diff --git a/libswscale/aarch64/rgb2rgb_neon.S b/libswscale/aarch64/rgb2rgb_neon.S
new file mode 100644
index 0000000000..d8b282b6a5
--- /dev/null
+++ b/libswscale/aarch64/rgb2rgb_neon.S
@@ -0,0 +1,79 @@
+/*
+ * Copyright (c) 2020 Martin Storsjo
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+// void ff_interleave_bytes_neon(const uint8_t *src1, const uint8_t *src2,
+//                               uint8_t *dest, int width, int height,
+//                               int src1Stride, int src2Stride, int dstStride);
+function ff_interleave_bytes_neon, export=1
+        sub             w5,  w5,  w3
+        sub             w6,  w6,  w3
+        sub             w7,  w7,  w3, lsl #1
+1:
+        ands            w8,  w3,  #0xfffffff0 // & ~15
+        b.eq            3f
+2:
+        ld1             {v0.16b}, [x0], #16
+        ld1             {v1.16b}, [x1], #16
+        subs            w8,  w8,  #16
+        st2             {v0.16b, v1.16b}, [x2], #32
+        b.gt            2b
+
+        tst             w3,  #15
+        b.eq            9f
+
+3:
+        tst             w3,  #8
+        b.eq            4f
+        ld1             {v0.8b}, [x0], #8
+        ld1             {v1.8b}, [x1], #8
+        st2             {v0.8b, v1.8b}, [x2], #16
+4:
+        tst             w3,  #4
+        b.eq            5f
+
+        ld1             {v0.s}[0], [x0], #4
+        ld1             {v1.s}[0], [x1], #4
+        zip1            v0.8b,   v0.8b,  v1.8b
+        st1             {v0.8b}, [x2], #8
+
+5:
+        ands            w8,  w3,  #3
+        b.eq            9f
+6:
+        ldrb            w9,  [x0], #1
+        ldrb            w10, [x1], #1
+        subs            w8,  w8,  #1
+        bfi             w9,  w10, #8,  #8
+        strh            w9,  [x2], #2
+        b.gt            6b
+
+9:
+        subs            w4,  w4,  #1
+        b.eq            0f
+        add             x0,  x0,  w5, uxtw
+        add             x1,  x1,  w6, uxtw
+        add             x2,  x2,  w7, uxtw
+        b               1b
+
+0:
+        ret
+endfunc
diff --git a/libswscale/rgb2rgb.c b/libswscale/rgb2rgb.c
index eab8e6aebb..a7300f3ba4 100644
--- a/libswscale/rgb2rgb.c
+++ b/libswscale/rgb2rgb.c
@@ -137,6 +137,8 @@ void (*yuyvtoyuv422)(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
 av_cold void ff_sws_rgb2rgb_init(void)
 {
     rgb2rgb_init_c();
+    if (ARCH_AARCH64)
+        rgb2rgb_init_aarch64();
     if (ARCH_X86)
         rgb2rgb_init_x86();
 }
diff --git a/libswscale/rgb2rgb.h b/libswscale/rgb2rgb.h
index 3569254df9..48bba1586a 100644
--- a/libswscale/rgb2rgb.h
+++ b/libswscale/rgb2rgb.h
@@ -169,6 +169,7 @@ extern void (*yuyvtoyuv422)(uint8_t *ydst, uint8_t *udst, uint8_t *vdst, const u
 
 void ff_sws_rgb2rgb_init(void);
 
+void rgb2rgb_init_aarch64(void);
 void rgb2rgb_init_x86(void);
 
 #endif /* SWSCALE_RGB2RGB_H */
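
Reviewer's note: for readers following the NEON routine, the operation it has to reproduce can be sketched in plain C roughly as below. This is a minimal sketch, not the reference code from libswscale; the name interleave_bytes_ref is hypothetical, and only the parameter list is taken from the ff_interleave_bytes_neon prototype in the patch. Each output row alternates one byte from src1 with one byte from src2, which matches what the st2 stores and the bfi/strh tail in the assembly appear to produce.

```c
#include <stdint.h>

/* Hypothetical scalar sketch of the byte interleave (not FFmpeg's own code).
 * For each of `height` rows, writes 2 * width bytes to dest:
 * src1[0], src2[0], src1[1], src2[1], ...
 * Strides are in bytes and are applied once per row. */
void interleave_bytes_ref(const uint8_t *src1, const uint8_t *src2,
                          uint8_t *dest, int width, int height,
                          int src1Stride, int src2Stride, int dstStride)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            dest[2 * x]     = src1[x];  /* even output bytes come from src1 */
            dest[2 * x + 1] = src2[x];  /* odd output bytes come from src2 */
        }
        src1 += src1Stride;
        src2 += src2Stride;
        dest += dstStride;
    }
}
```

With width 2 and height 2, inputs {1, 2, 3, 4} and {5, 6, 7, 8} (source strides of 2, destination stride of 4) would give 1, 5, 2, 6 in the first output row and 3, 7, 4, 8 in the second.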