From patchwork Tue Jul 23 12:46:06 2024
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 23 Jul 2024 14:46:06 +0200
Subject: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64/yuv2rgb: add neon yuv42{0,2}p -> gbrp unscaled colorspace converters
Message-Id: <20240723124606.107774-4-ramiro.polla@gmail.com>
In-Reply-To: <20240723124606.107774-1-ramiro.polla@gmail.com>
References: <20240723124606.107774-1-ramiro.polla@gmail.com>
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 50711

---
 libswscale/aarch64/swscale_unscaled.c | 58 +++++++++++++++++++++
 libswscale/aarch64/yuv2rgb_neon.S     | 73 ++++++++++++++++++++++-----
 2 files changed, 118 insertions(+), 13 deletions(-)

diff --git a/libswscale/aarch64/swscale_unscaled.c b/libswscale/aarch64/swscale_unscaled.c
index b3093bbc9d..5c4f6fee34 100644
--- a/libswscale/aarch64/swscale_unscaled.c
+++ b/libswscale/aarch64/swscale_unscaled.c
@@ -52,11 +52,41 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
                                         c->yuv2rgb_y_coeff); \
 } \
 
+#define DECLARE_FF_YUVX_TO_GBRP_FUNCS(ifmt, ofmt) \
+int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \
+                                 uint8_t *dst, int linesize, \
+                                 const uint8_t *srcY, int linesizeY, \
+                                 const uint8_t *srcU, int linesizeU, \
+                                 const uint8_t *srcV, int linesizeV, \
+                                 const int16_t *table, \
+                                 int y_offset, \
+                                 int y_coeff, \
+                                 uint8_t *dst1, int linesize1, \
+                                 uint8_t *dst2, int linesize2); \
+ \
+static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], \
+                                           int srcStride[], int srcSliceY, int srcSliceH, \
+                                           uint8_t *dst[], int dstStride[]) { \
+    const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \
+ \
+    return ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
+                                        dst[0] + srcSliceY * dstStride[0], dstStride[0], \
+                                        src[0], srcStride[0], \
+                                        src[1], srcStride[1], \
+                                        src[2], srcStride[2], \
+                                        yuv2rgb_table, \
+                                        c->yuv2rgb_y_offset >> 6, \
+                                        c->yuv2rgb_y_coeff, \
+                                        dst[1] + srcSliceY * dstStride[1], dstStride[1], \
+                                        dst[2] + srcSliceY * dstStride[2], dstStride[2]); \
+} \
+
 #define DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuvx) \
 DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, argb) \
 DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, rgba) \
 DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, abgr) \
 DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, bgra) \
+DECLARE_FF_YUVX_TO_GBRP_FUNCS(yuvx, gbrp) \
 
 DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuv420p)
 DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuv422p)
@@ -83,11 +113,38 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
                                         c->yuv2rgb_y_coeff); \
 } \
 
+#define DECLARE_FF_NVX_TO_GBRP_FUNCS(ifmt, ofmt) \
+int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \
+                                 uint8_t *dst, int linesize, \
+                                 const uint8_t *srcY, int linesizeY, \
+                                 const uint8_t *srcC, int linesizeC, \
+                                 const int16_t *table, \
+                                 int y_offset, \
+                                 int y_coeff, \
+                                 uint8_t *dst1, int linesize1, \
+                                 uint8_t *dst2, int linesize2); \
+ \
+static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], \
+                                           int srcStride[], int srcSliceY, int srcSliceH, \
+                                           uint8_t *dst[], int dstStride[]) { \
+    const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \
+ \
+    return ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
+                                        dst[0] + srcSliceY * dstStride[0], dstStride[0], \
+                                        src[0], srcStride[0], src[1], srcStride[1], \
+                                        yuv2rgb_table, \
+                                        c->yuv2rgb_y_offset >> 6, \
+                                        c->yuv2rgb_y_coeff, \
+                                        dst[1] + srcSliceY * dstStride[1], dstStride[1], \
+                                        dst[2] + srcSliceY * dstStride[2], dstStride[2]); \
+} \
+
 #define DECLARE_FF_NVX_TO_ALL_RGBX_FUNCS(nvx) \
 DECLARE_FF_NVX_TO_RGBX_FUNCS(nvx, argb) \
 DECLARE_FF_NVX_TO_RGBX_FUNCS(nvx, rgba) \
 DECLARE_FF_NVX_TO_RGBX_FUNCS(nvx, abgr) \
 DECLARE_FF_NVX_TO_RGBX_FUNCS(nvx, bgra) \
+DECLARE_FF_NVX_TO_GBRP_FUNCS(nvx, gbrp) \
 
 DECLARE_FF_NVX_TO_ALL_RGBX_FUNCS(nv12)
 DECLARE_FF_NVX_TO_ALL_RGBX_FUNCS(nv21)
@@ -110,6 +167,7 @@ DECLARE_FF_NVX_TO_ALL_RGBX_FUNCS(nv21)
     SET_FF_NVX_TO_RGBX_FUNC(nvx, NVX, rgba, RGBA, accurate_rnd); \
     SET_FF_NVX_TO_RGBX_FUNC(nvx, NVX, abgr, ABGR, accurate_rnd); \
     SET_FF_NVX_TO_RGBX_FUNC(nvx, NVX, bgra, BGRA, accurate_rnd); \
+    SET_FF_NVX_TO_RGBX_FUNC(nvx, NVX, gbrp, GBRP, accurate_rnd); \
 } while (0)
 
 static void get_unscaled_swscale_neon(SwsContext *c) {
diff --git a/libswscale/aarch64/yuv2rgb_neon.S b/libswscale/aarch64/yuv2rgb_neon.S
index 89d69e7f6c..b89eb2c781 100644
--- a/libswscale/aarch64/yuv2rgb_neon.S
+++ b/libswscale/aarch64/yuv2rgb_neon.S
@@ -30,23 +30,43 @@
 #endif
 .endm
 
-.macro load_args_nv12
+.macro load_dst1_dst2 dst1 linesize1 dst2 linesize2
+#if defined(__APPLE__)
+#define DST_OFFSET 8
+#else
+#define DST_OFFSET 0
+#endif
+        ldr             x10, [sp, #\dst1 - DST_OFFSET]
+        ldr             w12, [sp, #\linesize1 - DST_OFFSET]
+        ldr             x15, [sp, #\dst2 - DST_OFFSET]
+        ldr             w16, [sp, #\linesize2 - DST_OFFSET]
+#undef DST_OFFSET
+        sub             w12, w12, w0                    // w12 = linesize1 - width (padding1)
+        sub             w16, w16, w0                    // w16 = linesize2 - width (padding2)
+.endm
+
+.macro load_args_nv12 ofmt
         ldr             x8, [sp]                        // table
         load_yoff_ycoeff 8, 16                          // y_offset, y_coeff
         ld1             {v1.1d}, [x8]
         dup             v0.8h, w10
         dup             v3.8h, w9
+.ifc \ofmt,gbrp
+        load_dst1_dst2  24, 32, 40, 48
+        sub             w3, w3, w0                      // w3 = linesize - width (padding)
+.else
         sub             w3, w3, w0, lsl #2              // w3 = linesize - width * 4 (padding)
+.endif
         sub             w5, w5, w0                      // w5 = linesizeY - width (paddingY)
         sub             w7, w7, w0                      // w7 = linesizeC - width (paddingC)
         neg             w11, w0
 .endm
 
-.macro load_args_nv21
-        load_args_nv12
+.macro load_args_nv21 ofmt
+        load_args_nv12  \ofmt
 .endm
 
-.macro load_args_yuv420p
+.macro load_args_yuv420p ofmt
         ldr             x13, [sp]                       // srcV
         ldr             w14, [sp, #8]                   // linesizeV
         ldr             x8, [sp, #16]                   // table
@@ -54,7 +74,12 @@
         ld1             {v1.1d}, [x8]
         dup             v0.8h, w10
         dup             v3.8h, w9
+.ifc \ofmt,gbrp
+        load_dst1_dst2  40, 48, 56, 64
+        sub             w3, w3, w0                      // w3 = linesize - width (padding)
+.else
         sub             w3, w3, w0, lsl #2              // w3 = linesize - width * 4 (padding)
+.endif
         sub             w5, w5, w0                      // w5 = linesizeY - width (paddingY)
         sub             w7, w7, w0, lsr #1              // w7 = linesizeU - width / 2 (paddingU)
         sub             w14, w14, w0, lsr #1            // w14 = linesizeV - width / 2 (paddingV)
@@ -62,7 +87,7 @@
         neg             w11, w11
 .endm
 
-.macro load_args_yuv422p
+.macro load_args_yuv422p ofmt
         ldr             x13, [sp]                       // srcV
         ldr             w14, [sp, #8]                   // linesizeV
         ldr             x8, [sp, #16]                   // table
@@ -70,7 +95,12 @@
         ld1             {v1.1d}, [x8]
         dup             v0.8h, w10
         dup             v3.8h, w9
+.ifc \ofmt,gbrp
+        load_dst1_dst2  40, 48, 56, 64
+        sub             w3, w3, w0                      // w3 = linesize - width (padding)
+.else
         sub             w3, w3, w0, lsl #2              // w3 = linesize - width * 4 (padding)
+.endif
         sub             w5, w5, w0                      // w5 = linesizeY - width (paddingY)
         sub             w7, w7, w0, lsr #1              // w7 = linesizeU - width / 2 (paddingU)
         sub             w14, w14, w0, lsr #1            // w14 = linesizeV - width / 2 (paddingV)
@@ -100,9 +130,9 @@
 .endm
 
 .macro increment_nv12
-        ands            w15, w1, #1
-        csel            w16, w7, w11, ne                // incC = (h & 1) ? paddincC : -width
-        add             x6, x6, w16, sxtw               // srcC += incC
+        ands            w17, w1, #1
+        csel            w17, w7, w11, ne                // incC = (h & 1) ? paddincC : -width
+        add             x6, x6, w17, sxtw               // srcC += incC
 .endm
 
 .macro increment_nv21
@@ -110,10 +140,10 @@
 .endm
 
 .macro increment_yuv420p
-        ands            w15, w1, #1
-        csel            w16, w7, w11, ne                // incU = (h & 1) ? paddincU : -width/2
+        ands            w17, w1, #1
+        csel            w17, w7, w11, ne                // incU = (h & 1) ? paddincU : -width/2
+        add             x6, x6, w17, sxtw               // srcU += incU
         csel            w17, w14, w11, ne               // incV = (h & 1) ? paddincV : -width/2
-        add             x6, x6, w16, sxtw               // srcU += incU
         add             x13, x13, w17, sxtw             // srcV += incV
 .endm
 
@@ -122,7 +152,7 @@
         add             x13, x13, w14, sxtw             // srcV += incV
 .endm
 
-.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
+.macro compute_rgb r1 g1 b1 r2 g2 b2
         add             v20.8h, v26.8h, v20.8h          // Y1 + R1
         add             v21.8h, v27.8h, v21.8h          // Y2 + R2
         add             v22.8h, v26.8h, v22.8h          // Y1 + G1
@@ -135,13 +165,18 @@
         sqrshrun        \g2, v23.8h, #1                 // clip_u8((Y2 + G1) >> 1)
         sqrshrun        \b1, v24.8h, #1                 // clip_u8((Y1 + B1) >> 1)
         sqrshrun        \b2, v25.8h, #1                 // clip_u8((Y2 + B1) >> 1)
+.endm
+
+.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
+        compute_rgb     \r1, \g1, \b1, \r2, \g2, \b2
         movi            \a1, #255
         movi            \a2, #255
 .endm
 
 .macro declare_func ifmt ofmt
 function ff_\ifmt\()_to_\ofmt\()_neon, export=1
-        load_args_\ifmt
+        load_args_\ifmt \ofmt
+        mov             w9, w1
 
 1:
         mov             w8, w0                          // w8 = width
@@ -185,11 +220,22 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
         compute_rgba    v6.8b,v5.8b,v4.8b,v7.8b, v18.8b,v17.8b,v16.8b,v19.8b
 .endif
 
+.ifc \ofmt,gbrp
+        compute_rgb     v18.8b,v4.8b,v6.8b, v19.8b,v5.8b,v7.8b
+        st1             { v4.8b, v5.8b }, [x2], #16
+        st1             { v6.8b, v7.8b }, [x10], #16
+        st1             { v18.8b, v19.8b }, [x15], #16
+.else
         st4             { v4.8b, v5.8b, v6.8b, v7.8b}, [x2], #32
         st4             {v16.8b,v17.8b,v18.8b,v19.8b}, [x2], #32
+.endif
         subs            w8, w8, #16                     // width -= 16
         b.gt            2b
         add             x2, x2, w3, sxtw                // dst += padding
+.ifc \ofmt,gbrp
+        add             x10, x10, w12, sxtw             // dst1 += padding1
+        add             x15, x15, w16, sxtw             // dst2 += padding2
+.endif
         add             x4, x4, w5, sxtw                // srcY += paddingY
         increment_\ifmt
         subs            w1, w1, #1                      // height -= 1
@@ -204,6 +250,7 @@ endfunc
         declare_func    \ifmt, rgba
         declare_func    \ifmt, abgr
         declare_func    \ifmt, bgra
+        declare_func    \ifmt, gbrp
 .endm
 
 declare_rgb_funcs nv12