From patchwork Tue Jul 23 12:46:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 50710
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 23 Jul 2024 14:46:05 +0200
Message-Id: <20240723124606.107774-3-ramiro.polla@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240723124606.107774-1-ramiro.polla@gmail.com>
References: <20240723124606.107774-1-ramiro.polla@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/4] swscale/x86/yuv2rgb: add ssse3 yuv42{0, 2}p -> gbrp unscaled colorspace converters
List-Id: FFmpeg development discussions and patches

Note: this implementation is limited to x86_64 due to general purpose
register pressure.
---
 libswscale/x86/yuv2rgb.c     | 39 ++++++++++++++++++++++++++++++++++++
 libswscale/x86/yuv_2_rgb.asm | 24 +++++++++++++++++++++-
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/libswscale/x86/yuv2rgb.c b/libswscale/x86/yuv2rgb.c
index 68e903c6ad..2a4505fa90 100644
--- a/libswscale/x86/yuv2rgb.c
+++ b/libswscale/x86/yuv2rgb.c
@@ -79,6 +79,12 @@ extern void ff_yuva_420_rgb32_ssse3(x86_reg index, uint8_t *image, const uint8_t
 extern void ff_yuva_420_bgr32_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
                                     const uint8_t *pv_index, const uint64_t *pointer_c_dither,
                                     const uint8_t *py_2index, const uint8_t *pa_2index);
+#if ARCH_X86_64
+extern void ff_yuv_420_gbrp24_ssse3(x86_reg index, uint8_t *image, uint8_t *dst_b, uint8_t *dst_r,
+                                    const uint8_t *pu_index, const uint8_t *pv_index,
+                                    const uint64_t *pointer_c_dither,
+                                    const uint8_t *py_2index);
+#endif
 
 static inline int yuv420_rgb15_ssse3(SwsContext *c, const uint8_t *src[],
                                      int srcStride[],
@@ -201,6 +207,35 @@ static inline int yuv420_bgr24_ssse3(SwsContext *c, const uint8_t *src[],
     return srcSliceH;
 }
 
+#if ARCH_X86_64
+static inline int yuv420_gbrp_ssse3(SwsContext *c, const uint8_t *src[],
+                                    int srcStride[],
+                                    int srcSliceY, int srcSliceH,
+                                    uint8_t *dst[], int dstStride[])
+{
+    int y, h_size, vshift;
+
+    h_size = (c->dstW + 7) & ~7;
+    if (h_size * 3 > FFABS(dstStride[0]))
+        h_size -= 8;
+
+    vshift = c->srcFormat != AV_PIX_FMT_YUV422P;
+
+    for (y = 0; y < srcSliceH; y++) {
+        uint8_t *dst_g = dst[0] + (y + srcSliceY) * dstStride[0];
+        uint8_t *dst_b = dst[1] + (y + srcSliceY) * dstStride[1];
+        uint8_t *dst_r = dst[2] + (y + srcSliceY) * dstStride[2];
+        const uint8_t *py = src[0] +  y            * srcStride[0];
+        const uint8_t *pu = src[1] + (y >> vshift) * srcStride[1];
+        const uint8_t *pv = src[2] + (y >> vshift) * srcStride[2];
+        x86_reg index = -h_size / 2;
+
+        ff_yuv_420_gbrp24_ssse3(index, dst_g, dst_b, dst_r, pu - index, pv - index, &(c->redDither), py - 2 * index);
+    }
+    return srcSliceH;
+}
+#endif
+
 #endif /* HAVE_X86ASM */
 
 av_cold SwsFunc ff_yuv2rgb_init_x86(SwsContext *c)
@@ -234,6 +269,10 @@ av_cold SwsFunc ff_yuv2rgb_init_x86(SwsContext *c)
             return yuv420_rgb16_ssse3;
         case AV_PIX_FMT_RGB555:
             return yuv420_rgb15_ssse3;
+#if ARCH_X86_64
+        case AV_PIX_FMT_GBRP:
+            return yuv420_gbrp_ssse3;
+#endif
         }
     }
 
diff --git a/libswscale/x86/yuv_2_rgb.asm b/libswscale/x86/yuv_2_rgb.asm
index b67ab162d2..eeb1d25942 100644
--- a/libswscale/x86/yuv_2_rgb.asm
+++ b/libswscale/x86/yuv_2_rgb.asm
@@ -32,6 +32,7 @@ mask_dw25 : db 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0
 rgb24_shuf1: db 0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15, 4, 5, 10, 11
 rgb24_shuf2: db 10, 11, 0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15, 4, 5
 rgb24_shuf3: db 4, 5, 10, 11, 0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15
+gbrp_shuf  : db 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15
 pw_00ff: times 8 dw 255
 pb_f8: times 16 db 248
 pb_e0: times 16 db 224
@@ -60,8 +61,13 @@ SECTION .text
     %define GPR_num 6
     %endif
 %else
+    %ifidn %2, gbrp
+    %define parameters index, image, dst_b, dst_r, pu_index, pv_index, pointer_c_dither, py_2index
+    %define GPR_num 8
+    %else
     %define parameters index, image, pu_index, pv_index, pointer_c_dither, py_2index
     %define GPR_num 6
+    %endif
 %endif
 
 %define m_green m2
@@ -172,10 +178,22 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters
     paddsw m2, m6 ; G0 G2 G4 G6 ...
 
 %if %3 == 24 ; PACK RGB24
-%define depth 3
     packuswb m0, m3 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ...
     packuswb m1, m5 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ...
     packuswb m2, m7 ; G0 G2 G4 G6 ... G1 G3 G5 G7 ...
+%ifidn %2, gbrp ; PLANAR GBRP
+%define depth 1
+    mova m4, [gbrp_shuf]
+    pshufb m0, m4
+    pshufb m1, m4
+    pshufb m2, m4
+    movu [imageq], m2
+    movu [dst_bq], m0
+    movu [dst_rq], m1
+    add dst_bq, 8 * depth * time_num
+    add dst_rq, 8 * depth * time_num
+%else
+%define depth 3
     mova m3, m_red
     mova m6, m_blue
     psrldq m_red, 8
@@ -206,6 +224,7 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters
     movu [imageq], m0
     movu [imageq + 16], m1
     movu [imageq + 32], m2
+%endif ; PLANAR GBRP
 %else ; PACK RGB15/16/32
     packuswb m0, m1
     packuswb m3, m5
@@ -292,3 +311,6 @@ yuv2rgb_fn yuva, rgb, 32
 yuv2rgb_fn yuva, bgr, 32
 yuv2rgb_fn yuv, rgb, 15
 yuv2rgb_fn yuv, rgb, 16
+%if ARCH_X86_64
+yuv2rgb_fn yuv, gbrp, 24
+%endif
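
A note on the planar store path, for readers following the asm. According to the comments in the patch, after packuswb each register holds the even-numbered pixels in its low 8 bytes and the odd-numbered pixels in its high 8 bytes. The gbrp_shuf table appears to interleave the two halves again, so that each plane can be written in pixel order with a single store. Below is a minimal scalar sketch of that step in C. It is not the patch's code: pshufb_model, make_packed_layout, gbrp_shuf_restores_order and gbrp_shuf_selfcheck are hypothetical helper names used only for illustration, and only the table values are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same byte pattern as the gbrp_shuf constant added by the patch. */
static const uint8_t gbrp_shuf[16] = {
    0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15
};

/* Scalar model of a 128-bit pshufb: each output byte selects the source
 * byte named by the low 4 bits of the control byte, or becomes zero when
 * the control byte has its high bit set. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t ctrl[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 15];
    memcpy(dst, tmp, 16);
}

/* Build the layout described by the asm comments after packuswb:
 * even-numbered pixels in bytes 0..7, odd-numbered pixels in bytes 8..15. */
static void make_packed_layout(uint8_t packed[16], const uint8_t pixels[16])
{
    for (int i = 0; i < 8; i++) {
        packed[i]     = pixels[2 * i];
        packed[8 + i] = pixels[2 * i + 1];
    }
}

/* Returns 1 if shuffling the packed layout with gbrp_shuf restores the
 * original pixel order, 0 otherwise. */
static int gbrp_shuf_restores_order(const uint8_t pixels[16])
{
    uint8_t packed[16], out[16];
    make_packed_layout(packed, pixels);
    pshufb_model(out, packed, gbrp_shuf);
    return memcmp(out, pixels, 16) == 0;
}

/* Self-check on an arbitrary pixel ramp (illustrative values only). */
static void gbrp_shuf_selfcheck(void)
{
    uint8_t pixels[16];
    for (int i = 0; i < 16; i++)
        pixels[i] = (uint8_t)(16 * i + 1);
    assert(gbrp_shuf_restores_order(pixels));
}
```

Calling gbrp_shuf_selfcheck() from a test harness exercises the model. Because the shuffle leaves all 16 pixels of a plane in order in one register, the planar path can use depth 1 and one movu per plane, where the packed RGB24 path needs three shuffles and three stores.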