From patchwork Mon Sep 23 12:40:14 2024
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 51768
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 23 Sep 2024 14:40:14 +0200
Message-Id: <20240923124017.33659-12-ramiro.polla@gmail.com>
In-Reply-To: <20240923124017.33659-1-ramiro.polla@gmail.com>
References: <20240923124017.33659-1-ramiro.polla@gmail.com>
Subject: [FFmpeg-devel] [PATCH 11/14] swscale/x86/range_convert: update sse2 and avx2 range_convert functions to new API

chrRangeFromJpeg8_1920_c:     3874.8 ( 1.00x)
chrRangeFromJpeg8_1920_sse2:  1493.8 ( 2.59x)
chrRangeFromJpeg8_1920_avx2:   741.8 ( 5.22x)
chrRangeToJpeg8_1920_c:       5232.8 ( 1.00x)
chrRangeToJpeg8_1920_sse2:    1673.3 ( 3.13x)
chrRangeToJpeg8_1920_avx2:     850.6 ( 6.15x)
lumRangeFromJpeg8_1920_c:     2416.3 ( 1.00x)
lumRangeFromJpeg8_1920_sse2:   760.1 ( 3.18x)
lumRangeFromJpeg8_1920_avx2:   379.6 ( 6.37x)
lumRangeToJpeg8_1920_c:       3121.1 ( 1.00x)
lumRangeToJpeg8_1920_sse2:     870.1 ( 3.59x)
lumRangeToJpeg8_1920_avx2:     434.8 ( 7.18x)
---
 libswscale/x86/range_convert.asm | 112 ++++++++++++++++++-------------
 libswscale/x86/swscale.c         |  14 ++--
 2 files changed, 73 insertions(+), 53 deletions(-)

diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm
index 97c7525448..d1aff63d7c 100644
--- a/libswscale/x86/range_convert.asm
+++ b/libswscale/x86/range_convert.asm
@@ -20,55 +20,53 @@
 
 %include "libavutil/x86/x86util.asm"
 
-SECTION_RODATA
-
-chr_to_mult:       times 4 dw 4663, 0
-chr_to_offset:     times 4 dd -9289992
-%define chr_to_shift 12
-
-chr_from_mult:     times 4 dw 1799, 0
-chr_from_offset:   times 4 dd 4081085
-%define chr_from_shift 11
-
-lum_to_mult:       times 4 dw 19077, 0
-lum_to_offset:     times 4 dd -39057361
-%define lum_to_shift 14
-
-lum_from_mult:     times 4 dw 14071, 0
-lum_from_offset:   times 4 dd 33561947
-%define lum_from_shift 14
-
 SECTION .text
 
-; NOTE: there is no need to clamp the input when converting to jpeg range
-;       (like we do in the C code) because packssdw will saturate the output.
-
 ;-----------------------------------------------------------------------------
 ; lumConvertRange
 ;
-; void ff_lumRangeToJpeg_<opt>(int16_t *dst, int width);
-; void ff_lumRangeFromJpeg_<opt>(int16_t *dst, int width);
+; void ff_lumRangeToJpeg_<opt>(int16_t *dst, int width,
+;                              int amax, int coeff, int64_t offset);
+; void ff_lumRangeFromJpeg_<opt>(int16_t *dst, int width,
+;                                int amax, int coeff, int64_t offset);
 ;
 ;-----------------------------------------------------------------------------
 
-%macro LUMCONVERTRANGE 4
-cglobal %1, 2, 2, 5, dst, width
+%macro LUMCONVERTRANGE 1
+%ifidni %1,To
+cglobal lumRange%1Jpeg, 5, 5, 6, dst, width, amax, coeff, offset
+%else
+cglobal lumRange%1Jpeg, 5, 5, 5, dst, width, amax, coeff, offset
+%endif
     shl            widthd, 1
-    VBROADCASTI128 m2, [%2]
-    VBROADCASTI128 m3, [%3]
+    movd           xm2, coeffd
+    VBROADCASTSS   m2, xm2
+%if ARCH_X86_64
+    movq           xm3, offsetq
+%else
+    movq           xm3, offsetm
+%endif
+    VBROADCASTSS   m3, xm3
     pxor           m4, m4
+%ifidni %1,To
+    movd           xm5, amaxd
+    SPLATW         m5, xm5
+%endif
     add            dstq, widthq
     neg            widthq
 .loop:
     movu           m0, [dstq+widthq]
+%ifidni %1,To
+    pminsw         m0, m5
+%endif
     punpckhwd      m1, m0, m4
     punpcklwd      m0, m4
     pmaddwd        m0, m2
     pmaddwd        m1, m2
     paddd          m0, m3
     paddd          m1, m3
-    psrad          m0, %4
-    psrad          m1, %4
+    psrad          m0, 14
+    psrad          m1, 14
     packssdw       m0, m1
     movu           [dstq+widthq], m0
     add            widthq, mmsize
@@ -79,23 +77,43 @@ cglobal %1, 2, 2, 5, dst, width
 ;-----------------------------------------------------------------------------
 ; chrConvertRange
 ;
-; void ff_chrRangeToJpeg_<opt>(int16_t *dstU, int16_t *dstV, int width);
-; void ff_chrRangeFromJpeg_<opt>(int16_t *dstU, int16_t *dstV, int width);
+; void ff_chrRangeToJpeg_<opt>(int16_t *dstU, int16_t *dstV, int width,
+;                              int amax, int coeff, int64_t offset);
+; void ff_chrRangeFromJpeg_<opt>(int16_t *dstU, int16_t *dstV, int width,
+;                                int amax, int coeff, int64_t offset);
 ;
 ;-----------------------------------------------------------------------------
 
-%macro CHRCONVERTRANGE 4
-cglobal %1, 3, 3, 7, dstU, dstV, width
+%macro CHRCONVERTRANGE 1
+%ifidni %1,To
+cglobal chrRange%1Jpeg, 6, 6, 8, dstU, dstV, width, amax, coeff, offset
+%else
+cglobal chrRange%1Jpeg, 6, 6, 7, dstU, dstV, width, amax, coeff, offset
+%endif
     shl            widthd, 1
-    VBROADCASTI128 m4, [%2]
-    VBROADCASTI128 m5, [%3]
+    movd           xm4, coeffd
+    VBROADCASTSS   m4, xm4
+%if ARCH_X86_64
+    movq           xm5, offsetq
+%else
+    movq           xm5, offsetm
+%endif
+    VBROADCASTSS   m5, xm5
     pxor           m6, m6
+%ifidni %1,To
+    movd           xm7, amaxd
+    SPLATW         m7, xm7
+%endif
     add            dstUq, widthq
     add            dstVq, widthq
     neg            widthq
 .loop:
     movu           m0, [dstUq+widthq]
     movu           m2, [dstVq+widthq]
+%ifidni %1,To
+    pminsw         m0, m7
+    pminsw         m2, m7
+%endif
     punpckhwd      m1, m0, m6
     punpckhwd      m3, m2, m6
     punpcklwd      m0, m6
@@ -108,10 +126,10 @@ cglobal %1, 3, 3, 7, dstU, dstV, width
     paddd          m1, m5
     paddd          m2, m5
     paddd          m3, m5
-    psrad          m0, %4
-    psrad          m1, %4
-    psrad          m2, %4
-    psrad          m3, %4
+    psrad          m0, 14
+    psrad          m1, 14
+    psrad          m2, 14
+    psrad          m3, 14
     packssdw       m0, m1
     packssdw       m2, m3
     movu           [dstUq+widthq], m0
@@ -122,15 +140,15 @@ cglobal %1, 3, 3, 7, dstU, dstV, width
 %endmacro
 
 INIT_XMM sse2
-LUMCONVERTRANGE lumRangeToJpeg,   lum_to_mult,   lum_to_offset,   lum_to_shift
-CHRCONVERTRANGE chrRangeToJpeg,   chr_to_mult,   chr_to_offset,   chr_to_shift
-LUMCONVERTRANGE lumRangeFromJpeg, lum_from_mult, lum_from_offset, lum_from_shift
-CHRCONVERTRANGE chrRangeFromJpeg, chr_from_mult, chr_from_offset, chr_from_shift
+LUMCONVERTRANGE To
+CHRCONVERTRANGE To
+LUMCONVERTRANGE From
+CHRCONVERTRANGE From
 
 %if HAVE_AVX2_EXTERNAL
 INIT_YMM avx2
-LUMCONVERTRANGE lumRangeToJpeg,   lum_to_mult,   lum_to_offset,   lum_to_shift
-CHRCONVERTRANGE chrRangeToJpeg,   chr_to_mult,   chr_to_offset,   chr_to_shift
-LUMCONVERTRANGE lumRangeFromJpeg, lum_from_mult, lum_from_offset, lum_from_shift
-CHRCONVERTRANGE chrRangeFromJpeg, chr_from_mult, chr_from_offset, chr_from_shift
+LUMCONVERTRANGE To
+CHRCONVERTRANGE To
+LUMCONVERTRANGE From
+CHRCONVERTRANGE From
 %endif
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index d55e45471f..2377365e91 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -464,24 +464,26 @@ INPUT_PLANAR_RGB_A_ALL_DECL(avx2);
 } while (0)
 
 #define RANGE_CONVERT_FUNCS_DECL(opt)                                       \
-void ff_lumRangeFromJpeg_ ##opt(int16_t *dst, int width);                   \
-void ff_chrRangeFromJpeg_ ##opt(int16_t *dstU, int16_t *dstV, int width);   \
-void ff_lumRangeToJpeg_ ##opt(int16_t *dst, int width);                     \
-void ff_chrRangeToJpeg_ ##opt(int16_t *dstU, int16_t *dstV, int width);     \
+void ff_lumRangeFromJpeg_ ##opt(int16_t *dst, int width,                    \
+                                int amax, int coeff, int64_t offset);       \
+void ff_chrRangeFromJpeg_ ##opt(int16_t *dstU, int16_t *dstV, int width,    \
+                                int amax, int coeff, int64_t offset);       \
+void ff_lumRangeToJpeg_ ##opt(int16_t *dst, int width,                      \
+                              int amax, int coeff, int64_t offset);         \
+void ff_chrRangeToJpeg_ ##opt(int16_t *dstU, int16_t *dstV, int width,      \
+                              int amax, int coeff, int64_t offset);         \
 
 RANGE_CONVERT_FUNCS_DECL(sse2);
 RANGE_CONVERT_FUNCS_DECL(avx2);
 
 av_cold void ff_sws_init_range_convert_x86(SwsContext *c)
 {
-#if 0
     int cpu_flags = av_get_cpu_flags();
     if (EXTERNAL_AVX2_FAST(cpu_flags)) {
         RANGE_CONVERT_FUNCS(avx2);
     } else if (EXTERNAL_SSE2(cpu_flags)) {
         RANGE_CONVERT_FUNCS(sse2);
     }
-#endif
 }
 
 av_cold void ff_sws_init_swscale_x86(SwsContext *c)
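
For reviewers who want to cross-check the SIMD loops, the following is a rough scalar sketch of what each 16-bit lane appears to compute after this patch: an optional clamp to amax (the pminsw in the To variants), a multiply by coeff (pmaddwd against a zero-interleaved input), an add of offset (paddd), an arithmetic shift right by 14 (psrad), and a saturating narrow to int16_t (packssdw). The names range_convert_model, saturate_int16 and clamp_to_amax are hypothetical and do not exist in libswscale. The sketch assumes coeff fits in a signed 16-bit word, that offset and the intermediate sum fit in 32 bits, and that >> on a negative value is an arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical helper: mimics the signed saturation done by packssdw. */
static int16_t saturate_int16(int64_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/*
 * Hypothetical scalar model of one plane of the range conversion.
 * clamp_to_amax = 1 models the "To" variants (pminsw against amax),
 * clamp_to_amax = 0 models the "From" variants, which skip the clamp.
 */
static void range_convert_model(int16_t *dst, int width, int amax,
                                int coeff, int64_t offset, int clamp_to_amax)
{
    for (int i = 0; i < width; i++) {
        int64_t v = dst[i];

        if (clamp_to_amax && v > amax)
            v = amax;                        /* pminsw */

        /* pmaddwd + paddd + psrad 14; assumes no 32-bit overflow and an
         * arithmetic right shift for negative intermediate values. */
        int64_t r = (v * coeff + offset) >> 14;

        dst[i] = saturate_int16(r);          /* packssdw */
    }
}
```

As a sanity check under those assumptions, feeding the model the luma "to JPEG" constants removed by this patch (coeff 19077, offset -39057361, shift 14) maps 2048 to 0 and 30080 to 32640, and an unclamped input of 32767 saturates to 32767.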