From patchwork Fri Sep 27 12:52:36 2024
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 51892
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 27 Sep 2024 14:52:36 +0200
Message-Id: <20240927125241.15887-12-ramiro.polla@gmail.com>
In-Reply-To: <20240927125241.15887-1-ramiro.polla@gmail.com>
References: <20240927125241.15887-1-ramiro.polla@gmail.com>
Subject: [FFmpeg-devel] [PATCH v2 11/16] swscale/aarch64/range_convert: saturate output instead of limiting input

aarch64 A55:
chrRangeFromJpeg8_1920_c:    28839.3 ( 1.00x)
chrRangeFromJpeg8_1920_neon:  5312.2 ( 5.43x)  5309.9 ( 5.43x)
chrRangeToJpeg8_1920_c:      44196.1 ( 1.00x)
chrRangeToJpeg8_1920_neon:    6035.9 ( 7.32x)  5550.8 ( 7.96x)
lumRangeFromJpeg8_1920_c:    15384.3 ( 1.00x)
lumRangeFromJpeg8_1920_neon:  3148.6 ( 4.89x)  3148.1 ( 4.89x)
lumRangeToJpeg8_1920_c:      23066.7 ( 1.00x)
lumRangeToJpeg8_1920_neon:    3868.8 ( 5.96x)  3624.9 ( 6.36x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:     6316.2 ( 1.00x)
chrRangeFromJpeg8_1920_neon:  2263.5 ( 2.79x)  2343.8 ( 2.69x)
chrRangeToJpeg8_1920_c:      11389.3 ( 1.00x)
chrRangeToJpeg8_1920_neon:    2644.2 ( 4.31x)  2823.8 ( 4.03x)
lumRangeFromJpeg8_1920_c:     4376.0 ( 1.00x)
lumRangeFromJpeg8_1920_neon:  1110.8 ( 3.94x)  1105.8 ( 3.96x)
lumRangeToJpeg8_1920_c:       6667.0 ( 1.00x)
lumRangeToJpeg8_1920_neon:    1327.5 ( 5.02x)  1331.0 ( 5.01x)
---
 libswscale/aarch64/range_convert_neon.S | 39 ++++++++++++-------------
 libswscale/aarch64/swscale.c            |  5 ----
 2 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S
index 30991ab2a6..2f418adb24 100644
--- a/libswscale/aarch64/range_convert_neon.S
+++ b/libswscale/aarch64/range_convert_neon.S
@@ -20,12 +20,8 @@
 
 #include "libavutil/aarch64/asm.S"
 
-.macro lumConvertRange name, max, mult, offset, shift
+.macro lumConvertRange name, fromto, mult, offset, shift
 function ff_\name, export=1
-.if \max != 0
-        mov             w3, #\max
-        dup             v24.8h, w3
-.endif
         mov             w3, #\mult
         dup             v25.4s, w3
         movz            w3, #(\offset & 0xffff)
@@ -33,17 +29,19 @@ function ff_\name, export=1
         dup             v26.4s, w3
 1:
         ld1             {v0.8h}, [x0]
-.if \max != 0
-        smin            v0.8h, v0.8h, v24.8h
-.endif
         mov             v16.16b, v26.16b
         mov             v18.16b, v26.16b
         sxtl            v20.4s, v0.4h
         sxtl2           v22.4s, v0.8h
         mla             v16.4s, v20.4s, v25.4s
         mla             v18.4s, v22.4s, v25.4s
+.ifc \fromto, To
+        sqshrn          v0.4h, v16.4s, #\shift
+        sqshrn2         v0.8h, v18.4s, #\shift
+.else
         shrn            v0.4h, v16.4s, #\shift
         shrn2           v0.8h, v18.4s, #\shift
+.endif
         subs            w1, w1, #8
         st1             {v0.8h}, [x0], #16
         b.gt            1b
@@ -51,12 +49,8 @@ function ff_\name, export=1
 endfunc
 .endm
 
-.macro chrConvertRange name, max, mult, offset, shift
+.macro chrConvertRange name, fromto, mult, offset, shift
 function ff_\name, export=1
-.if \max != 0
-        mov             w3, #\max
-        dup             v24.8h, w3
-.endif
         mov             w3, #\mult
         dup             v25.4s, w3
         movz            w3, #(\offset & 0xffff)
@@ -65,10 +59,6 @@ function ff_\name, export=1
 1:
         ld1             {v0.8h}, [x0]
         ld1             {v1.8h}, [x1]
-.if \max != 0
-        smin            v0.8h, v0.8h, v24.8h
-        smin            v1.8h, v1.8h, v24.8h
-.endif
         mov             v16.16b, v26.16b
         mov             v17.16b, v26.16b
         mov             v18.16b, v26.16b
@@ -81,10 +71,17 @@ function ff_\name, export=1
         mla             v17.4s, v21.4s, v25.4s
         mla             v18.4s, v22.4s, v25.4s
         mla             v19.4s, v23.4s, v25.4s
+.ifc \fromto, To
+        sqshrn          v0.4h, v16.4s, #\shift
+        sqshrn          v1.4h, v17.4s, #\shift
+        sqshrn2         v0.8h, v18.4s, #\shift
+        sqshrn2         v1.8h, v19.4s, #\shift
+.else
         shrn            v0.4h, v16.4s, #\shift
         shrn            v1.4h, v17.4s, #\shift
         shrn2           v0.8h, v18.4s, #\shift
         shrn2           v1.8h, v19.4s, #\shift
+.endif
         subs            w2, w2, #8
         st1             {v0.8h}, [x0], #16
         st1             {v1.8h}, [x1], #16
@@ -93,7 +90,7 @@ function ff_\name, export=1
 endfunc
 .endm
 
-lumConvertRange lumRangeToJpeg_neon, 30189, 19077, -39057361, 14
-chrConvertRange chrRangeToJpeg_neon, 30775, 4663, -9289992, 12
-lumConvertRange lumRangeFromJpeg_neon, 0, 14071, 33561947, 14
-chrConvertRange chrRangeFromJpeg_neon, 0, 1799, 4081085, 11
+lumConvertRange lumRangeToJpeg_neon, To, 19077, -39057361, 14
+chrConvertRange chrRangeToJpeg_neon, To, 4663, -9289992, 12
+lumConvertRange lumRangeFromJpeg_neon, From, 14071, 33561947, 14
+chrConvertRange chrRangeFromJpeg_neon, From, 1799, 4081085, 11
diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index 94059cec51..653144dbca 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -225,10 +225,6 @@ void ff_chrRangeToJpeg_neon(int16_t *dstU, int16_t *dstV, int width);
 
 av_cold void ff_sws_init_range_convert_aarch64(SwsContext *c)
 {
-    /* This code is currently disabled because of changes in the base
-     * implementation of these functions. This code should be enabled
-     * again once those changes are ported to this architecture. */
-#if 0
     int cpu_flags = av_get_cpu_flags();
 
     if (have_neon(cpu_flags)) {
@@ -242,7 +238,6 @@ av_cold void ff_sws_init_range_convert_aarch64(SwsContext *c)
             }
         }
     }
-#endif
 }
 
 av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
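The claim in the subject line, that saturating the output can replace limiting the input, can be checked with a small C model. This is a sketch written for this note, not code from the patch or from libswscale. The function names (`mul_add_32`, `asr64`, `convert_clamp_input`, `convert_saturate_output`, `first_mismatch`) are hypothetical. The model assumes that `shrn`/`sqshrn` shift right with truncation toward minus infinity and that `sqshrn` saturates to the `int16_t` range. The constants are the ToJpeg arguments from the macro invocations in the diff, with the removed `max` values (30189 for luma, 30775 for chroma) used for the old clamp. The model computes in 64-bit arithmetic but asserts that the accumulator stays within the 32-bit lanes that `mla` accumulates into.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x * mult + offset, asserting that the result fits in
 * the 32-bit lanes that the NEON code accumulates into with mla. */
static int64_t mul_add_32(int32_t x, int32_t mult, int32_t offset)
{
    int64_t acc = (int64_t)x * mult + offset;
    assert(acc >= INT32_MIN && acc <= INT32_MAX);
    return acc;
}

/* Floor division by 2^shift (an arithmetic right shift), written so the
 * result does not rely on C's implementation-defined >> of negative values. */
static int64_t asr64(int64_t v, int shift)
{
    assert(shift > 0 && shift < 31);
    if (v >= 0)
        return v >> shift;
    return -((-v + ((INT64_C(1) << shift) - 1)) >> shift);
}

/* Old scheme: clamp the input (smin), then narrow with shrn. For non-negative
 * inputs the clamp keeps the result within int16_t, so the narrow loses nothing. */
static int16_t convert_clamp_input(int16_t x, int16_t max, int32_t mult,
                                   int32_t offset, int shift)
{
    int32_t in = x < max ? x : max;
    return (int16_t)asr64(mul_add_32(in, mult, offset), shift);
}

/* New scheme: no input clamp; sqshrn saturates the narrowed result. */
static int16_t convert_saturate_output(int16_t x, int32_t mult,
                                       int32_t offset, int shift)
{
    int64_t v = asr64(mul_add_32(x, mult, offset), shift);
    if (v > INT16_MAX)
        v = INT16_MAX;
    if (v < INT16_MIN)
        v = INT16_MIN;
    return (int16_t)v;
}

/* Returns the first input in [0, INT16_MAX] where the two schemes differ,
 * or -1 if they agree over the whole range. */
static int first_mismatch(int16_t max, int32_t mult, int32_t offset, int shift)
{
    for (int x = 0; x <= INT16_MAX; x++) {
        if (convert_clamp_input((int16_t)x, max, mult, offset, shift) !=
            convert_saturate_output((int16_t)x, mult, offset, shift))
            return x;
    }
    return -1;
}
```

With these constants, `first_mismatch(30189, 19077, -39057361, 14)` and `first_mismatch(30775, 4663, -9289992, 12)` both return -1. 30189 and 30775 are the largest inputs whose results still fit in `int16_t`, so clamping first and saturating afterwards both yield 32767. For non-negative inputs the FromJpeg conversions stay below 32767, which is presumably why they keep the plain `shrn`.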