From patchwork Mon Sep 23 12:40:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 51760
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 23 Sep 2024 14:40:16 +0200
Message-Id: <20240923124017.33659-14-ramiro.polla@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240923124017.33659-1-ramiro.polla@gmail.com>
References: <20240923124017.33659-1-ramiro.polla@gmail.com>
Subject: [FFmpeg-devel] [PATCH 13/14] swscale/aarch64/range_convert: update
 neon range_convert functions to new API

                                 A55               A76
chrRangeFromJpeg8_1920_c:     28842.4            6346.5
chrRangeFromJpeg8_1920_neon:   5310.9 ( 5.43x)   2264.2 ( 2.80x)
chrRangeToJpeg8_1920_c:       36520.7            9514.0
chrRangeToJpeg8_1920_neon:     6033.2 ( 6.05x)   2645.5 ( 3.60x)
lumRangeFromJpeg8_1920_c:     15387.2            4444.5
lumRangeFromJpeg8_1920_neon:   3148.9 ( 4.89x)   1108.0 ( 4.01x)
lumRangeToJpeg8_1920_c:       19226.4            6015.5
lumRangeToJpeg8_1920_neon:     3866.7 ( 4.97x)   1344.8 ( 4.47x)
---
 libswscale/aarch64/range_convert_neon.S | 63 +++++++++++++------------
 libswscale/aarch64/swscale.c            | 14 +++---
 2 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S
index 30991ab2a6..3454ee4932 100644
--- a/libswscale/aarch64/range_convert_neon.S
+++ b/libswscale/aarch64/range_convert_neon.S
@@ -20,20 +20,21 @@
 
 #include "libavutil/aarch64/asm.S"
 
-.macro lumConvertRange name, max, mult, offset, shift
-function ff_\name, export=1
-.if \max != 0
-        mov             w3, #\max
-        dup             v24.8h, w3
+.macro lumConvertRange fromto
+function ff_lumRange\fromto\()Jpeg_neon, export=1
+// x0  int16_t *dst
+// w1  int width
+// w2  int amax
+// w3  int coeff
+// x4  int64_t offset
+.ifc \fromto, To
+        dup             v24.8h, w2
 .endif
-        mov             w3, #\mult
         dup             v25.4s, w3
-        movz            w3, #(\offset & 0xffff)
-        movk            w3, #((\offset >> 16) & 0xffff), lsl #16
-        dup             v26.4s, w3
+        dup             v26.4s, w4
 1:
         ld1             {v0.8h}, [x0]
-.if \max != 0
+.ifc \fromto, To
         smin            v0.8h, v0.8h, v24.8h
 .endif
         mov             v16.16b, v26.16b
@@ -42,8 +43,8 @@ function ff_\name, export=1
         sxtl2           v22.4s, v0.8h
         mla             v16.4s, v20.4s, v25.4s
         mla             v18.4s, v22.4s, v25.4s
-        shrn            v0.4h, v16.4s, #\shift
-        shrn2           v0.8h, v18.4s, #\shift
+        shrn            v0.4h, v16.4s, 14
+        shrn2           v0.8h, v18.4s, 14
         subs            w1, w1, #8
         st1             {v0.8h}, [x0], #16
         b.gt            1b
@@ -51,21 +52,23 @@ function ff_\name, export=1
 endfunc
 .endm
 
-.macro chrConvertRange name, max, mult, offset, shift
-function ff_\name, export=1
-.if \max != 0
-        mov             w3, #\max
+.macro chrConvertRange fromto
+function ff_chrRange\fromto\()Jpeg_neon, export=1
+// x0  int16_t *dstU
+// x1  int16_t *dstV
+// w2  int width
+// w3  int amax
+// w4  int coeff
+// x5  int64_t offset
+.ifc \fromto, To
         dup             v24.8h, w3
 .endif
-        mov             w3, #\mult
-        dup             v25.4s, w3
-        movz            w3, #(\offset & 0xffff)
-        movk            w3, #((\offset >> 16) & 0xffff), lsl #16
-        dup             v26.4s, w3
+        dup             v25.4s, w4
+        dup             v26.4s, w5
 1:
         ld1             {v0.8h}, [x0]
         ld1             {v1.8h}, [x1]
-.if \max != 0
+.ifc \fromto, To
         smin            v0.8h, v0.8h, v24.8h
         smin            v1.8h, v1.8h, v24.8h
 .endif
@@ -81,10 +84,10 @@ function ff_\name, export=1
         mla             v17.4s, v21.4s, v25.4s
         mla             v18.4s, v22.4s, v25.4s
         mla             v19.4s, v23.4s, v25.4s
-        shrn            v0.4h, v16.4s, #\shift
-        shrn            v1.4h, v17.4s, #\shift
-        shrn2           v0.8h, v18.4s, #\shift
-        shrn2           v1.8h, v19.4s, #\shift
+        shrn            v0.4h, v16.4s, 14
+        shrn            v1.4h, v17.4s, 14
+        shrn2           v0.8h, v18.4s, 14
+        shrn2           v1.8h, v19.4s, 14
         subs            w2, w2, #8
         st1             {v0.8h}, [x0], #16
         st1             {v1.8h}, [x1], #16
@@ -93,7 +96,7 @@ function ff_\name, export=1
 endfunc
 .endm
 
-lumConvertRange lumRangeToJpeg_neon,   30189, 19077, -39057361, 14
-chrConvertRange chrRangeToJpeg_neon,   30775,  4663,  -9289992, 12
-lumConvertRange lumRangeFromJpeg_neon,     0, 14071,  33561947, 14
-chrConvertRange chrRangeFromJpeg_neon,     0,  1799,   4081085, 11
+lumConvertRange To
+chrConvertRange To
+lumConvertRange From
+chrConvertRange From
diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index 21788cad5d..55fb81c1e3 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -218,14 +218,17 @@ NEON_INPUT(bgra32);
 NEON_INPUT(rgb24);
 NEON_INPUT(rgba32);
 
-void ff_lumRangeFromJpeg_neon(int16_t *dst, int width);
-void ff_chrRangeFromJpeg_neon(int16_t *dstU, int16_t *dstV, int width);
-void ff_lumRangeToJpeg_neon(int16_t *dst, int width);
-void ff_chrRangeToJpeg_neon(int16_t *dstU, int16_t *dstV, int width);
+void ff_lumRangeFromJpeg_neon(int16_t *dst, int width,
+                              int amax, int coeff, int64_t offset);
+void ff_chrRangeFromJpeg_neon(int16_t *dstU, int16_t *dstV, int width,
+                              int amax, int coeff, int64_t offset);
+void ff_lumRangeToJpeg_neon(int16_t *dst, int width,
+                            int amax, int coeff, int64_t offset);
+void ff_chrRangeToJpeg_neon(int16_t *dstU, int16_t *dstV, int width,
+                            int amax, int coeff, int64_t offset);
 
 av_cold void ff_sws_init_range_convert_aarch64(SwsContext *c)
 {
-#if 0
     int cpu_flags = av_get_cpu_flags();
 
     if (have_neon(cpu_flags)) {
@@ -239,7 +242,6 @@ av_cold void ff_sws_init_range_convert_aarch64(SwsContext *c)
             }
         }
     }
-#endif
 }
 
 av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
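
For readers who do not read AArch64 assembly, the per-sample arithmetic of the new luma routine can be approximated in scalar C as below. This is only a sketch and not part of the patch: the function name ref_lum_range_convert and the explicit clamp flag are hypothetical (the assembly selects clamping at assembly time via the To/From macro argument instead). It also processes exactly width samples, whereas the NEON loop handles 8 per iteration, and it uses a 64-bit accumulator where the NEON code works in 32-bit lanes, so results should agree only while the intermediate sum fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical scalar model of ff_lumRange{To,From}Jpeg_neon.
 * clamp != 0 models the "To" variant (smin against amax);
 * clamp == 0 models the "From" variant, which ignores amax. */
void ref_lum_range_convert(int16_t *dst, int width, int clamp,
                           int amax, int coeff, int64_t offset)
{
    for (int i = 0; i < width; i++) {
        int32_t v = dst[i];

        if (clamp && v > amax)      /* smin v0.8h, v0.8h, v24.8h */
            v = amax;

        /* mla on top of the broadcast offset; 64-bit here so that
         * the C code has no signed-overflow undefined behavior. */
        int64_t acc = (int64_t)v * coeff + offset;

        /* shrn ..., 14: shift right by 14, keep the low 16 bits.
         * Right-shifting a negative value is implementation-defined
         * in C; an arithmetic shift is assumed here. */
        dst[i] = (int16_t)(acc >> 14);
    }
}
```

As an illustration only, feeding it the constants from the removed macro invocations (coeff 14071, offset 33561947 for From; amax 30189, coeff 19077, offset -39057361 for To) maps 0 to 2048 in the From direction and clamps 32767 to 32767 in the To direction. The chroma routines apply the same arithmetic to dstU and dstV.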