From patchwork Fri Sep 27 12:52:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 35185
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 27 Sep 2024 14:52:25 +0200
Message-Id: <20240927125241.15887-1-ramiro.polla@gmail.com>
X-Mailer: git-send-email 2.30.2
Subject: [FFmpeg-devel] [PATCH 00/16] swscale/range_convert: fix mpeg ranges
 in yuv range conversion for non-8-bit pixel formats
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: 8WIaSdYO/J8+

There is an issue with the constants used in YUV to YUV range conversion,
where the upper bound is not respected when converting to mpeg range.

With this patchset, the constants are calculated at runtime, depending on
the bit depth. This approach also allows us to more easily understand how
the constants are derived.

These are the speedups for the entire patchset:

x86_64:
chrRangeFromJpeg8_1920_c:        5827.4   5845.2  ( 1.00x)
chrRangeFromJpeg8_1920_sse2:     1945.6   1955.2  ( 1.00x)
chrRangeFromJpeg8_1920_avx2:      992.0    988.9  ( 1.00x)
chrRangeFromJpeg16_1920_c:       5793.2   5809.1  ( 1.00x)
chrRangeToJpeg8_1920_c:         11726.2   9462.2  ( 1.24x)
chrRangeToJpeg8_1920_sse2:       1965.5   1949.9  ( 1.01x)
chrRangeToJpeg8_1920_avx2:        984.2    988.5  ( 1.00x)
chrRangeToJpeg16_1920_c:        10610.8   9261.5  ( 1.15x)
lumRangeFromJpeg8_1920_c:        4165.7   4191.4  ( 0.99x)
lumRangeFromJpeg8_1920_sse2:     1032.0   1040.5  ( 0.99x)
lumRangeFromJpeg8_1920_avx2:      575.2    520.5  ( 1.11x)
lumRangeFromJpeg16_1920_c:       4530.0   4143.4  ( 1.09x)
lumRangeToJpeg8_1920_c:          6044.8   5720.5  ( 1.06x)
lumRangeToJpeg8_1920_sse2:       1034.2   1046.0  ( 0.99x)
lumRangeToJpeg8_1920_avx2:        513.5    540.5  ( 0.95x)
lumRangeToJpeg16_1920_c:         5343.6   5139.5  ( 1.04x)

aarch64 A55:
chrRangeFromJpeg8_1920_c:       28839.3  28834.8  ( 1.00x)
chrRangeFromJpeg8_1920_neon:     5312.2   5313.1  ( 1.00x)
chrRangeFromJpeg16_1920_c:      28843.8  28840.6  ( 1.00x)
chrRangeToJpeg8_1920_c:         44196.1  23072.5  ( 1.92x)
chrRangeToJpeg8_1920_neon:       6035.9   5550.8  ( 1.09x)
chrRangeToJpeg16_1920_c:        36526.7  23075.1  ( 1.58x)
lumRangeFromJpeg8_1920_c:       15384.3  15386.7  ( 1.00x)
lumRangeFromJpeg8_1920_neon:     3148.6   3145.8  ( 1.00x)
lumRangeFromJpeg16_1920_c:      15390.1  15383.8  ( 1.00x)
lumRangeToJpeg8_1920_c:         23066.7  19223.6  ( 1.20x)
lumRangeToJpeg8_1920_neon:       3868.8   3624.9  ( 1.07x)
lumRangeToJpeg16_1920_c:        19224.6  19225.5  ( 1.00x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:        6316.2   6318.5  ( 1.00x)
chrRangeFromJpeg8_1920_neon:     2263.5   2304.2  ( 0.98x)
chrRangeFromJpeg16_1920_c:       6321.9   6323.5  ( 1.00x)
chrRangeToJpeg8_1920_c:         11389.3   9170.0  ( 1.24x)
chrRangeToJpeg8_1920_neon:       2644.2   2793.8  ( 0.95x)
chrRangeToJpeg16_1920_c:         9514.4   9195.6  ( 1.03x)
lumRangeFromJpeg8_1920_c:        4376.0   4425.5  ( 0.99x)
lumRangeFromJpeg8_1920_neon:     1110.8   1105.0  ( 1.01x)
lumRangeFromJpeg16_1920_c:       4437.9   4436.8  ( 1.00x)
lumRangeToJpeg8_1920_c:          6667.0   6017.2  ( 1.11x)
lumRangeToJpeg8_1920_neon:       1327.5   1328.0  ( 1.00x)
lumRangeToJpeg16_1920_c:         6062.5   6017.2  ( 1.01x)

NOTE: simd optimizations for x86 and aarch64 have been updated, but riscv
      and loongarch are still missing (and therefore disabled).

NOTE2: the same issue still exists in rgb2yuv conversions, which is not
       addressed in this patchset.

Changes from v1:
- Saturate the output value instead of limiting the input with amax;
- Add more comprehensive benchmarks to commit messages;
- Add comments when disabling code with "#if 0";

Ramiro Polla (16):
  swscale/range_convert: call arch-specific init functions from main init function
  swscale/range_convert: drop redundant conditionals from arch-specific init functions
  swscale/range_convert: indent after previous commit
  checkasm: use FF_ARRAY_ELEMS instead of hardcoding size of arrays
  checkasm/sw_range_convert: use YUV pixel formats instead of YUVJ
  checkasm/sw_range_convert: reduce number of input sizes tested
  checkasm/sw_range_convert: only run benchmarks on largest input width
  checkasm/sw_range_convert: test all supported bit depths
  checkasm/sw_range_convert: indent after previous couple of commits
  swscale/range_convert: saturate output instead of limiting input
  swscale/aarch64/range_convert: saturate output instead of limiting input
  swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats
  swscale/x86/range_convert: update sse2 and avx2 range_convert functions to new API
  swscale/x86: add sse2, sse4, and avx2 {lum,chr}ConvertRange16
  swscale/aarch64/range_convert: update neon range_convert functions to new API
  swscale/aarch64: add neon {lum,chr}ConvertRange16

 libswscale/aarch64/range_convert_neon.S       | 152 ++++++++++----
 libswscale/aarch64/swscale.c                  |  41 +++-
 libswscale/hscale.c                           |   6 +-
 libswscale/loongarch/swscale_init_loongarch.c |  38 ++--
 libswscale/riscv/swscale.c                    |  15 +-
 libswscale/swscale.c                          | 122 ++++++++++--
 libswscale/swscale_internal.h                 |  11 +-
 libswscale/utils.c                            |  10 +-
 libswscale/x86/range_convert.asm              | 161 ++++++++++-----
 libswscale/x86/swscale.c                      |  56 ++++--
 tests/checkasm/sw_gbrp.c                      |  15 +-
 tests/checkasm/sw_range_convert.c             | 186 +++++++++++++-----
 tests/checkasm/sw_scale.c                     |  11 +-
 .../fate/filter-alphaextract_alphamerge_rgb   | 100 +++++-----
 tests/ref/fate/filter-pixdesc-gray10be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray10le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray12be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray12le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray14be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray14le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray16be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray16le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray9be         |   2 +-
 tests/ref/fate/filter-pixdesc-gray9le         |   2 +-
 tests/ref/fate/filter-pixdesc-ya16be          |   2 +-
 tests/ref/fate/filter-pixdesc-ya16le          |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj411p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj420p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj422p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj440p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj444p        |   2 +-
 tests/ref/fate/filter-pixfmts-copy            |  34 ++--
 tests/ref/fate/filter-pixfmts-crop            |  34 ++--
 tests/ref/fate/filter-pixfmts-field           |  34 ++--
 tests/ref/fate/filter-pixfmts-fieldorder      |  30 +--
 tests/ref/fate/filter-pixfmts-hflip           |  34 ++--
 tests/ref/fate/filter-pixfmts-il              |  34 ++--
 tests/ref/fate/filter-pixfmts-lut             |  18 +-
 tests/ref/fate/filter-pixfmts-null            |  34 ++--
 tests/ref/fate/filter-pixfmts-pad             |  22 +--
 tests/ref/fate/filter-pixfmts-pullup          |  10 +-
 tests/ref/fate/filter-pixfmts-rotate          |   4 +-
 tests/ref/fate/filter-pixfmts-scale           |  34 ++--
 tests/ref/fate/filter-pixfmts-swapuv          |  10 +-
 .../ref/fate/filter-pixfmts-tinterlace_cvlpf  |   8 +-
 .../ref/fate/filter-pixfmts-tinterlace_merge  |   8 +-
 tests/ref/fate/filter-pixfmts-tinterlace_pad  |   8 +-
 tests/ref/fate/filter-pixfmts-tinterlace_vlpf |   8 +-
 tests/ref/fate/filter-pixfmts-transpose       |  28 +--
 tests/ref/fate/filter-pixfmts-vflip           |  34 ++--
 tests/ref/fate/fitsenc-gray                   |   2 +-
 tests/ref/fate/fitsenc-gray16be               |  10 +-
 tests/ref/fate/gifenc-gray                    | 186 +++++++++---------
 tests/ref/fate/idroq-video-encode             |   2 +-
 tests/ref/fate/jpg-icc                        |   8 +-
 tests/ref/fate/sws-yuv-colorspace             |   2 +-
 tests/ref/fate/sws-yuv-range                  |   2 +-
 tests/ref/fate/vvc-conformance-SCALING_A_1    | 128 ++++++------
 tests/ref/lavf/gray16be.fits                  |   4 +-
 tests/ref/lavf/gray16be.pam                   |   4 +-
 tests/ref/lavf/gray16be.png                   |   6 +-
 tests/ref/lavf/jpg                            |   6 +-
 tests/ref/lavf/smjpeg                         |   6 +-
 tests/ref/pixfmt/yuvj420p                     |   2 +-
 tests/ref/pixfmt/yuvj422p                     |   2 +-
 tests/ref/pixfmt/yuvj440p                     |   2 +-
 tests/ref/pixfmt/yuvj444p                     |   2 +-
 tests/ref/seek/lavf-jpg                       |   8 +-
 tests/ref/seek/vsynth_lena-mjpeg              |  40 ++--
 tests/ref/seek/vsynth_lena-roqvideo           |   2 +-
 tests/ref/vsynth/vsynth1-amv                  |   8 +-
 tests/ref/vsynth/vsynth1-mjpeg                |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-422            |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-huffman        |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-trell          |   8 +-
 tests/ref/vsynth/vsynth1-mjpeg-trell-huffman  |   8 +-
 tests/ref/vsynth/vsynth1-roqvideo             |   8 +-
 tests/ref/vsynth/vsynth2-amv                  |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg                |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-422            |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-huffman        |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-trell          |   8 +-
 tests/ref/vsynth/vsynth2-mjpeg-trell-huffman  |   8 +-
 tests/ref/vsynth/vsynth2-roqvideo             |   8 +-
 tests/ref/vsynth/vsynth3-amv                  |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg                |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-422            |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth3-mjpeg-huffman        |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-trell          |   6 +-
 tests/ref/vsynth/vsynth3-mjpeg-trell-huffman  |   6 +-
 tests/ref/vsynth/vsynth_lena-amv              |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg            |   8 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-422        |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-444        |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-huffman    |   8 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-trell      |   8 +-
 .../vsynth/vsynth_lena-mjpeg-trell-huffman    |   8 +-
 tests/ref/vsynth/vsynth_lena-roqvideo         |   8 +-
 101 files changed, 1193 insertions(+), 833 deletions(-)
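
For readers who want to see what "constants calculated at runtime,
depending on the bit depth" can look like, here is a minimal sketch for
the luma case. It is NOT the code from this patchset: the names
(SketchCoeffs, sketch_lum_from_jpeg, sketch_lum_to_jpeg), the 14-bit
fixed-point shift, and the choice to operate directly on sample values
are assumptions made for illustration only. It assumes mpeg range is
[16 << (depth - 8), 235 << (depth - 8)] and jpeg range is
[0, (1 << depth) - 1], and it saturates the output rather than limiting
the input.

```c
#include <stdint.h>

/* Illustrative sketch only; names and precision are assumptions,
 * not taken from libswscale. */
#define SKETCH_SHIFT 14

typedef struct SketchCoeffs {
    int64_t coeff;   /* fixed-point scale factor */
    int64_t offset;  /* fixed-point offset, includes rounding term */
} SketchCoeffs;

static int64_t sketch_full_max(int depth) { return ((int64_t)1 << depth) - 1; }
static int64_t sketch_mpeg_min(int depth) { return (int64_t)16  << (depth - 8); }
static int64_t sketch_mpeg_max(int depth) { return (int64_t)235 << (depth - 8); }

/* Scale, shift back to integer, then saturate the OUTPUT to [lo, hi]. */
static int sketch_apply(int v, SketchCoeffs c, int64_t lo, int64_t hi)
{
    int64_t t   = (int64_t)v * c.coeff + c.offset;
    /* avoid right-shifting a negative value */
    int64_t out = t < 0 ? lo : t >> SKETCH_SHIFT;
    if (out < lo) out = lo;
    if (out > hi) out = hi;
    return (int)out;
}

/* jpeg (full) range -> mpeg (limited) range, luma */
static int sketch_lum_from_jpeg(int v, int depth)
{
    const int64_t src = sketch_full_max(depth);
    const int64_t dst = sketch_mpeg_max(depth) - sketch_mpeg_min(depth);
    SketchCoeffs c;
    c.coeff  = ((dst << SKETCH_SHIFT) + src / 2) / src;   /* rounded ratio */
    c.offset = (sketch_mpeg_min(depth) << SKETCH_SHIFT)
             + (1 << (SKETCH_SHIFT - 1));
    return sketch_apply(v, c, sketch_mpeg_min(depth), sketch_mpeg_max(depth));
}

/* mpeg (limited) range -> jpeg (full) range, luma */
static int sketch_lum_to_jpeg(int v, int depth)
{
    const int64_t src = sketch_mpeg_max(depth) - sketch_mpeg_min(depth);
    const int64_t dst = sketch_full_max(depth);
    SketchCoeffs c;
    c.coeff  = ((dst << SKETCH_SHIFT) + src / 2) / src;   /* rounded ratio */
    c.offset = -(sketch_mpeg_min(depth) * c.coeff)
             + (1 << (SKETCH_SHIFT - 1));
    return sketch_apply(v, c, 0, sketch_full_max(depth));
}
```

With these assumptions, the 8-bit coefficients come out as 14071 (from
jpeg) and 19077 (to jpeg), and inputs outside the nominal mpeg range
(for example 255 in 8-bit) are clamped on output when converting to
jpeg range.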