From patchwork Thu Jul 25 13:35:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 50728
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 25 Jul 2024 21:35:44 +0800
X-Mailer: git-send-email 2.34.1
X-Microsoft-Original-Message-ID: <20240725133546.19125-1-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH 1/3] avcodec/vvcdec: Use av_image_copy_plane for DMVR 10-bit integer pixels
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Nuo Mi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

There is no need to shift or interpolate 10-bit integer pixels;
av_image_copy_plane is enough.
---
 libavcodec/vvc/inter_template.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index 89effffb8f..afcee2e360 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -21,6 +21,7 @@
  */
 
 #include "libavcodec/h26x/h2656_inter_template.c"
+#include "libavutil/imgutils.h"
 
 #define TMP_STRIDE EDGE_EMU_BUFFER_STRIDE
 static void av_always_inline FUNC(put_scaled)(uint8_t *_dst, const ptrdiff_t _dst_stride,
@@ -483,6 +484,7 @@ static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, int16_t
 static void FUNC(dmvr)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
     const int height, const intptr_t mx, const intptr_t my, const int width)
 {
+#if BIT_DEPTH != 10
     const pixel *src = (const pixel *)_src;
     const ptrdiff_t src_stride = _src_stride / sizeof(pixel);
 #if BIT_DEPTH > 10
@@ -491,7 +493,7 @@ static void FUNC(dmvr)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_s
 #define DMVR_SHIFT(s) (((s) + offset4) >> shift4)
 #else
 #define DMVR_SHIFT(s) ((s) << (10 - BIT_DEPTH))
-#endif
+#endif // BIT_DEPTH > 10
 
     for (int y = 0; y < height; y++) {
         for (int x = 0; x < width; x++)
@@ -500,6 +502,10 @@ static void FUNC(dmvr)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_s
         dst += MAX_PB_SIZE;
     }
 #undef DMVR_SHIFT
+#else
+    av_image_copy_plane((uint8_t*)dst, sizeof(int16_t) * MAX_PB_SIZE, _src, _src_stride,
+        width * sizeof(pixel), height);
+#endif // BIT_DEPTH != 10
 }
 
 //8.5.3.2.2 Luma sample bilinear interpolation process

From patchwork Thu Jul 25 13:35:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 50727
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 25 Jul 2024 21:35:45 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240725133546.19125-1-nuomi2021@gmail.com>
References: <20240725133546.19125-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240725133546.19125-2-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/3] x86/vvcdec: add dmvr avx2 code
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Nuo Mi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

Decoder-Side Motion Vector Refinement (DMVR) accounts for about 4~8% of
CPU usage for some clips.

Test results from a single run:

clips                                     | before | after | delta
------------------------------------------|--------|-------|------
RitualDance_1920x1080_60_10_420_37_RA.266 | 338.7  | 354.3 | 4.61%
NovosobornayaSquare_1920x1080.bin         | 320.3  | 329.3 | 2.81%
Tango2_3840x2160_60_10_420_27_LD.266      | 83.3   | 83.7  | 0.48%
RitualDance_1920x1080_60_10_420_32_LD.266 | 320.7  | 327.3 | 2.06%
Chimera_8bit_1080P_1000_frames.vvc        | 360.7  | 381.0 | 5.63%
BQTerrace_1920x1080_60_10_420_22_RA.vvc   | 161.7  | 163.0 | 0.80%
---
 libavcodec/x86/vvc/Makefile      |   1 +
 libavcodec/x86/vvc/vvc_dmvr.asm  | 373 +++++++++++++++++++++++++++++++
 libavcodec/x86/vvc/vvcdsp_init.c |  25 +++
 3 files changed, 399 insertions(+)
 create mode 100644 libavcodec/x86/vvc/vvc_dmvr.asm

diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index 832d802daf..04f16bc10c 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -4,6 +4,7 @@ clean::
 OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvcdsp_init.o \
                               x86/h26x/h2656dsp.o
 X86ASM-OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvc_alf.o \
+                                     x86/vvc/vvc_dmvr.o \
                                      x86/vvc/vvc_mc.o \
                                      x86/vvc/vvc_sad.o \
                                      x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/vvc/vvc_dmvr.asm b/libavcodec/x86/vvc/vvc_dmvr.asm
new file mode 100644
index 0000000000..4c971f970b
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_dmvr.asm
@@ -0,0 +1,373 @@
+; /*
+; * Provide AVX2 luma dmvr functions for VVC decoding
+; * Copyright (c) 2024 Nuo Mi
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the GNU Lesser General Public
+; * License as published by the Free Software Foundation; either
+; * version 2.1 of the License, or (at your option) any later version.
+; *
+; * FFmpeg is distributed in the hope that it will be useful,
+; * but WITHOUT ANY WARRANTY; without even the implied warranty of
+; * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+; * Lesser General Public License for more details.
+; *
+; * You should have received a copy of the GNU Lesser General Public
+; * License along with FFmpeg; if not, write to the Free Software
+; * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+; */
+%include "libavutil/x86/x86util.asm"
+
+%define MAX_PB_SIZE 128
+
+SECTION_RODATA 32
+
+shift_12 times 2 dw 1 << (15 - (12 - 10))
+shift3_8 times 2 dw 1 << (15 - (8 - 6))
+shift3_10 times 2 dw 1 << (15 - (10 - 6))
+shift3_12 times 2 dw 1 << (15 - (12 - 6))
+pw_16 times 2 dw 16
+
+%if ARCH_X86_64
+
+%if HAVE_AVX2_EXTERNAL
+
+SECTION .text
+
+%define pstride (bd / 10 + 1)
+
+; LOAD(dst, src)
+%macro LOAD_W16 2
+%if bd == 8
+    pmovzxbw %1, %2
+%else
+    movu %1, %2
+%endif
+%endmacro
+
+%macro SHIFT_W16 2
+%if bd == 8
+    psllw %1, (10 - bd)
+%elif bd == 10
+    ; nothing
+%else
+    pmulhrsw %1, %2
+%endif
+%endmacro
+
+%macro SAVE_W16 2
+    movu %1, %2
+%endmacro
+
+; NEXT_4_LINES(is_h)
+%macro NEXT_4_LINES 1
+    lea dstq, [dstq + dsq*4]
+    lea srcq, [srcq + ssq*4]
+%if %1
+    lea src1q, [srcq + pstride]
+%endif
+%endmacro
+
+
+; DMVR_4xW16(dst, dst_stride, dst_stride3, src, src_stride, src_stride3)
+%macro DMVR_4xW16 6
+    LOAD_W16 m0, [%4]
+    LOAD_W16 m1, [%4 + %5]
+    LOAD_W16 m2, [%4 + 2 * %5]
+    LOAD_W16 m3, [%4 + %6]
+
+    SHIFT_W16 m0, m4
+    SHIFT_W16 m1, m4
+    SHIFT_W16 m2, m4
+    SHIFT_W16 m3, m4
+
+    SAVE_W16 [%1] , m0
+    SAVE_W16 [%1 + %2] , m1
+    SAVE_W16 [%1 + 2 * %2], m2
+    SAVE_W16 [%1 + %3] , m3
+%endmacro
+
+; buf += -stride * h + off
+; OFFSET_TO_W4(buf, stride, off)
+%macro OFFSET_TO_W4 3
+    mov id, hd
+    imul iq, %2
+    sub %1, iq
+    lea %1, [%1 + %3]
+%endmacro
+
+%macro OFFSET_TO_W4 0
+    OFFSET_TO_W4 srcq, ssq, 16 * (bd / 10 + 1)
+    OFFSET_TO_W4 dstq, dsq, 16 * 2
+%endmacro
+
+; void ff_vvc_dmvr_%1_avx2(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
+;     int height, intptr_t mx, intptr_t my, int width);
+%macro DMVR_AVX2 1
+cglobal vvc_dmvr_%1, 4, 9, 5, dst, src, ss, h, ds, ds3, w, ss3, i
+%define bd %1
+
+    LOAD_STRIDES
+
+%if %1 > 10
+    vpbroadcastd m4, [shift_%1]
+%endif
+
+    mov wd, wm
+    mov id, hd
+.w16:
+    sub id, 4
+    jl .w16_end
+    DMVR_4xW16 dstq, dsq, ds3q, srcq, ssq, ss3q
+    NEXT_4_LINES 0
+    jmp .w16
+.w16_end:
+
+    sub wd, 16
+    jl .w4_end
+
+    OFFSET_TO_W4
+.w4:
+    sub hd, 4
+    jl .w4_end
+    DMVR_4xW16 dstq, dsq, ds3q, srcq, ssq, ss3q
+    NEXT_4_LINES 0
+    jmp .w4
+.w4_end:
+
+    RET
+%endmacro
+
+; LOAD_COEFFS(coeffs0, coeffs1, src)
+%macro LOAD_COEFFS 3
+    movd xm%2, %3
+    vpbroadcastw m%2, xm%2
+    vpbroadcastd m%1, [pw_16]
+    psubw m%1, m%2
+%endmacro
+
+; LOAD_SHIFT(shift, src)
+%macro LOAD_SHIFT 2
+    vpbroadcastd %1, [%2]
+%if bd == 12
+    psllw %1, 1 ; avoid signed mul for pmulhrsw
+%endif
+%endmacro
+
+; LOAD_STRIDES(shift, src)
+%macro LOAD_STRIDES 0
+    mov dsq, MAX_PB_SIZE * 2
+    lea ss3q, [ssq*3]
+    lea ds3q, [dsq*3]
+%endmacro
+
+; BILINEAR(dst/src0, src1, coeff0, coeff1, round, tmp)
+%macro BILINEAR 6
+    pmullw %1, %3
+    pmullw %6, %2, %4
+    paddw %1, %6
+%if bd == 12
+    psrlw %1, 1 ; avoid signed mul for pmulhrsw
+%endif
+    pmulhrsw %1, %5
+%endmacro
+
+; DMVR_H_1xW16(dst, src0, src1, offset, tmp)
+%macro DMVR_H_1xW16 5
+    LOAD_W16 %1, [%2 + %4]
+    LOAD_W16 %5, [%3 + %4]
+    BILINEAR %1, %5, m10, m11, m12, %5
+%endmacro
+
+; DMVR_H_4xW16(dst, dst_stride, dst_stride3, src, src_stride, src_stride3, src1)
+%macro DMVR_H_4xW16 7
+    DMVR_H_1xW16 m0, %4, %7, 0, m4
+    DMVR_H_1xW16 m1, %4, %7, %5, m5
+    DMVR_H_1xW16 m2, %4, %7, 2 * %5, m6
+    DMVR_H_1xW16 m3, %4, %7, %6, m7
+
+    SAVE_W16 [%1] , m0
+    SAVE_W16 [%1 + %2] , m1
+    SAVE_W16 [%1 + 2 * %2], m2
+    SAVE_W16 [%1 + %3] , m3
+%endmacro
+
+; void ff_vvc_dmvr_h_%1_avx2(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
+;     int height, intptr_t mx, intptr_t my, int width);
+%macro DMVR_H_AVX2 1
+cglobal vvc_dmvr_h_%1, 4, 10, 13, dst, src, ss, h, ds, ds3, w, ss3, src1, i
+%define bd %1
+
+    LOAD_COEFFS 10, 11, dsm
+    LOAD_SHIFT m12, shift3_%1
+
+    LOAD_STRIDES
+    lea src1q, [srcq + pstride]
+
+    mov wd, wm
+    mov id, hd
+.w16:
+    sub id, 4
+    jl .w16_end
+    DMVR_H_4xW16 dstq, dsq, ds3q, srcq, ssq, ss3q, src1q
+    NEXT_4_LINES 1
+    jmp .w16
+.w16_end:
+
+    sub wd, 16
+    jl .w4_end
+
+    OFFSET_TO_W4
+    lea src1q, [srcq + pstride]
+.w4:
+    sub hd, 4
+    jl .w4_end
+    DMVR_H_4xW16 dstq, dsq, ds3q, srcq, ssq, ss3q, src1q
+    NEXT_4_LINES 1
+    jmp .w4
+.w4_end:
+
+    RET
+%endmacro
+
+; DMVR_V_4xW16(dst, dst_stride, dst_stride3, src, src_stride, src_stride3)
+%macro DMVR_V_4xW16 6
+    LOAD_W16 m1, [%4 + %5]
+    LOAD_W16 m2, [%4 + 2 * %5]
+    LOAD_W16 m3, [%4 + %6]
+    LOAD_W16 m4, [%4 + 4 * %5]
+
+    BILINEAR m0, m1, m8, m9, m10, m11
+    BILINEAR m1, m2, m8, m9, m10, m12
+    BILINEAR m2, m3, m8, m9, m10, m13
+    BILINEAR m3, m4, m8, m9, m10, m14
+
+    SAVE_W16 [%1] , m0
+    SAVE_W16 [%1 + %2] , m1
+    SAVE_W16 [%1 + 2 * %2], m2
+    SAVE_W16 [%1 + %3] , m3
+
+    ; why can't we use SWAP m0, m4 here?
+    movaps m0, m4
+%endmacro
+
+; void ff_vvc_dmvr_v_%1_avx2(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
+;     int height, intptr_t mx, intptr_t my, int width);
+%macro DMVR_V_AVX2 1
+cglobal vvc_dmvr_v_%1, 4, 9, 15, dst, src, ss, h, ds, ds3, w, ss3, i
+%define bd %1
+
+    LOAD_COEFFS 8, 9, ds3m
+    LOAD_SHIFT m10, shift3_%1
+
+    LOAD_STRIDES
+
+    mov wd, wm
+    mov id, hd
+    LOAD_W16 m0, [srcq]
+.w16:
+    sub id, 4
+    jl .w16_end
+    DMVR_V_4xW16 dstq, dsq, ds3q, srcq, ssq, ss3q
+    NEXT_4_LINES 0
+    jmp .w16
+.w16_end:
+
+    sub wd, 16
+    jl .w4_end
+
+    OFFSET_TO_W4
+    LOAD_W16 m0, [srcq]
+.w4:
+    sub hd, 4
+    jl .w4_end
+    DMVR_V_4xW16 dstq, dsq, ds3q, srcq, ssq, ss3q
+    NEXT_4_LINES 0
+    jmp .w4
+.w4_end:
+
+    RET
+%endmacro
+
+; DMVR_HV_4xW16(dst, dst_stride, dst_stride3, src, src_stride, src_stride3, src1)
+%macro DMVR_HV_4xW16 7
+    DMVR_H_1xW16 m1, %4, %7, %5, m6
+    DMVR_H_1xW16 m2, %4, %7, 2 * %5, m7
+    DMVR_H_1xW16 m3, %4, %7, %6, m8
+    DMVR_H_1xW16 m4, %4, %7, 4 * %5, m9
+
+    BILINEAR m0, m1, m13, m14, m15, m6
+    BILINEAR m1, m2, m13, m14, m15, m7
+    BILINEAR m2, m3, m13, m14, m15, m8
+    BILINEAR m3, m4, m13, m14, m15, m9
+
+    SAVE_W16 [%1] , m0
+    SAVE_W16 [%1 + %2] , m1
+    SAVE_W16 [%1 + 2 * %2], m2
+    SAVE_W16 [%1 + %3] , m3
+
+    ; why can't we use SWAP m0, m4 here?
+    movaps m0, m4
+%endmacro
+
+; void ff_vvc_dmvr_hv_%1_avx2(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
+;     int height, intptr_t mx, intptr_t my, int width);
+%macro DMVR_HV_AVX2 1
+cglobal vvc_dmvr_hv_%1, 7, 10, 16, dst, src, ss, h, ds, ds3, w, ss3, src1, i
+%define bd %1
+
+    LOAD_COEFFS 10, 11, dsm
+    LOAD_SHIFT m12, shift3_%1
+
+    LOAD_COEFFS 13, 14, ds3m
+    LOAD_SHIFT m15, shift3_10
+
+    LOAD_STRIDES
+    lea src1q, [srcq + pstride]
+
+    mov id, hd
+    DMVR_H_1xW16 m0, srcq, src1q, 0, m5
+.w16:
+    sub id, 4
+    jl .w16_end
+    DMVR_HV_4xW16 dstq, dsq, ds3q, srcq, ssq, ss3q, src1q
+    NEXT_4_LINES 1
+    jmp .w16
+.w16_end:
+
+    sub wd, 16
+    jl .w4_end
+
+    OFFSET_TO_W4
+    lea src1q, [srcq + pstride]
+
+    DMVR_H_1xW16 m0, srcq, src1q, 0, m5
+.w4:
+    sub hd, 4
+    jl .w4_end
+    DMVR_HV_4xW16 dstq, dsq, ds3q, srcq, ssq, ss3q, src1q
+    NEXT_4_LINES 1
+    jmp .w4
+.w4_end:
+
+    RET
+%endmacro
+
+%macro VVC_DMVR_AVX2 1
+    DMVR_AVX2 %1
+    DMVR_H_AVX2 %1
+    DMVR_V_AVX2 %1
+    DMVR_HV_AVX2 %1
+%endmacro
+
+INIT_YMM avx2
+
+VVC_DMVR_AVX2 8
+VVC_DMVR_AVX2 10
+VVC_DMVR_AVX2 12
+
+%endif ; HAVE_AVX2_EXTERNAL
+
+%endif ; ARCH_X86_64
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index 4b4a2aa937..d5b4f4f8a5 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -87,6 +87,21 @@ AVG_PROTOTYPES( 8, avx2)
 AVG_PROTOTYPES(10, avx2)
 AVG_PROTOTYPES(12, avx2)
 
+
+#define DMVR_PROTOTYPES(bd, opt) \
+void ff_vvc_dmvr_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
+     int height, intptr_t mx, intptr_t my, int width); \
+void ff_vvc_dmvr_h_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
+     int height, intptr_t mx, intptr_t my, int width); \
+void ff_vvc_dmvr_v_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
+     int height, intptr_t mx, intptr_t my, int width); \
+void ff_vvc_dmvr_hv_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
+     int height, intptr_t mx, intptr_t my, int width); \
+
+DMVR_PROTOTYPES( 8, avx2)
+DMVR_PROTOTYPES(10, avx2)
+DMVR_PROTOTYPES(12, avx2)
+
 #define ALF_BPC_PROTOTYPES(bpc, opt) \
 void BF(ff_vvc_alf_filter_luma, bpc, opt)(uint8_t *dst, ptrdiff_t dst_stride, \
     const uint8_t *src, ptrdiff_t src_stride, ptrdiff_t width, ptrdiff_t height, \
@@ -306,6 +321,13 @@ ALF_FUNCS(16, 12, avx2)
     c->inter.w_avg = bf(ff_vvc_w_avg, bd, opt); \
 } while (0)
 
+#define DMVR_INIT(bd) do { \
+    c->inter.dmvr[0][0] = ff_vvc_dmvr_##bd##_avx2; \
+    c->inter.dmvr[0][1] = ff_vvc_dmvr_h_##bd##_avx2; \
+    c->inter.dmvr[1][0] = ff_vvc_dmvr_v_##bd##_avx2; \
+    c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_##bd##_avx2; \
+} while (0)
+
 #define ALF_INIT(bd) do { \
     c->alf.filter[LUMA] = ff_vvc_alf_filter_luma_##bd##_avx2; \
     c->alf.filter[CHROMA] = ff_vvc_alf_filter_chroma_##bd##_avx2; \
@@ -330,6 +352,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             ALF_INIT(8);
             AVG_INIT(8, avx2);
             MC_LINKS_AVX2(8);
+            DMVR_INIT(8);
             SAD_INIT();
         }
         break;
@@ -342,6 +365,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(10, avx2);
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
+            DMVR_INIT(10);
             SAD_INIT();
         }
         break;
@@ -354,6 +378,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(12, avx2);
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
+            DMVR_INIT(12);
             SAD_INIT();
         }
         break;

From patchwork Thu Jul 25 13:35:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 50729
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 25 Jul 2024 21:35:46 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240725133546.19125-1-nuomi2021@gmail.com>
References: <20240725133546.19125-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240725133546.19125-3-nuomi2021@gmail.com>
X-MS-Exchange-AntiSpam-MessageData-0: 
RhVG9x9+TvTv57neqHeWTE2zvxbkV9FMW/RDUgUvJuXQyS3++7n39+6l+esVdE2FR86RHzDKEHuK1Gj9NUo4dAe6BJCPv14bropGX1Bl7pfx/CHS6VO1qEdbtosBgu5V/OiOX8NTbK4Iq2/2i9pkUqb+tKI3weoCi0G65Ka474GjQyV2+gNzo9eDI3m3MyiyJ8jPGGTuUjfwQxYQtv7/BmHf9E6WrWc3Qizus9uvbX/q4SdjYdb5pONI0iATwymHnnuVSA0iyLqP+FmzOJVX6UZqd30KR+YKJWA8jnr3bSmsLmKd/GoGQgV8T3/lza0qz/GOJ4j4ZyOX8dZU7isSp+6hYnoSb2womcT5OiUM6h4+M27eOYRDQMV5VoLzVmRL2PaoERWvZHGUnoQ0n3OLw8rpE38PppN9C5pvtsjc0pyOJBGEwQ8S/sni9KQHtCyx0K1/hQisWdGBGRi6zVyGjzfopwkTXqOZDfpLE9BmDXSs21Eql/VuRwn3h6B2dfMXDsGsnFUo/UwwZUmhUl2CtIDwqLSQvT3KtQHn1IZlhca5b5xNHKEoYUc5OpRq3NDN1TpvhD2bgV4B1xxcW1v6llUZ+Dbk22vv0E9GKWzzYT49dNqHILYdjLzUjK6VANZqLRAz2gl4rw2RwpEJvOtnwO3uhYusaQ9u0Ke++LEH8Jc8YeTH94IxnsMLfnaOtHkUrTysumaJ7apzqcQk+aULWGfK03A9zDm4c01jhxAgfF0Bsnj/rGXNdoyU1CfXAkXFocaNjq+2zbFmlADXwthaAGkfX3io0AlP4MBYF8lN44c2lkKYLsBnvkmhm/KcMhnA5uCtnOerf22mbzTLasn346z9s7b3Ceki7dnrUBCMq36gJXztMHHat3KnTvJNcyaMxzHycpLKArjcQ1FNGSvnsFbmDGTv89CvskERHrFb3nmkBUA8/PhbVva/bL5Yxz2tqtRRIMVtifeF/0mXANGFDVDfHUG9HUS2C3+UAl0L/MHQSuuV7Uy1qU8uOMP36p4wIGD1vYw5BvaB5EplWyql1HgdpvncvW4SspfRYzVu6eQsBpEFIvZAaT6qJsiHY80tll6pn4ZwK4IKFkFH+lDNg6rdsgNHx0kcAmTjIoG+aAXOgqWgO6E2wBhMavWqCFUUvw0AiWz995j35k7Zon5V3NS0s0PU2iUx3XaxOsK2MtMFKmrsgd5wY5QoKv1Ai+HiLubWbMm5CAR7klg7dgBslygyizSr3NHcGltQEw9G5RTbCGQWcMbFJKc9wSfMNz3jBkUMxRHiutbmU5gtGLh/4w== X-OriginatorOrg: outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 0c1e2256-19b5-467e-bb4a-08dcacaebd75 X-MS-Exchange-CrossTenant-AuthSource: TYSPR06MB6433.apcprd06.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 25 Jul 2024 13:36:08.4813 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa X-MS-Exchange-CrossTenant-RMS-PersistedConsumerOrg: 00000000-0000-0000-0000-000000000000 X-MS-Exchange-Transport-CrossTenantHeadersStamped: TYZPR06MB5027 Subject: [FFmpeg-devel] [PATCH 3/3] checkasm: add tests for vvc dmvr X-BeenThere: ffmpeg-devel@ffmpeg.org 
Cc: Nuo Mi

dmvr_8_12x20_c: 186.2
dmvr_8_12x20_avx2: 25.7
dmvr_8_20x12_c: 181.7
dmvr_8_20x12_avx2: 25.2
dmvr_8_20x20_c: 283.2
dmvr_8_20x20_avx2: 32.0
dmvr_10_12x20_c: 90.0
dmvr_10_12x20_avx2: 15.7
dmvr_10_20x12_c: 41.0
dmvr_10_20x12_avx2: 14.7
dmvr_10_20x20_c: 81.5
dmvr_10_20x20_avx2: 26.7
dmvr_12_12x20_c: 190.7
dmvr_12_12x20_avx2: 20.2
dmvr_12_20x12_c: 187.2
dmvr_12_20x12_avx2: 20.2
dmvr_12_20x20_c: 292.7
dmvr_12_20x20_avx2: 27.2
dmvr_h_8_12x20_c: 317.0
dmvr_h_8_12x20_avx2: 37.0
dmvr_h_8_20x12_c: 340.0
dmvr_h_8_20x12_avx2: 41.0
dmvr_h_8_20x20_c: 540.7
dmvr_h_8_20x20_avx2: 64.0
dmvr_h_10_12x20_c: 322.7
dmvr_h_10_12x20_avx2: 30.7
dmvr_h_10_20x12_c: 344.2
dmvr_h_10_20x12_avx2: 34.0
dmvr_h_10_20x20_c: 529.0
dmvr_h_10_20x20_avx2: 51.5
dmvr_h_12_12x20_c: 326.7
dmvr_h_12_12x20_avx2: 33.5
dmvr_h_12_20x12_c: 331.7
dmvr_h_12_20x12_avx2: 51.2
dmvr_h_12_20x20_c: 534.0
dmvr_h_12_20x20_avx2: 62.7
dmvr_hv_8_12x20_c: 650.0
dmvr_hv_8_12x20_avx2: 57.2
dmvr_hv_8_20x12_c: 676.2
dmvr_hv_8_20x12_avx2: 70.0
dmvr_hv_8_20x20_c: 1068.5
dmvr_hv_8_20x20_avx2: 103.2
dmvr_hv_10_12x20_c: 649.0
dmvr_hv_10_12x20_avx2: 48.2
dmvr_hv_10_20x12_c: 677.7
dmvr_hv_10_20x12_avx2: 59.7
dmvr_hv_10_20x20_c: 1093.5
dmvr_hv_10_20x20_avx2: 91.7
dmvr_hv_12_12x20_c: 660.0
dmvr_hv_12_12x20_avx2: 58.7
dmvr_hv_12_20x12_c: 682.7
dmvr_hv_12_20x12_avx2: 72.0
dmvr_hv_12_20x20_c: 1094.0
dmvr_hv_12_20x20_avx2: 113.2
dmvr_v_8_12x20_c: 325.7
dmvr_v_8_12x20_avx2: 31.2
dmvr_v_8_20x12_c: 326.2
dmvr_v_8_20x12_avx2: 38.5
dmvr_v_8_20x20_c: 538.5
dmvr_v_8_20x20_avx2: 54.2
dmvr_v_10_12x20_c: 318.5
dmvr_v_10_12x20_avx2: 23.7
dmvr_v_10_20x12_c: 330.7
dmvr_v_10_20x12_avx2: 40.5
dmvr_v_10_20x20_c: 567.5
dmvr_v_10_20x20_avx2: 48.0
dmvr_v_12_12x20_c: 335.2
dmvr_v_12_12x20_avx2: 30.0
dmvr_v_12_20x12_c: 330.2
dmvr_v_12_20x12_avx2: 39.5
dmvr_v_12_20x20_c: 535.2
dmvr_v_12_20x20_avx2: 60.0
---
 tests/checkasm/vvc_mc.c | 59 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index bc6b580f42..62fa6aa7d0 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -324,6 +324,64 @@ static void check_avg(void)
     report("avg");
 }
 
+#define SR_RANGE 2
+static void check_dmvr(void)
+{
+    LOCAL_ALIGNED_32(uint16_t, dst0, [DST_BUF_SIZE]);
+    LOCAL_ALIGNED_32(uint16_t, dst1, [DST_BUF_SIZE]);
+    LOCAL_ALIGNED_32(uint8_t, src0, [SRC_BUF_SIZE]);
+    LOCAL_ALIGNED_32(uint8_t, src1, [SRC_BUF_SIZE]);
+    const int dst_stride = MAX_PB_SIZE * sizeof(int16_t);
+
+    VVCDSPContext c;
+    declare_func(void, int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
+        intptr_t mx, intptr_t my, int width);
+
+    for (int bit_depth = 8; bit_depth <= 12; bit_depth += 2) {
+        ff_vvc_dsp_init(&c, bit_depth);
+        randomize_pixels(src0, src1, SRC_BUF_SIZE);
+        for (int i = 0; i < 2; i++) {
+            for (int j = 0; j < 2; j++) {
+                for (int h = 8; h <= 16; h *= 2) {
+                    for (int w = 8; w <= 16; w *= 2) {
+                        const int pred_w = w + 2 * SR_RANGE;
+                        const int pred_h = h + 2 * SR_RANGE;
+                        const int mx = rnd() % VVC_INTER_LUMA_DMVR_FACTS;
+                        const int my = rnd() % VVC_INTER_LUMA_DMVR_FACTS;
+                        const char *type;
+
+                        if (w * h < 128)
+                            continue;
+
+                        switch ((j << 1) | i) {
+                            case 0: type = "dmvr";    break; // 0 0
+                            case 1: type = "dmvr_h";  break; // 0 1
+                            case 2: type = "dmvr_v";  break; // 1 0
+                            case 3: type = "dmvr_hv"; break; // 1 1
+                        }
+
+                        if (check_func(c.inter.dmvr[j][i], "%s_%d_%dx%d", type, bit_depth, pred_w, pred_h)) {
+                            memset(dst0, 0, DST_BUF_SIZE);
+                            memset(dst1, 0, DST_BUF_SIZE);
+                            call_ref(dst0, src0 + SRC_OFFSET, PIXEL_STRIDE, pred_h, mx, my, pred_w);
+                            call_new(dst1, src1 + SRC_OFFSET, PIXEL_STRIDE, pred_h, mx, my, pred_w);
+                            for (int k = 0; k < pred_h; k++) {
+                                if (memcmp(dst0 + k * dst_stride, dst1 + k * dst_stride, pred_w * sizeof(int16_t))) {
+                                    fail();
+                                    break;
+                                }
+                            }
+
+                            bench_new(dst1, src1 + SRC_OFFSET, PIXEL_STRIDE, pred_h, mx, my, pred_w);
+                        }
+                    }
+                }
+            }
+        }
+    }
+    report("dmvr");
+}
+
 static void check_vvc_sad(void)
 {
     const int bit_depth = 10;
@@ -363,6 +421,7 @@ static void check_vvc_sad(void)
 
 void checkasm_check_vvc_mc(void)
 {
+    check_dmvr();
     check_vvc_sad();
     check_put_vvc_luma();
     check_put_vvc_luma_uni();
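
A note on the row comparison in check_dmvr(): dst0 and dst1 are uint16_t pointers, so "dst0 + k * dst_stride" advances in elements, while dst_stride is computed in bytes (MAX_PB_SIZE * sizeof(int16_t)). If the dmvr functions write rows MAX_PB_SIZE elements apart (an assumption; the C reference is not part of this patch), the loop may end up comparing only every second output row. The sketch below shows the effect in isolation. ROW_ELEMS, ROWS, first_diff_row() and stride_demo() are hypothetical names for illustration, not checkasm or FFmpeg APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROW_ELEMS 128   /* hypothetical stand-in for MAX_PB_SIZE */
#define ROWS      8

/* Return the index of the first row (w elements wide) that differs
 * between a and b, or -1 if all h rows match.  stride is counted in
 * elements, because a and b are typed int16_t pointers. */
int first_diff_row(const int16_t *a, const int16_t *b,
                   ptrdiff_t stride, int w, int h)
{
    for (int k = 0; k < h; k++)
        if (memcmp(a + k * stride, b + k * stride, (size_t)w * sizeof(*a)))
            return k;
    return -1;
}

/* Corrupt row 1 only, then compare with an element stride and with a
 * byte stride mistakenly used as an element stride. */
void stride_demo(void)
{
    /* Twice the rows, so the doubled stride stays inside the buffers. */
    static int16_t ref[2 * ROWS * ROW_ELEMS];
    static int16_t out[2 * ROWS * ROW_ELEMS];

    memset(ref, 0, sizeof(ref));
    memset(out, 0, sizeof(out));
    out[1 * ROW_ELEMS] = 1;

    /* Element stride: row 1 is visited and the mismatch is found. */
    assert(first_diff_row(ref, out, ROW_ELEMS, 4, ROWS) == 1);

    /* Byte stride: rows 0, 2, 4, ... are visited, row 1 is skipped. */
    assert(first_diff_row(ref, out, ROW_ELEMS * sizeof(int16_t), 4, ROWS) == -1);
}
```

If that assumption holds for the real functions, an element stride (MAX_PB_SIZE) in the comparison would cover every row; whether this applies here depends on how the dmvr C and asm functions step dst.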