From patchwork Sun Aug 18 01:48:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51061
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Aug 2024 09:48:04 +0800
X-Mailer: git-send-email 2.34.1
X-Microsoft-Original-Message-ID: <20240818014807.47423-1-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH 1/4] avcodec/vvcdec: misc, rename BDOF_BLOCK_SIZE to BDOF_MIN_BLOCK_SIZE
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Nuo Mi

---
 libavcodec/vvc/dsp.c            |  4 ++--
 libavcodec/vvc/inter_template.c | 10 +++++-----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/libavcodec/vvc/dsp.c b/libavcodec/vvc/dsp.c
index 648d54ebb2..7463d8c9de 100644
--- a/libavcodec/vvc/dsp.c
+++ b/libavcodec/vvc/dsp.c
@@ -80,8 +80,8 @@ typedef struct IntraEdgeParams {
 #define BDOF_BORDER_EXT 1
 #define BDOF_PADDED_SIZE (16 + BDOF_BORDER_EXT * 2)
 
-#define BDOF_BLOCK_SIZE 4
-#define BDOF_GRADIENT_SIZE (BDOF_BLOCK_SIZE + BDOF_BORDER_EXT * 2)
+#define BDOF_MIN_BLOCK_SIZE 4
+#define BDOF_GRADIENT_SIZE (BDOF_MIN_BLOCK_SIZE + BDOF_BORDER_EXT * 2)
 
 #define BIT_DEPTH 8
 #include "dsp_template.c"
diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index afcee2e360..0f1712e337 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -433,8 +433,8 @@ static void FUNC(apply_bdof_min_block)(pixel* dst, const ptrdiff_t dst_stride, c
     const int16_t* gh[] = { gradient_h[0] + 1 + BDOF_PADDED_SIZE, gradient_h[1] + 1 + BDOF_PADDED_SIZE };
     const int16_t* gv[] = { gradient_v[0] + 1 + BDOF_PADDED_SIZE, gradient_v[1] + 1 + BDOF_PADDED_SIZE };
 
-    for (int y = 0; y < BDOF_BLOCK_SIZE; y++) {
-        for (int x = 0; x < BDOF_BLOCK_SIZE; x++) {
+    for (int y = 0; y < BDOF_MIN_BLOCK_SIZE; y++) {
+        for (int x = 0; x < BDOF_MIN_BLOCK_SIZE; x++) {
             const int idx = y * BDOF_PADDED_SIZE + x;
             const int bdof_offset = vx * (gh[0][idx] - gh[1][idx]) + vy * (gv[0][idx] - gv[1][idx]);
             dst[x] = av_clip_pixel((src0[x] + offset4 + src1[x] + bdof_offset) >> shift4);
@@ -461,8 +461,8 @@ static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, int16_t
         _src1, MAX_PB_SIZE, block_w, block_h, 1);
     pad_int16(_src1, MAX_PB_SIZE, block_w, block_h);
 
-    for (int y = 0; y < block_h; y += BDOF_BLOCK_SIZE) {
-        for (int x = 0; x < block_w; x += BDOF_BLOCK_SIZE) {
+    for (int y = 0; y < block_h; y += BDOF_MIN_BLOCK_SIZE) {
+        for (int x = 0; x < block_w; x += BDOF_MIN_BLOCK_SIZE) {
             const int16_t* src0 = _src0 + y * MAX_PB_SIZE + x;
             const int16_t* src1 = _src1 + y * MAX_PB_SIZE + x;
             pixel *d = dst + x;
@@ -472,7 +472,7 @@ static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, int16_t
             FUNC(derive_bdof_vx_vy)(src0, src1, gh, gv, BDOF_PADDED_SIZE, &vx, &vy);
             FUNC(apply_bdof_min_block)(d, dst_stride, src0, src1, gh, gv, vx, vy);
         }
-        dst += BDOF_BLOCK_SIZE * dst_stride;
+        dst += BDOF_MIN_BLOCK_SIZE * dst_stride;
     }
 }
 

From patchwork Sun Aug 18 01:48:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51064
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Aug 2024 09:48:05 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240818014807.47423-1-nuomi2021@gmail.com>
References: <20240818014807.47423-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240818014807.47423-2-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/4] avcodec/vvcdec: bdof, do not pad sources and gradients to simplify the code
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Nuo Mi

---
 libavcodec/vvc/dsp.c            | 25 +------------
 libavcodec/vvc/dsp.h            |  4 +-
 libavcodec/vvc/inter_template.c | 65 ++++++++++++++-------------------
 3 files changed, 31 insertions(+), 63 deletions(-)

diff --git a/libavcodec/vvc/dsp.c b/libavcodec/vvc/dsp.c
index 7463d8c9de..433353c32c 100644
--- a/libavcodec/vvc/dsp.c
+++ b/libavcodec/vvc/dsp.c
@@ -26,26 +26,6 @@
 
 #define VVC_SIGN(v) (v < 0 ? -1 : !!v)
 
-static void av_always_inline pad_int16(int16_t *_dst, const ptrdiff_t dst_stride, const int width, const int height)
-{
-    const int padded_width = width + 2;
-    int16_t *dst;
-    for (int y = 0; y < height; y++) {
-        dst = _dst + y * dst_stride;
-        for (int x = 0; x < width; x++) {
-            dst[-1] = dst[0];
-            dst[width] = dst[width - 1];
-        }
-    }
-
-    _dst--;
-    //top
-    memcpy(_dst - dst_stride, _dst, padded_width * sizeof(int16_t));
-    //bottom
-    _dst += dst_stride * height;
-    memcpy(_dst, _dst - dst_stride, padded_width * sizeof(int16_t));
-}
-
 static int vvc_sad(const int16_t *src0, const int16_t *src1, int dx, int dy,
     const int block_w, const int block_h)
 {
@@ -77,11 +57,10 @@ typedef struct IntraEdgeParams {
 #define PROF_BORDER_EXT 1
 #define PROF_BLOCK_SIZE (AFFINE_MIN_BLOCK_SIZE + PROF_BORDER_EXT * 2)
 
-#define BDOF_BORDER_EXT 1
-#define BDOF_PADDED_SIZE (16 + BDOF_BORDER_EXT * 2)
+#define BDOF_BORDER_EXT 1
+#define BDOF_BLOCK_SIZE 16
 
 #define BDOF_MIN_BLOCK_SIZE 4
-#define BDOF_GRADIENT_SIZE (BDOF_MIN_BLOCK_SIZE + BDOF_BORDER_EXT * 2)
 
 #define BIT_DEPTH 8
 #include "dsp_template.c"
diff --git a/libavcodec/vvc/dsp.h b/libavcodec/vvc/dsp.h
index 38ff492a23..635ebcafed 100644
--- a/libavcodec/vvc/dsp.h
+++ b/libavcodec/vvc/dsp.h
@@ -88,8 +88,6 @@ typedef struct VVCInterDSPContext {
 
     void (*bdof_fetch_samples)(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int x_frac, int y_frac,
         int width, int height);
-    void (*prof_grad_filter)(int16_t *gradient_h, int16_t *gradient_v, const ptrdiff_t gradient_stride,
-        const int16_t *src, const ptrdiff_t src_stride, int width, int height, const int pad);
 
     void (*apply_prof)(int16_t *dst, const int16_t *src, const int16_t *diff_mv_x, const int16_t *diff_mv_y);
     void (*apply_prof_uni)(uint8_t *dst, ptrdiff_t dst_stride, const int16_t *src,
@@ -97,7 +95,7 @@ typedef struct VVCInterDSPContext {
     void (*apply_prof_uni_w)(uint8_t *dst, const ptrdiff_t dst_stride, const int16_t *src,
         const int16_t *diff_mv_x, const int16_t *diff_mv_y, int denom, int wx, int ox);
 
-    void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, int16_t *src0, int16_t *src1, int block_w, int block_h);
+    void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, const int16_t *src0, const int16_t *src1, int block_w, int block_h);
 
     int (*sad)(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
     void (*dmvr[2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index 0f1712e337..c073a73e76 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -292,13 +292,11 @@ static void FUNC(fetch_samples)(int16_t *_dst, const uint8_t *_src, const ptrdif
     FUNC(bdof_fetch_samples)(_dst, _src, _src_stride, x_frac, y_frac, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 }
 
-static void FUNC(prof_grad_filter)(int16_t *_gradient_h, int16_t *_gradient_v, const ptrdiff_t gradient_stride,
-    const int16_t *_src, const ptrdiff_t src_stride, const int width, const int height, const int pad)
+static void FUNC(prof_grad_filter)(int16_t *gradient_h, int16_t *gradient_v, const ptrdiff_t gradient_stride,
+    const int16_t *_src, const ptrdiff_t src_stride, const int width, const int height)
 {
     const int shift = 6;
     const int16_t *src = _src;
-    int16_t *gradient_h = _gradient_h + pad * (1 + gradient_stride);
-    int16_t *gradient_v = _gradient_v + pad * (1 + gradient_stride);
 
     for (int y = 0; y < height; y++) {
         const int16_t *p = src;
@@ -311,10 +309,6 @@ static void FUNC(prof_grad_filter)(int16_t *_gradient_h, int16_t *_gradient_v, c
         gradient_v += gradient_stride;
         src += src_stride;
     }
-    if (pad) {
-        pad_int16(_gradient_h + 1 + gradient_stride, gradient_stride, width, height);
-        pad_int16(_gradient_v + 1 + gradient_stride, gradient_stride, width, height);
-    }
 }
 
 static void FUNC(apply_prof)(int16_t *dst, const int16_t *src, const int16_t *diff_mv_x, const int16_t *diff_mv_y)
@@ -323,7 +317,7 @@ static void FUNC(apply_prof)(int16_t *dst, const int16_t *src, const int16_t *di
     int16_t gradient_h[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
     int16_t gradient_v[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
 
-    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE, 0);
+    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 
     for (int y = 0; y < AFFINE_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < AFFINE_MIN_BLOCK_SIZE; x++) {
@@ -352,7 +346,7 @@ static void FUNC(apply_prof_uni)(uint8_t *_dst, const ptrdiff_t _dst_stride, con
     int16_t gradient_h[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
     int16_t gradient_v[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
 
-    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE, 0);
+    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 
     for (int y = 0; y < AFFINE_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < AFFINE_MIN_BLOCK_SIZE; x++) {
@@ -380,7 +374,7 @@ static void FUNC(apply_prof_uni_w)(uint8_t *_dst, const ptrdiff_t _dst_stride,
     int16_t gradient_h[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
     int16_t gradient_v[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
 
-    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE, 0);
+    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 
     for (int y = 0; y < AFFINE_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < AFFINE_MIN_BLOCK_SIZE; x++) {
@@ -395,47 +389,47 @@ static void FUNC(apply_prof_uni_w)(uint8_t *_dst, const ptrdiff_t _dst_stride,
 }
 
 static void FUNC(derive_bdof_vx_vy)(const int16_t *_src0, const int16_t *_src1,
-    const int16_t **gradient_h, const int16_t **gradient_v, ptrdiff_t gradient_stride,
+    const int pad_left, const int pad_top, const int pad_right, const int pad_bottom,
+    const int16_t **gradient_h, const int16_t **gradient_v,
     int* vx, int* vy)
 {
     const int shift2 = 4;
     const int shift3 = 1;
     const int thres = 1 << 4;
     int sgx2 = 0, sgy2 = 0, sgxgy = 0, sgxdi = 0, sgydi = 0;
-    const int16_t *src0 = _src0 - 1 - MAX_PB_SIZE;
-    const int16_t *src1 = _src1 - 1 - MAX_PB_SIZE;
 
-    for (int y = 0; y < BDOF_GRADIENT_SIZE; y++) {
-        for (int x = 0; x < BDOF_GRADIENT_SIZE; x++) {
-            const int diff = (src0[x] >> shift2) - (src1[x] >> shift2);
-            const int idx = gradient_stride * y + x;
+    for (int y = -1; y < BDOF_MIN_BLOCK_SIZE + 1; y++) {
+        const int dy = y + (pad_top && y < 0) - (pad_bottom && y == BDOF_MIN_BLOCK_SIZE); // we pad for the first and last row
+        const int16_t *src0 = _src0 + dy * MAX_PB_SIZE;
+        const int16_t *src1 = _src1 + dy * MAX_PB_SIZE;
+
+        for (int x = -1; x < BDOF_MIN_BLOCK_SIZE + 1; x++) {
+            const int dx = x + (pad_left && x < 0) - (pad_right && x == BDOF_MIN_BLOCK_SIZE); // we pad for the first and last col
+            const int diff = (src0[dx] >> shift2) - (src1[dx] >> shift2);
+            const int idx = BDOF_BLOCK_SIZE * dy + dx;
             const int temph = (gradient_h[0][idx] + gradient_h[1][idx]) >> shift3;
             const int tempv = (gradient_v[0][idx] + gradient_v[1][idx]) >> shift3;
+
             sgx2 += FFABS(temph);
             sgy2 += FFABS(tempv);
             sgxgy += VVC_SIGN(tempv) * temph;
             sgxdi += -VVC_SIGN(temph) * diff;
             sgydi += -VVC_SIGN(tempv) * diff;
         }
-        src0 += MAX_PB_SIZE;
-        src1 += MAX_PB_SIZE;
     }
     *vx = sgx2 > 0 ? av_clip((sgxdi * (1 << 2)) >> av_log2(sgx2) , -thres + 1, thres - 1) : 0;
     *vy = sgy2 > 0 ? av_clip(((sgydi * (1 << 2)) - ((*vx * sgxgy) >> 1)) >> av_log2(sgy2), -thres + 1, thres - 1) : 0;
 }
 
 static void FUNC(apply_bdof_min_block)(pixel* dst, const ptrdiff_t dst_stride, const int16_t *src0, const int16_t *src1,
-    const int16_t **gradient_h, const int16_t **gradient_v, const int vx, const int vy)
+    const int16_t **gh, const int16_t **gv, const int vx, const int vy)
 {
     const int shift4 = 15 - BIT_DEPTH;
     const int offset4 = 1 << (shift4 - 1);
 
-    const int16_t* gh[] = { gradient_h[0] + 1 + BDOF_PADDED_SIZE, gradient_h[1] + 1 + BDOF_PADDED_SIZE };
-    const int16_t* gv[] = { gradient_v[0] + 1 + BDOF_PADDED_SIZE, gradient_v[1] + 1 + BDOF_PADDED_SIZE };
-
     for (int y = 0; y < BDOF_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < BDOF_MIN_BLOCK_SIZE; x++) {
-            const int idx = y * BDOF_PADDED_SIZE + x;
+            const int idx = y * BDOF_BLOCK_SIZE + x;
             const int bdof_offset = vx * (gh[0][idx] - gh[1][idx]) + vy * (gv[0][idx] - gv[1][idx]);
             dst[x] = av_clip_pixel((src0[x] + offset4 + src1[x] + bdof_offset) >> shift4);
         }
@@ -445,31 +439,29 @@ static void FUNC(apply_bdof_min_block)(pixel* dst, const ptrdiff_t dst_stride, c
     }
 }
 
-static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, int16_t *_src0, int16_t *_src1,
+static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, const int16_t *_src0, const int16_t *_src1,
     const int block_w, const int block_h)
 {
-    int16_t gradient_h[2][BDOF_PADDED_SIZE * BDOF_PADDED_SIZE];
-    int16_t gradient_v[2][BDOF_PADDED_SIZE * BDOF_PADDED_SIZE];
+    int16_t gradient_h[2][BDOF_BLOCK_SIZE * BDOF_BLOCK_SIZE];
+    int16_t gradient_v[2][BDOF_BLOCK_SIZE * BDOF_BLOCK_SIZE];
     int vx, vy;
     const ptrdiff_t dst_stride = _dst_stride / sizeof(pixel);
     pixel* dst = (pixel*)_dst;
 
-    FUNC(prof_grad_filter)(gradient_h[0], gradient_v[0], BDOF_PADDED_SIZE,
-        _src0, MAX_PB_SIZE, block_w, block_h, 1);
-    pad_int16(_src0, MAX_PB_SIZE, block_w, block_h);
-    FUNC(prof_grad_filter)(gradient_h[1], gradient_v[1], BDOF_PADDED_SIZE,
-        _src1, MAX_PB_SIZE, block_w, block_h, 1);
-    pad_int16(_src1, MAX_PB_SIZE, block_w, block_h);
+    FUNC(prof_grad_filter)(gradient_h[0], gradient_v[0], BDOF_BLOCK_SIZE,
+        _src0, MAX_PB_SIZE, block_w, block_h);
+    FUNC(prof_grad_filter)(gradient_h[1], gradient_v[1], BDOF_BLOCK_SIZE,
+        _src1, MAX_PB_SIZE, block_w, block_h);
 
     for (int y = 0; y < block_h; y += BDOF_MIN_BLOCK_SIZE) {
         for (int x = 0; x < block_w; x += BDOF_MIN_BLOCK_SIZE) {
             const int16_t* src0 = _src0 + y * MAX_PB_SIZE + x;
             const int16_t* src1 = _src1 + y * MAX_PB_SIZE + x;
             pixel *d = dst + x;
-            const int idx = BDOF_PADDED_SIZE * y + x;
+            const int idx = BDOF_BLOCK_SIZE * y + x;
             const int16_t* gh[] = { gradient_h[0] + idx, gradient_h[1] + idx };
             const int16_t* gv[] = { gradient_v[0] + idx, gradient_v[1] + idx };
-            FUNC(derive_bdof_vx_vy)(src0, src1, gh, gv, BDOF_PADDED_SIZE, &vx, &vy);
+            FUNC(derive_bdof_vx_vy)(src0, src1, !x, !y, x + BDOF_MIN_BLOCK_SIZE == block_w, y + BDOF_MIN_BLOCK_SIZE == block_h, gh, gv, &vx, &vy);
             FUNC(apply_bdof_min_block)(d, dst_stride, src0, src1, gh, gv, vx, vy);
         }
         dst += BDOF_MIN_BLOCK_SIZE * dst_stride;
@@ -631,7 +623,6 @@ static void FUNC(ff_vvc_inter_dsp_init)(VVCInterDSPContext *const inter)
     inter->apply_prof_uni = FUNC(apply_prof_uni);
     inter->apply_prof_uni_w = FUNC(apply_prof_uni_w);
     inter->apply_bdof = FUNC(apply_bdof);
-    inter->prof_grad_filter = FUNC(prof_grad_filter);
     inter->sad = vvc_sad;
 }
 

From patchwork Sun Aug 18 01:48:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51062
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Aug 2024 09:48:06 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240818014807.47423-1-nuomi2021@gmail.com>
References: <20240818014807.47423-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240818014807.47423-3-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/4] x86/vvcdec: inter, add optical flow avx2 code
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Nuo Mi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: pHZe8ZjyYQ+n

BDoF used about 10%–25% of the CPU for some clips.

Here are the FPS for one run (delta is relative to the "after" value); please
ignore the negative values, as they may be due to run-to-run variation.

clips                                       | before | after | delta
--------------------------------------------|--------|-------|-------
RitualDance_1920x1080_60_10_420_37_RA.266   | 310.0  | 363.0 | 14.60%
NovosobornayaSquare_1920x1080.bin           | 322.3  | 339.7 | 5.12%
Tango2_3840x2160_60_10_420_27_LD.266        | 71.0   | 68.7  | -3.35%
RitualDance_1920x1080_60_10_420_32_LD.266   | 250.0  | 245.3 | -1.92%
Chimera_8bit_1080P_1000_frames.vvc          | 359.3  | 422.7 | 15.00%
BQTerrace_1920x1080_60_10_420_22_RA.vvc     | 142.3  | 147.7 | 3.66%
---
 libavcodec/x86/vvc/Makefile      |   1 +
 libavcodec/x86/vvc/vvc_of.asm    | 380 +++++++++++++++++++++++++++++++
 libavcodec/x86/vvc/vvcdsp_init.c |  21 ++
 3 files changed, 402 insertions(+)
 create mode 100644 libavcodec/x86/vvc/vvc_of.asm

diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index 04f16bc10c..aa59aa59cf 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -6,5 +6,6 @@ OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvcdsp_init.o \
 X86ASM-OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvc_alf.o \
                                      x86/vvc/vvc_dmvr.o \
                                      x86/vvc/vvc_mc.o \
+                                     x86/vvc/vvc_of.o \
                                      x86/vvc/vvc_sad.o \
                                      x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/vvc/vvc_of.asm b/libavcodec/x86/vvc/vvc_of.asm
new file mode 100644
index 0000000000..b70073fa82
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_of.asm
@@ -0,0 +1,380 @@
+; /*
+; * Provide AVX2 luma optical flow functions for VVC decoding
+; * Copyright (c) 2024 Nuo Mi
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the GNU Lesser General Public
+; * License as published by the Free Software Foundation; either
+; * version 2.1 of the License, or (at your option) any later version.
+; *
+; * FFmpeg is distributed in the hope that it will be useful,
+; * but WITHOUT ANY WARRANTY; without even the implied warranty of
+; * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+; * Lesser General Public License for more details.
+; *
+; * You should have received a copy of the GNU Lesser General Public
+; * License along with FFmpeg; if not, write to the Free Software
+; * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+; */
+%include "libavutil/x86/x86util.asm"
+
+%define MAX_PB_SIZE 128
+%define SRC_STRIDE (MAX_PB_SIZE * 2)
+%define SRC_PS 2 ; source pixel size, sizeof(int16_t)
+%define BDOF_STACK_SIZE 10 ; (4 + 1) * 2, 4 lines + the first line, *2 for h and v
+%define bdof_stack_offset(line) ((line) * 2 % BDOF_STACK_SIZE * mmsize)
+%define SHIFT 6
+%define SHIFT2 4
+
+SECTION_RODATA 32
+pd_15 times 8 dd 15
+pd_m15 times 8 dd -15
+cextern pb_0
+
+pb_shuffle_w8 times 2 db 0, 1, 0xff, 0xff, 8, 9, 0xff, 0xff, 6, 7, 0xff, 0xff, 14, 15, 0xff, 0xff
+pb_shuffle_w16 times 2 db 0, 1, 0xff, 0xff, 6, 7, 0xff, 0xff, 8, 9, 0xff, 0xff, 14, 15, 0xff, 0xff
+pd_perm_w16 dd 0, 2, 1, 4, 3, 6, 5, 7
+%if ARCH_X86_64
+
+%if HAVE_AVX2_EXTERNAL
+
+SECTION .text
+
+INIT_YMM avx2
+
+; dst = (src0 >> shift) - (src1 >> shift)
+%macro DIFF 5 ; dst, src0, src1, shift, tmp
+    psraw %1, %2, %4
+    psraw %5, %3, %4
+    psubw %1, %5
+%endmacro
+
+%macro LOAD_GRAD_H 4 ; dst, src, off, tmp
+    movu %1, [%2 + %3 + 2 * SRC_PS]
+    movu %4, [%2 + %3]
+
+    DIFF %1, %1, %4, SHIFT, %4
+%endmacro
+
+%macro SUM_GRAD 2 ;(dst/grad0, grad1)
+    paddw %1, %2
+    psraw %1, 1 ; shift3
+%endmacro
+
+%macro APPLY_BDOF_MIN_BLOCK_LINE 5 ; dst, vx, vy, tmp, line_num
+%define off bdof_stack_offset(%5)
+    pmullw %1, %2, [rsp + off + 0 * mmsize] ; vx * (gradient_h[0] - gradient_h[1])
+    pmullw %4, %3, [rsp + off + 1 * mmsize] ; vy * (gradient_v[0] - gradient_v[1])
+    paddw %1, [src0q + (%5 + 1) * SRC_STRIDE + SRC_PS]
+    paddw %4, [src1q + (%5 + 1) * SRC_STRIDE + SRC_PS]
+    paddsw %1, %4 ; src0[x] + src1[x] + bdof_offset
+    pmulhrsw %1, m11
+    CLIPW %1, m9, m10
+%endmacro
+
+%macro SAVE_8BPC 2 ; dst, src
+    packuswb m%2, m%2
+    vpermq m%2, m%2, q0020
+
+    cmp wd, 16
+    je %%w16
+    movq %1, xm%2
+    jmp %%wend
+%%w16:
+    movu %1, xm%2
+%%wend:
+%endmacro
+
+%macro SAVE_16BPC 2 ; dst, src
+    cmp wd, 16
+    je %%w16
+    movu %1, xm%2
+    jmp %%wend
+%%w16:
+    movu %1, m%2
+%%wend:
+%endmacro
+
+%macro SAVE 2 ; dst, src
+    cmp pixel_maxd, (1 << 8) - 1
+    jne %%save_16bpc
+    SAVE_8BPC %1, %2
+    jmp %%end
+%%save_16bpc:
+    SAVE_16BPC %1, %2
+%%end:
+%endmacro
+
+; [rsp + even * mmsize] are gradient_h[0] - gradient_h[1]
+; [rsp + odd * mmsize] are gradient_v[0] - gradient_v[1]
+%macro APPLY_BDOF_MIN_BLOCK 4 ; block_num, vx, vy, bd
+    vpbroadcastd m9, [pb_0]
+
+    movd xm10, pixel_maxd
+    vpbroadcastw m10, xm10
+
+    lea tmp0d, [pixel_maxd + 1]
+    movd xm11, tmp0d
+    VPBROADCASTW m11, xm11 ;shift_4 for pmulhrsw
+
+    APPLY_BDOF_MIN_BLOCK_LINE m6, %2, %3, m7, (%1) * 4 + 0
+    SAVE [dstq + 0 * dsq], 6
+
+    APPLY_BDOF_MIN_BLOCK_LINE m6, %2, %3, m7, (%1) * 4 + 1
+    SAVE [dstq + 1 * dsq], 6
+
+    APPLY_BDOF_MIN_BLOCK_LINE m6, %2, %3, m7, (%1) * 4 + 2
+    SAVE [dstq + 2 * dsq], 6
+
+    APPLY_BDOF_MIN_BLOCK_LINE m6, %2, %3, m7, (%1) * 4 + 3
+    SAVE [dstq + ds3q], 6
+%endmacro
+
+%macro SUM_MIN_BLOCK_W16 4 ; src/dst, shuffle, perm, tmp
+    pshufb %4, %1, %2
+    vpermd %4, %3, %4
+    paddw %1, %4
+%endmacro
+
+%macro SUM_MIN_BLOCK_W8 3 ; src/dst, shuffle, tmp
+    pshufb %3, %1, %2
+    paddw %1, %3
+%endmacro
+
+%macro BDOF_PROF_GRAD 2 ; line_no, last_line
+%assign i0 (%1 + 0) % 3
+%assign j0 (%1 + 1) % 3
+%assign k0 (%1 + 2) % 3
+%assign i1 3 + (%1 + 0) % 3
+%assign j1 3 + (%1 + 1) % 3
+%assign k1 3 + (%1 + 2) % 3
+
+; we cached src0 in m0 to m2
+%define t0 m %+ i0
+%define c0 m %+ j0
+%define b0 m %+ k0
+
+; we cached src1 in m3 to m5
+%define t1 m %+ i1
+%define c1 m %+ j1
+%define b1 m %+ k1
+%define ndiff t1
+%define off bdof_stack_offset(%1)
+
+    movu b0, [src0q + (%1 + 2) * SRC_STRIDE + SRC_PS]
+    movu b1, [src1q + (%1 + 2) * SRC_STRIDE + SRC_PS]
+
+    ; gradient_v[0], gradient_v[1]
+    DIFF m6, b0, t0, SHIFT, t0
+    DIFF m7, b1, t1, SHIFT, t1
+
+    ; save gradient_v[0] - gradient_v[1]
+    psubw m10, m6, m7
+    mova [rsp + off + mmsize], m10
+
+    ; gradient_h[0], gradient_h[1]
+    LOAD_GRAD_H m8, src0q, (%1 + 1) * SRC_STRIDE, t0
+    LOAD_GRAD_H m9, src1q, (%1 + 1) * SRC_STRIDE, t1
+
+    ; save gradient_h[0] - gradient_h[1]
+    psubw m11, m8, m9
+    mova [rsp + off], m11
+
+    SUM_GRAD m8, m9 ; temph
+    SUM_GRAD m6, m7 ; tempv
+
+    DIFF ndiff, c1, c0, SHIFT2, t0 ; -diff
+
+    psignw m7, ndiff, m8 ; sgxdi
+    psignw m9, ndiff, m6 ; sgydi
+    psignw m10, m8, m6 ; sgxgy
+
+    pabsw m6, m6 ; sgy2
+    pabsw m8, m8 ; sgx2
+
+    ; use t0, t1 as temporary buffers
+    cmp wd, 16
+
+    je %%w16
+    mova t0, [pb_shuffle_w8]
+    SUM_MIN_BLOCK_W8 m6, t0, m11
+    SUM_MIN_BLOCK_W8 m7, t0, m11
+    SUM_MIN_BLOCK_W8 m8, t0, m11
+    SUM_MIN_BLOCK_W8 m9, t0, m11
+    SUM_MIN_BLOCK_W8 m10, t0, m11
+    jmp %%wend
+
+%%w16:
+    mova t0, [pb_shuffle_w16]
+    mova t1, [pd_perm_w16]
+    SUM_MIN_BLOCK_W16 m6, t0, t1, m11
+    SUM_MIN_BLOCK_W16 m7, t0, t1, m11
+    SUM_MIN_BLOCK_W16 m8, t0, t1, m11
+    SUM_MIN_BLOCK_W16 m9, t0, t1, m11
+    SUM_MIN_BLOCK_W16 m10, t0, t1, m11
+
+%%wend:
+    phaddw m8, m7
+    phaddw m6, m9
+    phaddw m8, m6 ; 2 sgx2, 2 sgxdi, 2 sgy2, 2 sgydi, 2 sgx2, 2 sgxdi, 2 sgy2, 2 sgydi
+
+%if (%1) == 0 || (%2)
+    ; pad for top and bottom
+    paddw m8, m8
+    paddw m10, m10
+%endif
+
+    paddw m12, m8
+    paddw m13, m10
+%endmacro
+
+
+%macro LOG2 5; log_sum, src, cmp, shift, tmp
+    pcmpgtw %5, %2, %3
+    pandd %5, %4
+    paddw %1, %5
+
+    psrlw %2, %5
+    psrlw %4, 1
+    psrlw %3, %4
+%endmacro
+
+%macro LOG2 3; dst0/src0, dst1/src, offset
+    pextrd tmp0d, xm%1, %3
+    pextrd tmp1d, xm%2, %3
+    bsr tmp0d, tmp0d
+    bsr tmp1d, tmp1d
+    pinsrd xm%1, tmp0d, %3
+    pinsrd xm%2, tmp1d, %3
+%endmacro
+
+%macro LOG2 2; dst/src, tmp
+    vperm2i128 m%2, m%1, m%1, 1
+    LOG2 %1, %2, 0
+    LOG2 %1, %2, 1
+    LOG2 %1, %2, 2
+    LOG2 %1, %2, 3
+    vperm2i128 m%1, m%1, m%2, q0200
+%endmacro
+
+; %1: 2 sgx2, 2 sgxdi, 2 sgy2, 2 sgydi, 2 sgx2, 2 sgxdi, 2 sgy2, 2 sgydi
+; %2: 4 4sgxgy
+%macro BDOF_VX_VY 2 ;
+    pmovsxwd m6, xm%1
+    vextracti128 xm7, m%1, 1
+    pmovsxwd m7, xm7
+    punpcklqdq m8, m6, m7 ; 4 sgx2, 4 sgy2
+    punpckhqdq m9, m6, m7 ; 4 sgxdi, 4 sgydi
+    mova m10, m8
+
+    LOG2 10, 11
+    psignd m11, m9, m8
+    pslld m11, 2
+    vpsravd m11, m11, m10
+    CLIPD m11, [pd_m15], [pd_15] ; 4 vx
+
+    pxor m6, m6
+    phaddw m%2, m6
+    phaddw m%2, m6
+    vpermq m%2, m%2, q0020
+    pshufd m%2, m%2, q1120
+    pmovsxwd m%2, xmm%2 ; 4 sgxgy
+
+    pmulld m%2, m11 ; 4 vx * sgxgy
+    psrad m%2, 1
+
+    pslld m9, 2
+    vextracti128 xmm9, m9, 1 ; 4 (sgydi << 2)
+    psubd m9, m%2 ; 4 ((sgydi << 2) - (vx * sgxgy >> 1))
+
+    vextracti128 xmm8, m8, 1 ; 4 sgy2
+    psignd m9, m8
+
+    vextracti128 xmm10, m10, 1 ; 4 log2(sgy2)
+    vpsravd m9, m9, m10
+    CLIPD m9, [pd_m15], [pd_15] ; 4 vy
+
+    vpermq m%1, m11, q1100
+    pshuflw m%1, m%1, q2200
+    vpunpckldq m%1, m%1 ; 4 x 4vx
+
+    vpermq m%2, m9, q1100
+    pshuflw m%2, m%2, q2200
+    vpunpckldq m%2, m%2 ; 4 x 4vy
+%endmacro
+
+
+%macro BDOF_MINI_BLOCKS 2 ; (block_num, last_block)
+
+%if (%1) == 0
+    movu m0, [src0q + 0 * SRC_STRIDE + SRC_PS]
+    movu m1, [src0q + 1 * SRC_STRIDE + SRC_PS]
+    movu m3, [src1q + 0 * SRC_STRIDE + SRC_PS]
+    movu m4, [src1q + 1 * SRC_STRIDE + SRC_PS]
+
+    pxor m12, m12
+    pxor m13, m13
+
+    BDOF_PROF_GRAD 0, 0
+%endif
+
+    mova m14, m12
+    mova m15, m13
+
+    pxor m12, m12
+    pxor m13, m13
+    BDOF_PROF_GRAD %1 * 4 + 1, 0
+    BDOF_PROF_GRAD %1 * 4 + 2, 0
+    paddw m14, m12
+    paddw m15, m13
+
+    pxor m12, m12
+    pxor m13, m13
+    BDOF_PROF_GRAD %1 * 4 + 3, %2
+%if (%2) == 0
+    BDOF_PROF_GRAD %1 * 4 + 4, 0
+%endif
+    paddw m14, m12
+    paddw m15, m13
+
+    BDOF_VX_VY 14, 15
+    APPLY_BDOF_MIN_BLOCK %1, m14, m15, bd
+    lea dstq, [dstq + 4 * dsq]
+%endmacro
+
+;void ff_vvc_apply_bdof_%1(uint8_t *dst, const ptrdiff_t dst_stride, int16_t *src0, int16_t *src1,
+; const int w, const int h, const int pixel_max)
+%macro BDOF_AVX2 0
+cglobal vvc_apply_bdof, 7, 10, 16, BDOF_STACK_SIZE*32, dst, ds, src0, src1, w, h, pixel_max, ds3, tmp0, tmp1
+
+    lea ds3q, [dsq * 3]
+    sub src0q, SRC_STRIDE + SRC_PS
+    sub src1q, SRC_STRIDE + SRC_PS
+
+    BDOF_MINI_BLOCKS 0, 0
+
+    cmp hd, 16
+    je .h16
+    BDOF_MINI_BLOCKS 1, 1
+    jmp .end
+
+.h16:
+    BDOF_MINI_BLOCKS 1, 0
+    BDOF_MINI_BLOCKS 2, 0
+    BDOF_MINI_BLOCKS 3, 1
+
+.end:
+    RET
+%endmacro
+
+%macro VVC_OF_AVX2 0
+    BDOF_AVX2
+%endmacro
+
+VVC_OF_AVX2
+
+%endif ; HAVE_AVX2_EXTERNAL
+
+%endif ; ARCH_X86_64
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index d5b4f4f8a5..f3e2e3a27b 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -102,6 +102,20 @@ DMVR_PROTOTYPES( 8, avx2)
 DMVR_PROTOTYPES(10, avx2)
 DMVR_PROTOTYPES(12, avx2)
+void ff_vvc_apply_bdof_avx2(uint8_t *dst, ptrdiff_t dst_stride, \
+    const int16_t *src0, const int16_t *src1, int w, int h, int pixel_max); \
+
+#define OF_PROTOTYPES(bd, opt) \
+static void ff_vvc_apply_bdof_##bd##_##opt(uint8_t *dst, ptrdiff_t dst_stride, \
+    const int16_t *src0, const int16_t *src1, int w, int h) \
+{ \
+    ff_vvc_apply_bdof##_##opt(dst, dst_stride, src0, src1, w, h, (1 << bd) - 1); \
+} \
+
+OF_PROTOTYPES( 8, avx2)
+OF_PROTOTYPES(10, avx2)
+OF_PROTOTYPES(12, avx2)
+
 #define ALF_BPC_PROTOTYPES(bpc, opt) \
 void BF(ff_vvc_alf_filter_luma, bpc, opt)(uint8_t *dst, ptrdiff_t dst_stride, \
     const uint8_t *src, ptrdiff_t src_stride, ptrdiff_t width, ptrdiff_t height, \
@@ -328,6 +342,10 @@ ALF_FUNCS(16, 12, avx2)
     c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_##bd##_avx2; \
 } while (0)
+#define OF_INIT(bd) do { \
+    c->inter.apply_bdof = ff_vvc_apply_bdof_##bd##_avx2; \
+} while (0)
+
 #define ALF_INIT(bd) do { \
     c->alf.filter[LUMA] = ff_vvc_alf_filter_luma_##bd##_avx2; \
     c->alf.filter[CHROMA] = ff_vvc_alf_filter_chroma_##bd##_avx2; \
@@ -352,6 +370,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             ALF_INIT(8);
             AVG_INIT(8, avx2);
             MC_LINKS_AVX2(8);
+            OF_INIT(8);
             DMVR_INIT(8);
             SAD_INIT();
         }
@@ -365,6 +384,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(10, avx2);
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
+            OF_INIT(10);
             DMVR_INIT(10);
             SAD_INIT();
         }
@@ -378,6 +398,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(12, avx2);
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
+            OF_INIT(12);
             DMVR_INIT(12);
             SAD_INIT();
         }

From patchwork Sun Aug 18 01:48:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51063
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a59:b6ca:0:b0:48e:c0f8:d0de with SMTP id s10csp1228980vqj; Sat, 17 Aug 2024 18:49:14 -0700 (PDT)
X-Forwarded-Encrypted: i=2; AJvYcCV+4iX+2pLb/BLc0MnD+vAi6DknjK1RHmXwf55cSqKWbOTxfdDN5QknVkr3glEXDsm+e5KGtKBZ6NEK9T84KkNx@gmail.com
X-Google-Smtp-Source: AGHT+IEtufK80a1rg9JR3V6oCjYqld3q7xdG11ydvUapUylc0CDLqNt/vUYUmsyiUrL7J3kVA6G8
X-Received: by 2002:a05:651c:1549:b0:2f1:5ae3:7141 with SMTP id 38308e7fff4ca-2f3be59466amr26584371fa.3.1723945754246; Sat, 17 Aug 2024 18:49:14 -0700 (PDT)
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org.
[79.124.17.100]) by mx.google.com with ESMTP id 38308e7fff4ca-2f3b771b34asi17289701fa.532.2024.08.17.18.49.13; Sat, 17 Aug 2024 18:49:14 -0700 (PDT)
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Aug 2024 09:48:07 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240818014807.47423-1-nuomi2021@gmail.com>
References: <20240818014807.47423-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240818014807.47423-4-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH 4/4] checkasm: add vvc_bdof test
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Nuo Mi
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: 01N2IQIntwBt

apply_bdof_8_8x16_c: 5718.7
apply_bdof_8_8x16_avx2: 1029.9
apply_bdof_8_16x8_c: 5669.4
apply_bdof_8_16x8_avx2: 592.2
apply_bdof_8_16x16_c: 11313.4
apply_bdof_8_16x16_avx2: 1211.9
apply_bdof_10_8x16_c: 6295.7
apply_bdof_10_8x16_avx2: 1019.9
apply_bdof_10_16x8_c: 5548.2
apply_bdof_10_16x8_avx2: 580.9
apply_bdof_10_16x16_c: 11199.2
apply_bdof_10_16x16_avx2: 1154.2
apply_bdof_12_8x16_c: 5594.2
apply_bdof_12_8x16_avx2: 1018.2
apply_bdof_12_16x8_c: 5548.4
apply_bdof_12_16x8_avx2: 582.9
apply_bdof_12_16x16_c: 11016.7
apply_bdof_12_16x16_avx2: 1158.2
---
 tests/checkasm/vvc_mc.c | 50 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index 62fa6aa7d0..754cf19065 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -64,6 +64,14 @@ static const int sizes[] = { 2, 4, 8, 16, 32, 64, 128 };
         randomize_buffers(buf0, buf1, size, mask); \
     } while (0)
+#define randomize_prof_src(buf0, buf1, size) \
+    do { \
+        const int shift = 14 - bit_depth; \
+        const int mask16 = 0x3fff >> shift << shift; \
+        uint32_t mask = (mask16 << 16) | mask16; \
+        randomize_buffers(buf0, buf1, size, mask); \
+    } while (0)
+
 static void check_put_vvc_luma(void)
 {
     LOCAL_ALIGNED_32(int16_t, dst0, [DST_BUF_SIZE / 2]);
@@ -382,6 +390,47 @@ static void check_dmvr(void)
     report("dmvr");
 }
+#define BDOF_BLOCK_SIZE 16
+#define BDOF_SRC_SIZE (MAX_PB_SIZE* (BDOF_BLOCK_SIZE + 2))
+#define BDOF_SRC_OFFSET (MAX_PB_SIZE + 1)
+#define BDOF_DST_SIZE (BDOF_BLOCK_SIZE * BDOF_BLOCK_SIZE * 2)
+static void check_bdof(void)
+{
+    LOCAL_ALIGNED_32(uint8_t, dst0, [BDOF_DST_SIZE]);
+    LOCAL_ALIGNED_32(uint8_t, dst1, [BDOF_DST_SIZE]);
+    LOCAL_ALIGNED_32(uint16_t, src00, [BDOF_SRC_SIZE]);
+    LOCAL_ALIGNED_32(uint16_t, src01, [BDOF_SRC_SIZE]);
+    LOCAL_ALIGNED_32(uint16_t, src10, [BDOF_SRC_SIZE]);
+    LOCAL_ALIGNED_32(uint16_t, src11, [BDOF_SRC_SIZE]);
+
+    VVCDSPContext c;
+    declare_func(void, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *src0, const int16_t *src1, int block_w, int block_h);
+
+    for (int bit_depth = 8; bit_depth <= 12; bit_depth += 2) {
+        const int dst_stride = BDOF_BLOCK_SIZE * SIZEOF_PIXEL;
+
+        ff_vvc_dsp_init(&c, bit_depth);
+        randomize_prof_src(src00, src10, BDOF_SRC_SIZE);
+        randomize_prof_src(src01, src11, BDOF_SRC_SIZE);
+        for (int h = 8; h <= 16; h *= 2) {
+            for (int w = 8; w <= 16; w *= 2) {
+                if (w * h < 128)
+                    continue;
+                if (check_func(c.inter.apply_bdof, "apply_bdof_%d_%dx%d", bit_depth, w, h)) {
+                    memset(dst0, 0, BDOF_DST_SIZE);
+                    memset(dst1, 0, BDOF_DST_SIZE);
+                    call_ref(dst0, dst_stride, src00 + BDOF_SRC_OFFSET, src01 + BDOF_SRC_OFFSET, w, h);
+                    call_new(dst1, dst_stride, src10 + BDOF_SRC_OFFSET, src11 + BDOF_SRC_OFFSET, w, h);
+                    if (memcmp(dst0, dst1, BDOF_DST_SIZE))
+                        fail();
+                    bench_new(dst0, dst_stride, src00 + BDOF_SRC_OFFSET, src01 + BDOF_SRC_OFFSET, w, h);
+                }
+            }
+        }
+    }
+    report("apply_bdof");
+}
+
 static void check_vvc_sad(void)
 {
     const int bit_depth = 10;
@@ -422,6 +471,7 @@ static void check_vvc_sad(void)
 void checkasm_check_vvc_mc(void)
 {
     check_dmvr();
+    check_bdof();
     check_vvc_sad();
     check_put_vvc_luma();
     check_put_vvc_luma_uni();