From patchwork Tue Aug 20 13:22:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51092
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 20 Aug 2024 21:22:33 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240820132236.286553-1-nuomi2021@gmail.com>
References: <20240820132236.286553-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240820132236.286553-2-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH v2 1/4] avcodec/vvcdec: misc, rename BDOF_BLOCK_SIZE to BDOF_MIN_BLOCK_SIZE
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Nuo Mi

---
 libavcodec/vvc/dsp.c            |  4 ++--
 libavcodec/vvc/inter_template.c | 10 +++++-----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/libavcodec/vvc/dsp.c b/libavcodec/vvc/dsp.c
index 648d54ebb2..7463d8c9de 100644
--- a/libavcodec/vvc/dsp.c
+++ b/libavcodec/vvc/dsp.c
@@ -80,8 +80,8 @@ typedef struct IntraEdgeParams {
 #define BDOF_BORDER_EXT 1
 #define BDOF_PADDED_SIZE (16 + BDOF_BORDER_EXT * 2)
 
-#define BDOF_BLOCK_SIZE 4
-#define BDOF_GRADIENT_SIZE (BDOF_BLOCK_SIZE + BDOF_BORDER_EXT * 2)
+#define BDOF_MIN_BLOCK_SIZE 4
+#define BDOF_GRADIENT_SIZE (BDOF_MIN_BLOCK_SIZE + BDOF_BORDER_EXT * 2)
 
 #define BIT_DEPTH 8
 #include "dsp_template.c"
diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index afcee2e360..0f1712e337 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -433,8 +433,8 @@ static void FUNC(apply_bdof_min_block)(pixel* dst, const ptrdiff_t dst_stride, c
     const int16_t* gh[] = { gradient_h[0] + 1 + BDOF_PADDED_SIZE, gradient_h[1] + 1 + BDOF_PADDED_SIZE };
     const int16_t* gv[] = { gradient_v[0] + 1 + BDOF_PADDED_SIZE, gradient_v[1] + 1 + BDOF_PADDED_SIZE };
 
-    for (int y = 0; y < BDOF_BLOCK_SIZE; y++) {
-        for (int x = 0; x < BDOF_BLOCK_SIZE; x++) {
+    for (int y = 0; y < BDOF_MIN_BLOCK_SIZE; y++) {
+        for (int x = 0; x < BDOF_MIN_BLOCK_SIZE; x++) {
             const int idx = y * BDOF_PADDED_SIZE + x;
             const int bdof_offset = vx * (gh[0][idx] - gh[1][idx]) + vy * (gv[0][idx] - gv[1][idx]);
             dst[x] = av_clip_pixel((src0[x] + offset4 + src1[x] + bdof_offset) >> shift4);
@@ -461,8 +461,8 @@ static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, int16_t
         _src1, MAX_PB_SIZE, block_w, block_h, 1);
     pad_int16(_src1, MAX_PB_SIZE, block_w, block_h);
 
-    for (int y = 0; y < block_h; y += BDOF_BLOCK_SIZE) {
-        for (int x = 0; x < block_w; x += BDOF_BLOCK_SIZE) {
+    for (int y = 0; y < block_h; y += BDOF_MIN_BLOCK_SIZE) {
+        for (int x = 0; x < block_w; x += BDOF_MIN_BLOCK_SIZE) {
             const int16_t* src0 = _src0 + y * MAX_PB_SIZE + x;
             const int16_t* src1 = _src1 + y * MAX_PB_SIZE + x;
             pixel *d = dst + x;
@@ -472,7 +472,7 @@ static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, int16_t
             FUNC(derive_bdof_vx_vy)(src0, src1, gh, gv, BDOF_PADDED_SIZE, &vx, &vy);
             FUNC(apply_bdof_min_block)(d, dst_stride, src0, src1, gh, gv, vx, vy);
         }
-        dst += BDOF_BLOCK_SIZE * dst_stride;
+        dst += BDOF_MIN_BLOCK_SIZE * dst_stride;
     }
 }
 

From patchwork Tue Aug 20 13:22:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51093
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 20 Aug 2024 21:22:34 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240820132236.286553-1-nuomi2021@gmail.com>
References: <20240820132236.286553-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240820132236.286553-3-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH v2 2/4] avcodec/vvcdec: bdof, do not pad sources and gradients to simplify the code
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Nuo Mi

---
 libavcodec/vvc/dsp.c            | 25 +------------
 libavcodec/vvc/dsp.h            |  4 +-
 libavcodec/vvc/inter_template.c | 65 ++++++++++++++-------------------
 3 files changed, 31 insertions(+), 63 deletions(-)

diff --git a/libavcodec/vvc/dsp.c b/libavcodec/vvc/dsp.c
index 7463d8c9de..433353c32c 100644
--- a/libavcodec/vvc/dsp.c
+++ b/libavcodec/vvc/dsp.c
@@ -26,26 +26,6 @@
 
 #define VVC_SIGN(v) (v < 0 ? -1 : !!v)
 
-static void av_always_inline pad_int16(int16_t *_dst, const ptrdiff_t dst_stride, const int width, const int height)
-{
-    const int padded_width = width + 2;
-    int16_t *dst;
-    for (int y = 0; y < height; y++) {
-        dst = _dst + y * dst_stride;
-        for (int x = 0; x < width; x++) {
-            dst[-1] = dst[0];
-            dst[width] = dst[width - 1];
-        }
-    }
-
-    _dst--;
-    //top
-    memcpy(_dst - dst_stride, _dst, padded_width * sizeof(int16_t));
-    //bottom
-    _dst += dst_stride * height;
-    memcpy(_dst, _dst - dst_stride, padded_width * sizeof(int16_t));
-}
-
 static int vvc_sad(const int16_t *src0, const int16_t *src1, int dx, int dy,
     const int block_w, const int block_h)
 {
@@ -77,11 +57,10 @@ typedef struct IntraEdgeParams {
 #define PROF_BORDER_EXT 1
 #define PROF_BLOCK_SIZE (AFFINE_MIN_BLOCK_SIZE + PROF_BORDER_EXT * 2)
 
-#define BDOF_BORDER_EXT 1
-#define BDOF_PADDED_SIZE (16 + BDOF_BORDER_EXT * 2)
+#define BDOF_BORDER_EXT 1
+#define BDOF_BLOCK_SIZE 16
 
 #define BDOF_MIN_BLOCK_SIZE 4
-#define BDOF_GRADIENT_SIZE (BDOF_MIN_BLOCK_SIZE + BDOF_BORDER_EXT * 2)
 
 #define BIT_DEPTH 8
 #include "dsp_template.c"
diff --git a/libavcodec/vvc/dsp.h b/libavcodec/vvc/dsp.h
index 38ff492a23..635ebcafed 100644
--- a/libavcodec/vvc/dsp.h
+++ b/libavcodec/vvc/dsp.h
@@ -88,8 +88,6 @@ typedef struct VVCInterDSPContext {
     void (*bdof_fetch_samples)(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int x_frac, int y_frac,
         int width, int height);
 
-    void (*prof_grad_filter)(int16_t *gradient_h, int16_t *gradient_v, const ptrdiff_t gradient_stride,
-        const int16_t *src, const ptrdiff_t src_stride, int width, int height, const int pad);
     void (*apply_prof)(int16_t *dst, const int16_t *src, const int16_t *diff_mv_x, const int16_t *diff_mv_y);
 
     void (*apply_prof_uni)(uint8_t *dst, ptrdiff_t dst_stride, const int16_t *src,
@@ -97,7 +95,7 @@ typedef struct VVCInterDSPContext {
     void (*apply_prof_uni_w)(uint8_t *dst, const ptrdiff_t dst_stride, const int16_t *src,
         const int16_t *diff_mv_x, const int16_t *diff_mv_y, int denom, int wx, int ox);
 
-    void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, int16_t *src0, int16_t *src1, int block_w, int block_h);
+    void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, const int16_t *src0, const int16_t *src1, int block_w, int block_h);
 
     int (*sad)(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
     void (*dmvr[2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index 0f1712e337..c073a73e76 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -292,13 +292,11 @@ static void FUNC(fetch_samples)(int16_t *_dst, const uint8_t *_src, const ptrdif
     FUNC(bdof_fetch_samples)(_dst, _src, _src_stride, x_frac, y_frac, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 }
 
-static void FUNC(prof_grad_filter)(int16_t *_gradient_h, int16_t *_gradient_v, const ptrdiff_t gradient_stride,
-    const int16_t *_src, const ptrdiff_t src_stride, const int width, const int height, const int pad)
+static void FUNC(prof_grad_filter)(int16_t *gradient_h, int16_t *gradient_v, const ptrdiff_t gradient_stride,
+    const int16_t *_src, const ptrdiff_t src_stride, const int width, const int height)
 {
     const int shift = 6;
     const int16_t *src = _src;
-    int16_t *gradient_h = _gradient_h + pad * (1 + gradient_stride);
-    int16_t *gradient_v = _gradient_v + pad * (1 + gradient_stride);
 
     for (int y = 0; y < height; y++) {
         const int16_t *p = src;
@@ -311,10 +309,6 @@ static void FUNC(prof_grad_filter)(int16_t *_gradient_h, int16_t *_gradient_v, c
         gradient_v += gradient_stride;
         src += src_stride;
     }
-    if (pad) {
-        pad_int16(_gradient_h + 1 + gradient_stride, gradient_stride, width, height);
-        pad_int16(_gradient_v + 1 + gradient_stride, gradient_stride, width, height);
-    }
 }
 
 static void FUNC(apply_prof)(int16_t *dst, const int16_t *src, const int16_t *diff_mv_x, const int16_t *diff_mv_y)
@@ -323,7 +317,7 @@ static void FUNC(apply_prof)(int16_t *dst, const int16_t *src, const int16_t *di
     int16_t gradient_h[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
     int16_t gradient_v[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
 
-    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE, 0);
+    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 
     for (int y = 0; y < AFFINE_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < AFFINE_MIN_BLOCK_SIZE; x++) {
@@ -352,7 +346,7 @@ static void FUNC(apply_prof_uni)(uint8_t *_dst, const ptrdiff_t _dst_stride, con
     int16_t gradient_h[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
     int16_t gradient_v[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
 
-    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE, 0);
+    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 
     for (int y = 0; y < AFFINE_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < AFFINE_MIN_BLOCK_SIZE; x++) {
@@ -380,7 +374,7 @@ static void FUNC(apply_prof_uni_w)(uint8_t *_dst, const ptrdiff_t _dst_stride,
     int16_t gradient_h[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
     int16_t gradient_v[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
 
-    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE, 0);
+    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 
     for (int y = 0; y < AFFINE_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < AFFINE_MIN_BLOCK_SIZE; x++) {
@@ -395,47 +389,47 @@ static void FUNC(apply_prof_uni_w)(uint8_t *_dst, const ptrdiff_t _dst_stride,
 }
 
 static void FUNC(derive_bdof_vx_vy)(const int16_t *_src0, const int16_t *_src1,
-    const int16_t **gradient_h, const int16_t **gradient_v, ptrdiff_t gradient_stride,
+    const int pad_left, const int pad_top, const int pad_right, const int pad_bottom,
+    const int16_t **gradient_h, const int16_t **gradient_v,
     int* vx, int* vy)
 {
     const int shift2 = 4;
     const int shift3 = 1;
     const int thres = 1 << 4;
     int sgx2 = 0, sgy2 = 0, sgxgy = 0, sgxdi = 0, sgydi = 0;
-    const int16_t *src0 = _src0 - 1 - MAX_PB_SIZE;
-    const int16_t *src1 = _src1 - 1 - MAX_PB_SIZE;
 
-    for (int y = 0; y < BDOF_GRADIENT_SIZE; y++) {
-        for (int x = 0; x < BDOF_GRADIENT_SIZE; x++) {
-            const int diff = (src0[x] >> shift2) - (src1[x] >> shift2);
-            const int idx = gradient_stride * y + x;
+    for (int y = -1; y < BDOF_MIN_BLOCK_SIZE + 1; y++) {
+        const int dy = y + (pad_top && y < 0) - (pad_bottom && y == BDOF_MIN_BLOCK_SIZE); // we pad for the first and last row
+        const int16_t *src0 = _src0 + dy * MAX_PB_SIZE;
+        const int16_t *src1 = _src1 + dy * MAX_PB_SIZE;
+
+        for (int x = -1; x < BDOF_MIN_BLOCK_SIZE + 1; x++) {
+            const int dx = x + (pad_left && x < 0) - (pad_right && x == BDOF_MIN_BLOCK_SIZE); // we pad for the first and last col
+            const int diff = (src0[dx] >> shift2) - (src1[dx] >> shift2);
+            const int idx = BDOF_BLOCK_SIZE * dy + dx;
             const int temph = (gradient_h[0][idx] + gradient_h[1][idx]) >> shift3;
             const int tempv = (gradient_v[0][idx] + gradient_v[1][idx]) >> shift3;
+
             sgx2 += FFABS(temph);
             sgy2 += FFABS(tempv);
             sgxgy += VVC_SIGN(tempv) * temph;
             sgxdi += -VVC_SIGN(temph) * diff;
             sgydi += -VVC_SIGN(tempv) * diff;
         }
-        src0 += MAX_PB_SIZE;
-        src1 += MAX_PB_SIZE;
     }
     *vx = sgx2 > 0 ? av_clip((sgxdi * (1 << 2)) >> av_log2(sgx2) , -thres + 1, thres - 1) : 0;
     *vy = sgy2 > 0 ? av_clip(((sgydi * (1 << 2)) - ((*vx * sgxgy) >> 1)) >> av_log2(sgy2), -thres + 1, thres - 1) : 0;
 }
 
 static void FUNC(apply_bdof_min_block)(pixel* dst, const ptrdiff_t dst_stride, const int16_t *src0, const int16_t *src1,
-    const int16_t **gradient_h, const int16_t **gradient_v, const int vx, const int vy)
+    const int16_t **gh, const int16_t **gv, const int vx, const int vy)
 {
     const int shift4 = 15 - BIT_DEPTH;
     const int offset4 = 1 << (shift4 - 1);
 
-    const int16_t* gh[] = { gradient_h[0] + 1 + BDOF_PADDED_SIZE, gradient_h[1] + 1 + BDOF_PADDED_SIZE };
-    const int16_t* gv[] = { gradient_v[0] + 1 + BDOF_PADDED_SIZE, gradient_v[1] + 1 + BDOF_PADDED_SIZE };
-
     for (int y = 0; y < BDOF_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < BDOF_MIN_BLOCK_SIZE; x++) {
-            const int idx = y * BDOF_PADDED_SIZE + x;
+            const int idx = y * BDOF_BLOCK_SIZE + x;
             const int bdof_offset = vx * (gh[0][idx] - gh[1][idx]) + vy * (gv[0][idx] - gv[1][idx]);
             dst[x] = av_clip_pixel((src0[x] + offset4 + src1[x] + bdof_offset) >> shift4);
         }
@@ -445,31 +439,29 @@ static void FUNC(apply_bdof_min_block)(pixel* dst, const ptrdiff_t dst_stride, c
     }
 }
 
-static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, int16_t *_src0, int16_t *_src1,
+static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, const int16_t *_src0, const int16_t *_src1,
     const int block_w, const int block_h)
 {
-    int16_t gradient_h[2][BDOF_PADDED_SIZE * BDOF_PADDED_SIZE];
-    int16_t gradient_v[2][BDOF_PADDED_SIZE * BDOF_PADDED_SIZE];
+    int16_t gradient_h[2][BDOF_BLOCK_SIZE * BDOF_BLOCK_SIZE];
+    int16_t gradient_v[2][BDOF_BLOCK_SIZE * BDOF_BLOCK_SIZE];
     int vx, vy;
     const ptrdiff_t dst_stride = _dst_stride / sizeof(pixel);
     pixel* dst = (pixel*)_dst;
 
-    FUNC(prof_grad_filter)(gradient_h[0], gradient_v[0], BDOF_PADDED_SIZE,
-        _src0, MAX_PB_SIZE, block_w, block_h, 1);
-    pad_int16(_src0, MAX_PB_SIZE, block_w, block_h);
-    FUNC(prof_grad_filter)(gradient_h[1], gradient_v[1], BDOF_PADDED_SIZE,
-        _src1, MAX_PB_SIZE, block_w, block_h, 1);
-    pad_int16(_src1, MAX_PB_SIZE, block_w, block_h);
+    FUNC(prof_grad_filter)(gradient_h[0], gradient_v[0], BDOF_BLOCK_SIZE,
+        _src0, MAX_PB_SIZE, block_w, block_h);
+    FUNC(prof_grad_filter)(gradient_h[1], gradient_v[1], BDOF_BLOCK_SIZE,
+        _src1, MAX_PB_SIZE, block_w, block_h);
 
     for (int y = 0; y < block_h; y += BDOF_MIN_BLOCK_SIZE) {
         for (int x = 0; x < block_w; x += BDOF_MIN_BLOCK_SIZE) {
             const int16_t* src0 = _src0 + y * MAX_PB_SIZE + x;
             const int16_t* src1 = _src1 + y * MAX_PB_SIZE + x;
             pixel *d = dst + x;
-            const int idx = BDOF_PADDED_SIZE * y + x;
+            const int idx = BDOF_BLOCK_SIZE * y + x;
             const int16_t* gh[] = { gradient_h[0] + idx, gradient_h[1] + idx };
             const int16_t* gv[] = { gradient_v[0] + idx, gradient_v[1] + idx };
-            FUNC(derive_bdof_vx_vy)(src0, src1, gh, gv, BDOF_PADDED_SIZE, &vx, &vy);
+            FUNC(derive_bdof_vx_vy)(src0, src1, !x, !y, x + BDOF_MIN_BLOCK_SIZE == block_w, y + BDOF_MIN_BLOCK_SIZE == block_h, gh, gv, &vx, &vy);
             FUNC(apply_bdof_min_block)(d, dst_stride, src0, src1, gh, gv, vx, vy);
         }
         dst += BDOF_MIN_BLOCK_SIZE * dst_stride;
@@ -631,7 +623,6 @@ static void FUNC(ff_vvc_inter_dsp_init)(VVCInterDSPContext *const inter)
     inter->apply_prof_uni = FUNC(apply_prof_uni);
     inter->apply_prof_uni_w = FUNC(apply_prof_uni_w);
     inter->apply_bdof = FUNC(apply_bdof);
-    inter->prof_grad_filter = FUNC(prof_grad_filter);
     inter->sad = vvc_sad;
 }
 

From patchwork Tue Aug 20 13:22:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51091
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 20 Aug 2024 21:22:35 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240820132236.286553-1-nuomi2021@gmail.com>
References: <20240820132236.286553-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240820132236.286553-4-nuomi2021@gmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 3/4] x86/vvcdec: inter, add optical flow avx2 code
List-Id: FFmpeg development discussions and patches
Cc: Nuo Mi

BDoF used about 10%–25% of the CPU for some clips.

Here are the FPS for one run; please ignore the negative values, as they
may be due to run-to-run variation.

clips                                       | before | after | delta
--------------------------------------------|--------|-------|-------
RitualDance_1920x1080_60_10_420_37_RA.266   | 310.0  | 363.0 | 14.60%
NovosobornayaSquare_1920x1080.bin           | 322.3  | 339.7 |  5.12%
Tango2_3840x2160_60_10_420_27_LD.266        |  71.0  |  68.7 | -3.35%
RitualDance_1920x1080_60_10_420_32_LD.266   | 250.0  | 245.3 | -1.92%
Chimera_8bit_1080P_1000_frames.vvc          | 359.3  | 422.7 | 15.00%
BQTerrace_1920x1080_60_10_420_22_RA.vvc     | 142.3  | 147.7 |  3.66%
---
 libavcodec/x86/vvc/Makefile      |   1 +
 libavcodec/x86/vvc/vvc_of.asm    | 385 +++++++++++++++++++++++++++++++
 libavcodec/x86/vvc/vvcdsp_init.c |  21 ++
 3 files changed, 407 insertions(+)
 create mode 100644 libavcodec/x86/vvc/vvc_of.asm

diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index 04f16bc10c..aa59aa59cf 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -6,5 +6,6 @@ OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvcdsp_init.o \
 X86ASM-OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvc_alf.o \
                                      x86/vvc/vvc_dmvr.o \
                                      x86/vvc/vvc_mc.o \
+                                     x86/vvc/vvc_of.o \
                                      x86/vvc/vvc_sad.o \
                                      x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/vvc/vvc_of.asm b/libavcodec/x86/vvc/vvc_of.asm
new file mode 100644
index 0000000000..5893bfb23a
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_of.asm
@@ -0,0 +1,385 @@
+; /*
+; * Provide AVX2 luma optical flow functions for VVC decoding
+; * Copyright (c) 2024 Nuo Mi
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the GNU Lesser General Public
+; * License as published by the Free Software Foundation; either
+; * version 2.1 of the License, or (at your option) any later version.
+; *
+; * FFmpeg is distributed in the hope that it will be useful,
+; * but WITHOUT ANY WARRANTY; without even the implied warranty of
+; * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+; * Lesser General Public License for more details.
+; *
+; * You should have received a copy of the GNU Lesser General Public
+; * License along with FFmpeg; if not, write to the Free Software
+; * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+; */
+%include "libavutil/x86/x86util.asm"
+
+%define MAX_PB_SIZE 128
+%define SRC_STRIDE (MAX_PB_SIZE * 2)
+%define SRC_PS 2 ; source pixel size, sizeof(int16_t)
+%define BDOF_STACK_SIZE 10 ; (4 + 1) * 2, 4 lines + the first line, *2 for h and v
+%define bdof_stack_offset(line) ((line) * 2 % BDOF_STACK_SIZE * mmsize)
+%define SHIFT 6
+%define SHIFT2 4
+
+SECTION_RODATA 32
+pd_15 times 8 dd 15
+pd_m15 times 8 dd -15
+
+pb_shuffle_w8 times 2 db 0, 1, 0xff, 0xff, 8, 9, 0xff, 0xff, 6, 7, 0xff, 0xff, 14, 15, 0xff, 0xff
+pb_shuffle_w16 times 2 db 0, 1, 0xff, 0xff, 6, 7, 0xff, 0xff, 8, 9, 0xff, 0xff, 14, 15, 0xff, 0xff
+pd_perm_w16 dd 0, 2, 1, 4, 3, 6, 5, 7
+%if ARCH_X86_64
+
+%if HAVE_AVX2_EXTERNAL
+
+SECTION .text
+
+INIT_YMM avx2
+
+; dst = (src0 >> shift) - (src1 >> shift)
+%macro DIFF 5 ; dst, src0, src1, shift, tmp
+    psraw            %1, %2, %4
+    psraw            %5, %3, %4
+    psubw            %1, %5
+%endmacro
+
+%macro LOAD_GRAD_H 4 ; dst, src, off, tmp
+    movu             %1, [%2 + %3 + 2 * SRC_PS]
+    movu             %4, [%2 + %3]
+
+    DIFF             %1, %1, %4, SHIFT, %4
+%endmacro
+
+%macro SUM_GRAD 2 ;(dst/grad0, grad1)
+    paddw            %1, %2
+    psraw            %1, 1 ; shift3
+%endmacro
+
+%macro APPLY_BDOF_MIN_BLOCK_LINE 5 ; dst, vx, vy, tmp, line_num
+%define off bdof_stack_offset(%5)
+    pmullw           %1, %2, [rsp + off + 0 * mmsize] ; vx * (gradient_h[0] - gradient_h[1])
+    pmullw           %4, %3, [rsp + off + 1 * mmsize] ; vy * (gradient_v[0] - gradient_v[1])
+    paddw            %1, [src0q + (%5 + 1) * SRC_STRIDE + SRC_PS]
+    paddw            %4, [src1q + (%5 + 1) * SRC_STRIDE + SRC_PS]
+    paddsw           %1, %4 ; src0[x] + src1[x] + bdof_offset
+    pmulhrsw         %1, m11
+    CLIPW            %1, m9, m10
+%endmacro
+
+%macro SAVE_8BPC 2 ; dst, src
+    packuswb         m%2, m%2
+    vpermq           m%2, m%2, q0020
+
+    cmp              wd, 16
+    je               %%w16
+    movq             %1, xm%2
+    jmp              %%wend
+%%w16:
+    movu             %1, xm%2
+%%wend:
+%endmacro
+
+%macro SAVE_16BPC 2 ; dst, src
+    cmp              wd, 16
+    je               %%w16
+    movu             %1, xm%2
+    jmp              %%wend
+%%w16:
+    movu             %1, m%2
+%%wend:
+%endmacro
+
+%macro SAVE 2 ; dst, src
+    cmp              pixel_maxd, (1 << 8) - 1
+    jne              %%save_16bpc
+    SAVE_8BPC        %1, %2
+    jmp              %%end
+%%save_16bpc:
+    SAVE_16BPC       %1, %2
+%%end:
+%endmacro
+
+; [rsp + even * mmsize] are gradient_h[0] - gradient_h[1]
+; [rsp + odd * mmsize] are gradient_v[0] - gradient_v[1]
+%macro APPLY_BDOF_MIN_BLOCK 4 ; block_num, vx, vy, bd
+    pxor             m9, m9
+
+    movd             xm10, pixel_maxd
+    vpbroadcastw     m10, xm10
+
+    lea              tmp0d, [pixel_maxd + 1]
+    movd             xm11, tmp0d
+    VPBROADCASTW     m11, xm11 ;shift_4 for pmulhrsw
+
+    APPLY_BDOF_MIN_BLOCK_LINE m6, %2, %3, m7, (%1) * 4 + 0
+    SAVE             [dstq + 0 * dsq], 6
+
+    APPLY_BDOF_MIN_BLOCK_LINE m6, %2, %3, m7, (%1) * 4 + 1
+    SAVE             [dstq + 1 * dsq], 6
+
+    APPLY_BDOF_MIN_BLOCK_LINE m6, %2, %3, m7, (%1) * 4 + 2
+    SAVE             [dstq + 2 * dsq], 6
+
+    APPLY_BDOF_MIN_BLOCK_LINE m6, %2, %3, m7, (%1) * 4 + 3
+    SAVE             [dstq + ds3q], 6
+%endmacro
+
+%macro SUM_MIN_BLOCK_W16 4 ; src/dst, shuffle, perm, tmp
+    pshufb           %4, %1, %2
+    vpermd           %4, %3, %4
+    paddw            %1, %4
+%endmacro
+
+%macro SUM_MIN_BLOCK_W8 3 ; src/dst, shuffle, tmp
+    pshufb           %3, %1, %2
+    paddw            %1, %3
+%endmacro
+
+%macro BDOF_PROF_GRAD 2 ; line_no, last_line
+%assign i0 (%1 + 0) % 3
+%assign j0 (%1 + 1) % 3
+%assign k0 (%1 + 2) % 3
+%assign i1 3 + (%1 + 0) % 3
+%assign j1 3 + (%1 + 1) % 3
+%assign k1 3 + (%1 + 2) % 3
+
+; we cached src0 in m0 to m2
+%define t0 m %+ i0
+%define c0 m %+ j0
+%define b0 m %+ k0
+
+; we cached src1 in m3 to m5
+%define t1 m %+ i1
+%define c1 m %+ j1
+%define b1 m %+ k1
+%define ndiff t1
+%define off bdof_stack_offset(%1)
+
+    movu             b0, [src0q + (%1 + 2) * SRC_STRIDE + SRC_PS]
+    movu             b1, [src1q + (%1 + 2) * SRC_STRIDE + SRC_PS]
+
+    ; gradient_v[0], gradient_v[1]
+    DIFF             m6, b0, t0, SHIFT, t0
+    DIFF             m7, b1, t1, SHIFT, t1
+
+    ; save gradient_v[0] - gradient_v[1]
+    psubw            m10, m6, m7
+    mova             [rsp + off + mmsize], m10
+
+    ; gradient_h[0], gradient_h[1]
+    LOAD_GRAD_H      m8, src0q, (%1 + 1) * SRC_STRIDE, t0
+    LOAD_GRAD_H      m9, src1q, (%1 + 1) * SRC_STRIDE, t1
+
+    ; save gradient_h[0] - gradient_h[1]
+    psubw            m11, m8, m9
+    mova             [rsp + off], m11
+
+    SUM_GRAD         m8, m9 ; temph
+    SUM_GRAD         m6, m7 ; tempv
+
+    DIFF             ndiff, c1, c0, SHIFT2, t0 ; -diff
+
+    psignw           m7, ndiff, m8 ; sgxdi
+    psignw           m9, ndiff, m6 ; sgydi
+    psignw           m10, m8, m6 ; sgxgy
+
+    pabsw            m6, m6 ; sgy2
+    pabsw            m8, m8 ; sgx2
+
+    ; use t0, t1 as temporary buffers
+    cmp              wd, 16
+
+    je               %%w16
+    mova             t0, [pb_shuffle_w8]
+    SUM_MIN_BLOCK_W8 m6, t0, m11
+    SUM_MIN_BLOCK_W8 m7, t0, m11
+    SUM_MIN_BLOCK_W8 m8, t0, m11
+    SUM_MIN_BLOCK_W8 m9, t0, m11
+    SUM_MIN_BLOCK_W8 m10, t0, m11
+    jmp              %%wend
+
+%%w16:
+    mova             t0, [pb_shuffle_w16]
+    mova             t1, [pd_perm_w16]
+    SUM_MIN_BLOCK_W16 m6, t0, t1, m11
+    SUM_MIN_BLOCK_W16 m7, t0, t1, m11
+    SUM_MIN_BLOCK_W16 m8, t0, t1, m11
+    SUM_MIN_BLOCK_W16 m9, t0, t1, m11
+    SUM_MIN_BLOCK_W16 m10, t0, t1, m11
+
+%%wend:
+    vpblendd         m11, m8, m7, 10101010b
+    vpblendd         m7, m8, m7, 01010101b
+    pshufd           m7, m7, q2301
+    paddw            m8, m7, m11 ;4 x (2sgx2, 2sgxdi)
+
+    vpblendd         m11, m6, m9, 10101010b
+    vpblendd         m9, m6, m9, 01010101b
+    pshufd           m9, m9, q2301
+    paddw            m6, m9, m11 ;4 x (2sgy2, 2sgydi)
+
+    vpblendw         m11, m8, m6, 10101010b
+    vpblendw         m6, m8, m6, 01010101b
+    pshuflw          m6, m6, q2301
+    pshufhw          m6, m6, q2301
+    paddw            m8, m6, m11 ; 4 x (4sgx2, 4sgy2, 4sgxdi, 4sgydi)
+
+%if (%1) == 0 || (%2)
+    ; pad for top and bottom
+    paddw            m8, m8
+    paddw            m10, m10
+%endif
+
+    paddw            m12, m8
+    paddw            m13, m10
+%endmacro
+
+
+%macro LOG2 5 ; log_sum, src, cmp, shift, tmp
+    pcmpgtw          %5, %2, %3
+    pandd            %5, %4
+    paddw            %1, %5
+
+    psrlw            %2, %5
+    psrlw            %4, 1
+    psrlw            %3, %4
+%endmacro
+
+%macro LOG2 2 ; dst/src, offset
+    pextrw           tmp0d, xm%1, %2
+    bsr              tmp0d, tmp0d
+    pinsrw           xm%1, tmp0d, %2
+%endmacro
+
+%macro LOG2 1 ; dst/src
+    LOG2             %1, 0
+    LOG2             %1, 1
+    LOG2             %1, 2
+    LOG2             %1, 3
+    LOG2             %1, 4
+    LOG2             %1, 5
+    LOG2             %1, 6
+    LOG2             %1, 7
+%endmacro
+
+; %1: 4 (sgx2, sgy2, sgxdi, sgydi)
+; %2: 4 (4sgxgy)
+%macro BDOF_VX_VY 2 ;
+    pshufd           m6, m%1, q0032
+    punpckldq        m%1, m6
+    vextracti128     xm7, m%1, 1
+
+    punpcklqdq       m8, m%1, m7 ; 4 (sgx2, sgy2)
+    punpckhqdq       m9, m%1, m7 ; 4 (sgxdi, sgydi)
+    mova             m10, m8
+    LOG2             10 ; 4 (log2(sgx2), log2(sgy2))
+
+    ; Promote to dword since vpsrlvw is AVX-512 only
+    pmovsxwd         m8, xm8
+    pmovsxwd         m9, xm9
+    pmovsxwd         m10, xm10
+
+    pslld            m9, 2 ; 4 (sgxdi << 2, sgydi << 2)
+
+    psignd           m11, m9, m8
+    vpsravd          m11, m11, m10
+    CLIPD            m11, [pd_m15], [pd_15] ; 4 (vx, junk)
+
+    pshuflw          m%1, m11, q0000
+    pshufhw          m%1, m%1, q0000 ; 4 (2junk, 2vx)
+
+    psllq            m6, m%2, 32
+    paddw            m%2, m6
+
+    pmaddwd          m%2, m%1 ; 4 (junk, vx * sgxgy)
+    psrad            m%2, 1
+    psubd            m9, m%2 ; 4 (junk, (sgydi << 2) - (vx * sgxgy >> 1))
+
+    psignd           m9, m8
+    vpsravd          m%2, m9, m10
+    CLIPD            m%2, [pd_m15], [pd_15] ; 4 (junk, vy)
+
+    pshuflw          m%2, m%2, q2222
+    pshufhw          m%2, m%2, q2222 ; 4 (4vy)
+%endmacro
+
+
+%macro BDOF_MINI_BLOCKS 2 ; (block_num, last_block)
+
+%if (%1) == 0
+    movu             m0, [src0q + 0 * SRC_STRIDE + SRC_PS]
+    movu             m1, [src0q + 1 * SRC_STRIDE + SRC_PS]
+    movu             m3, [src1q + 0 * SRC_STRIDE + SRC_PS]
+    movu             m4, [src1q + 1 * SRC_STRIDE + SRC_PS]
+
+    pxor             m12, m12
+    pxor             m13, m13
+
+    BDOF_PROF_GRAD   0, 0
+%endif
+
+    mova             m14, m12
+    mova             m15, m13
+
+    pxor             m12, m12
+    pxor             m13, m13
+    BDOF_PROF_GRAD   %1 * 4 + 1, 0
+    BDOF_PROF_GRAD   %1 * 4 + 2, 0
+    paddw            m14, m12
+    paddw            m15, m13
+
+    pxor             m12, m12
+    pxor             m13, m13
+    BDOF_PROF_GRAD   %1 * 4 + 3, %2
+%if (%2) == 0
+    BDOF_PROF_GRAD   %1 * 4 + 4, 0
+%endif
+    paddw            m14, m12
+    paddw            m15, m13
+
+    BDOF_VX_VY       14, 15
+    APPLY_BDOF_MIN_BLOCK %1, m14, m15, bd
+    lea              dstq, [dstq + 4 * dsq]
+%endmacro
+
+;void ff_vvc_apply_bdof_%1(uint8_t *dst, const ptrdiff_t dst_stride, int16_t *src0, int16_t *src1,
+;    const int w, const int h, const int pixel_max)
+%macro BDOF_AVX2 0
+cglobal vvc_apply_bdof, 7, 10, 16, BDOF_STACK_SIZE*32, dst, ds, src0, src1, w, h, pixel_max, ds3, tmp0, tmp1
+
+    lea              ds3q, [dsq * 3]
+    sub              src0q, SRC_STRIDE + SRC_PS
+    sub              src1q, SRC_STRIDE + SRC_PS
+
+    BDOF_MINI_BLOCKS 0, 0
+
+    cmp              hd, 16
+    je               .h16
+    BDOF_MINI_BLOCKS 1, 1
+    jmp              .end
+
+.h16:
+    BDOF_MINI_BLOCKS 1, 0
+    BDOF_MINI_BLOCKS 2, 0
+    BDOF_MINI_BLOCKS 3, 1
+
+.end:
+    RET
+%endmacro
+
+%macro VVC_OF_AVX2 0
+    BDOF_AVX2
+%endmacro
+
+VVC_OF_AVX2
+
+%endif ; HAVE_AVX2_EXTERNAL
+
+%endif ; ARCH_X86_64
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index d5b4f4f8a5..f3e2e3a27b 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -102,6 +102,20 @@ DMVR_PROTOTYPES( 8, avx2)
 DMVR_PROTOTYPES(10, avx2)
 DMVR_PROTOTYPES(12, avx2)
 
+void ff_vvc_apply_bdof_avx2(uint8_t *dst, ptrdiff_t dst_stride, \
+    const int16_t *src0, const int16_t *src1, int w, int h, int pixel_max); \
+
+#define OF_PROTOTYPES(bd, opt) \
+static void ff_vvc_apply_bdof_##bd##_##opt(uint8_t *dst, ptrdiff_t dst_stride, \
+    const int16_t *src0, const int16_t *src1, int w, int h) \
+{ \
+    ff_vvc_apply_bdof##_##opt(dst, dst_stride, src0, src1, w, h, (1 << bd) - 1); \
+} \
+
+OF_PROTOTYPES( 8, avx2)
+OF_PROTOTYPES(10, avx2)
+OF_PROTOTYPES(12, avx2)
+
 #define ALF_BPC_PROTOTYPES(bpc, opt) \
 void BF(ff_vvc_alf_filter_luma, bpc, opt)(uint8_t *dst, ptrdiff_t dst_stride, \
     const uint8_t *src, ptrdiff_t src_stride, ptrdiff_t width, ptrdiff_t height, \
@@ -328,6 +342,10 @@ ALF_FUNCS(16, 12, avx2)
     c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_##bd##_avx2; \
 } while (0)
 
+#define OF_INIT(bd) do { \
+    c->inter.apply_bdof = ff_vvc_apply_bdof_##bd##_avx2; \
+} while (0)
+
 #define ALF_INIT(bd) do { \
     c->alf.filter[LUMA] = ff_vvc_alf_filter_luma_##bd##_avx2; \
     c->alf.filter[CHROMA] = ff_vvc_alf_filter_chroma_##bd##_avx2; \
@@ -352,6 +370,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             ALF_INIT(8);
             AVG_INIT(8, avx2);
             MC_LINKS_AVX2(8);
+            OF_INIT(8);
             DMVR_INIT(8);
             SAD_INIT();
         }
@@ -365,6 +384,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(10, avx2);
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
+            OF_INIT(10);
             DMVR_INIT(10);
             SAD_INIT();
         }
@@ -378,6 +398,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(12, avx2);
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
+            OF_INIT(12);
             DMVR_INIT(12);
             SAD_INIT();
         }

From patchwork Tue Aug 20 13:22:36 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51090
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 20 Aug 2024 21:22:36 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240820132236.286553-1-nuomi2021@gmail.com>
References: <20240820132236.286553-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240820132236.286553-5-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH v2 4/4] checkasm: add vvc_bdof test
List-Id: FFmpeg development discussions and patches
Cc: Nuo Mi

apply_bdof_8_8x16_c: 5776.5
apply_bdof_8_8x16_avx2: 396.2
apply_bdof_8_16x8_c: 5722.0
apply_bdof_8_16x8_avx2: 216.0
apply_bdof_8_16x16_c: 11213.2
apply_bdof_8_16x16_avx2: 434.5
apply_bdof_10_8x16_c: 5657.7
apply_bdof_10_8x16_avx2: 1096.0
apply_bdof_10_16x8_c: 5531.7
apply_bdof_10_16x8_avx2: 212.5
apply_bdof_10_16x16_c: 11043.7
apply_bdof_10_16x16_avx2: 1252.7
apply_bdof_12_8x16_c: 5680.0
apply_bdof_12_8x16_avx2: 1096.5
apply_bdof_12_16x8_c: 5646.2
apply_bdof_12_16x8_avx2: 624.5
apply_bdof_12_16x16_c: 11076.0
apply_bdof_12_16x16_avx2: 1241.5
---
 tests/checkasm/vvc_mc.c | 50 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index 62fa6aa7d0..754cf19065 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -64,6 +64,14 @@ static const int sizes[] = { 2, 4, 8, 16, 32, 64, 128 };
         randomize_buffers(buf0, buf1, size, mask); \
     } while (0)
 
+#define randomize_prof_src(buf0, buf1, size) \
+    do { \
+        const int shift = 14 - bit_depth; \
+        const int mask16 = 0x3fff >> shift << shift; \
+        uint32_t mask = (mask16 << 16) | mask16; \
+        randomize_buffers(buf0, buf1, size, mask); \
+    } while (0)
+
 static void check_put_vvc_luma(void)
 {
     LOCAL_ALIGNED_32(int16_t, dst0, [DST_BUF_SIZE / 2]);
@@ -382,6 +390,47 @@ static void check_dmvr(void)
     report("dmvr");
 }
 
+#define BDOF_BLOCK_SIZE 16
+#define BDOF_SRC_SIZE (MAX_PB_SIZE* (BDOF_BLOCK_SIZE + 2))
+#define BDOF_SRC_OFFSET (MAX_PB_SIZE + 1)
+#define BDOF_DST_SIZE (BDOF_BLOCK_SIZE * BDOF_BLOCK_SIZE * 2)
+static void check_bdof(void)
+{
+    LOCAL_ALIGNED_32(uint8_t, dst0, [BDOF_DST_SIZE]);
+    LOCAL_ALIGNED_32(uint8_t, dst1, [BDOF_DST_SIZE]);
+    LOCAL_ALIGNED_32(uint16_t, src00, [BDOF_SRC_SIZE]);
+    LOCAL_ALIGNED_32(uint16_t, src01, [BDOF_SRC_SIZE]);
+    LOCAL_ALIGNED_32(uint16_t, src10, [BDOF_SRC_SIZE]);
+    LOCAL_ALIGNED_32(uint16_t, src11, [BDOF_SRC_SIZE]);
+
+    VVCDSPContext c;
+    declare_func(void, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *src0, const int16_t *src1, int block_w, int block_h);
+
+    for (int bit_depth = 8; bit_depth <= 12; bit_depth += 2) {
+        const int dst_stride = BDOF_BLOCK_SIZE * SIZEOF_PIXEL;
+
+        ff_vvc_dsp_init(&c, bit_depth);
+        randomize_prof_src(src00, src10, BDOF_SRC_SIZE);
+        randomize_prof_src(src01, src11, BDOF_SRC_SIZE);
+        for (int h = 8; h <= 16; h *= 2) {
+            for (int w = 8; w <= 16; w *= 2) {
+                if (w * h < 128)
+                    continue;
+                if (check_func(c.inter.apply_bdof, "apply_bdof_%d_%dx%d", bit_depth, w, h)) {
+                    memset(dst0, 0, BDOF_DST_SIZE);
+                    memset(dst1, 0, BDOF_DST_SIZE);
+                    call_ref(dst0, dst_stride, src00 + BDOF_SRC_OFFSET, src01 + BDOF_SRC_OFFSET, w, h);
+                    call_new(dst1, dst_stride, src10 + BDOF_SRC_OFFSET, src11 + BDOF_SRC_OFFSET, w, h);
+                    if (memcmp(dst0, dst1, BDOF_DST_SIZE))
+                        fail();
+                    bench_new(dst0, dst_stride, src00 + BDOF_SRC_OFFSET, src01 + BDOF_SRC_OFFSET, w, h);
+                }
+            }
+        }
+    }
+    report("apply_bdof");
+}
+
 static void check_vvc_sad(void)
 {
     const int bit_depth = 10;
@@ -422,6 +471,7 @@ static void check_vvc_sad(void)
 void checkasm_check_vvc_mc(void)
 {
     check_dmvr();
+    check_bdof();
     check_vvc_sad();
     check_put_vvc_luma();
     check_put_vvc_luma_uni();
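
Note on the randomize_prof_src mask in the checkasm test: the following is a minimal standalone sketch of the arithmetic only, assuming the formula exactly as it appears in the macro. prof_src_mask16 and prof_src_mask32 are hypothetical helper names introduced for illustration; the patch computes the mask inline and has no such functions. The mask keeps a 14-bit value and clears its low (14 - bit_depth) bits, and the same 16-bit mask is applied to both halves of each 32-bit random word.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: the 16-bit mask that
 * randomize_prof_src applies to each sample for a given bit depth. */
static uint16_t prof_src_mask16(int bit_depth)
{
    const int shift = 14 - bit_depth;             /* low bits forced to zero */
    return (uint16_t)(0x3fff >> shift << shift);  /* same expression as the macro */
}

/* Hypothetical helper: the macro packs the 16-bit mask twice into one
 * 32-bit word, so each random word yields two masked int16_t samples. */
static uint32_t prof_src_mask32(int bit_depth)
{
    const uint32_t mask16 = prof_src_mask16(bit_depth);
    return (mask16 << 16) | mask16;
}
```

For the three bit depths the test loops over (8, 10 and 12), the 16-bit mask evaluates to 0x3fc0, 0x3ff0 and 0x3ffc respectively.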