From patchwork Sun Aug 18 01:48:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 51064
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Aug 2024 09:48:05 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240818014807.47423-1-nuomi2021@gmail.com>
References: <20240818014807.47423-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240818014807.47423-2-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/4] avcodec/vvcdec: bdof, do not pad sources and gradients to simplify the code
Cc: Nuo Mi

---
 libavcodec/vvc/dsp.c            | 25 +------------
 libavcodec/vvc/dsp.h            |  4 +-
 libavcodec/vvc/inter_template.c | 65 ++++++++++++++-------------------
 3 files changed, 31 insertions(+), 63 deletions(-)

diff --git a/libavcodec/vvc/dsp.c b/libavcodec/vvc/dsp.c
index 7463d8c9de..433353c32c 100644
--- a/libavcodec/vvc/dsp.c
+++ b/libavcodec/vvc/dsp.c
@@ -26,26 +26,6 @@
 
 #define VVC_SIGN(v) (v < 0 ? -1 : !!v)
 
-static void av_always_inline pad_int16(int16_t *_dst, const ptrdiff_t dst_stride, const int width, const int height)
-{
-    const int padded_width = width + 2;
-    int16_t *dst;
-    for (int y = 0; y < height; y++) {
-        dst = _dst + y * dst_stride;
-        for (int x = 0; x < width; x++) {
-            dst[-1] = dst[0];
-            dst[width] = dst[width - 1];
-        }
-    }
-
-    _dst--;
-    //top
-    memcpy(_dst - dst_stride, _dst, padded_width * sizeof(int16_t));
-    //bottom
-    _dst += dst_stride * height;
-    memcpy(_dst, _dst - dst_stride, padded_width * sizeof(int16_t));
-}
-
 static int vvc_sad(const int16_t *src0, const int16_t *src1, int dx, int dy,
     const int block_w, const int block_h)
 {
@@ -77,11 +57,10 @@ typedef struct IntraEdgeParams {
 
 #define PROF_BORDER_EXT         1
 #define PROF_BLOCK_SIZE         (AFFINE_MIN_BLOCK_SIZE + PROF_BORDER_EXT * 2)
-#define BDOF_BORDER_EXT         1
 
-#define BDOF_PADDED_SIZE        (16 + BDOF_BORDER_EXT * 2)
+#define BDOF_BORDER_EXT         1
+#define BDOF_BLOCK_SIZE         16
 #define BDOF_MIN_BLOCK_SIZE     4
-#define BDOF_GRADIENT_SIZE      (BDOF_MIN_BLOCK_SIZE + BDOF_BORDER_EXT * 2)
 
 #define BIT_DEPTH 8
 #include "dsp_template.c"
diff --git a/libavcodec/vvc/dsp.h b/libavcodec/vvc/dsp.h
index 38ff492a23..635ebcafed 100644
--- a/libavcodec/vvc/dsp.h
+++ b/libavcodec/vvc/dsp.h
@@ -88,8 +88,6 @@ typedef struct VVCInterDSPContext {
     void (*bdof_fetch_samples)(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int x_frac, int y_frac,
         int width, int height);
 
-    void (*prof_grad_filter)(int16_t *gradient_h, int16_t *gradient_v, const ptrdiff_t gradient_stride,
-        const int16_t *src, const ptrdiff_t src_stride, int width, int height, const int pad);
     void (*apply_prof)(int16_t *dst, const int16_t *src, const int16_t *diff_mv_x, const int16_t *diff_mv_y);
 
     void (*apply_prof_uni)(uint8_t *dst, ptrdiff_t dst_stride, const int16_t *src,
@@ -97,7 +95,7 @@ typedef struct VVCInterDSPContext {
     void (*apply_prof_uni_w)(uint8_t *dst, const ptrdiff_t dst_stride, const int16_t *src,
         const int16_t *diff_mv_x, const int16_t *diff_mv_y, int denom, int wx, int ox);
 
-    void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, int16_t *src0, int16_t *src1, int block_w, int block_h);
+    void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, const int16_t *src0, const int16_t *src1, int block_w, int block_h);
 
     int (*sad)(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
     void (*dmvr[2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index 0f1712e337..c073a73e76 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -292,13 +292,11 @@ static void FUNC(fetch_samples)(int16_t *_dst, const uint8_t *_src, const ptrdif
     FUNC(bdof_fetch_samples)(_dst, _src, _src_stride, x_frac, y_frac, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 }
 
-static void FUNC(prof_grad_filter)(int16_t *_gradient_h, int16_t *_gradient_v, const ptrdiff_t gradient_stride,
-    const int16_t *_src, const ptrdiff_t src_stride, const int width, const int height, const int pad)
+static void FUNC(prof_grad_filter)(int16_t *gradient_h, int16_t *gradient_v, const ptrdiff_t gradient_stride,
+    const int16_t *_src, const ptrdiff_t src_stride, const int width, const int height)
 {
     const int shift = 6;
     const int16_t *src = _src;
-    int16_t *gradient_h = _gradient_h + pad * (1 + gradient_stride);
-    int16_t *gradient_v = _gradient_v + pad * (1 + gradient_stride);
 
     for (int y = 0; y < height; y++) {
         const int16_t *p = src;
@@ -311,10 +309,6 @@ static void FUNC(prof_grad_filter)(int16_t *_gradient_h, int16_t *_gradient_v, c
         gradient_v += gradient_stride;
         src += src_stride;
     }
-    if (pad) {
-        pad_int16(_gradient_h + 1 + gradient_stride, gradient_stride, width, height);
-        pad_int16(_gradient_v + 1 + gradient_stride, gradient_stride, width, height);
-    }
 }
 
 static void FUNC(apply_prof)(int16_t *dst, const int16_t *src, const int16_t *diff_mv_x, const int16_t *diff_mv_y)
@@ -323,7 +317,7 @@ static void FUNC(apply_prof)(int16_t *dst, const int16_t *src, const int16_t *di
 
     int16_t gradient_h[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
     int16_t gradient_v[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
-    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE, 0);
+    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 
     for (int y = 0; y < AFFINE_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < AFFINE_MIN_BLOCK_SIZE; x++) {
@@ -352,7 +346,7 @@ static void FUNC(apply_prof_uni)(uint8_t *_dst, const ptrdiff_t _dst_stride, con
 
     int16_t gradient_h[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
     int16_t gradient_v[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
-    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE, 0);
+    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 
     for (int y = 0; y < AFFINE_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < AFFINE_MIN_BLOCK_SIZE; x++) {
@@ -380,7 +374,7 @@ static void FUNC(apply_prof_uni_w)(uint8_t *_dst, const ptrdiff_t _dst_stride,
 
     int16_t gradient_h[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
     int16_t gradient_v[AFFINE_MIN_BLOCK_SIZE * AFFINE_MIN_BLOCK_SIZE];
-    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE, 0);
+    FUNC(prof_grad_filter)(gradient_h, gradient_v, AFFINE_MIN_BLOCK_SIZE, src, MAX_PB_SIZE, AFFINE_MIN_BLOCK_SIZE, AFFINE_MIN_BLOCK_SIZE);
 
     for (int y = 0; y < AFFINE_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < AFFINE_MIN_BLOCK_SIZE; x++) {
@@ -395,47 +389,47 @@ static void FUNC(apply_prof_uni_w)(uint8_t *_dst, const ptrdiff_t _dst_stride,
 }
 
 static void FUNC(derive_bdof_vx_vy)(const int16_t *_src0, const int16_t *_src1,
-    const int16_t **gradient_h, const int16_t **gradient_v, ptrdiff_t gradient_stride,
+    const int pad_left, const int pad_top, const int pad_right, const int pad_bottom,
+    const int16_t **gradient_h, const int16_t **gradient_v,
     int* vx, int* vy)
 {
     const int shift2 = 4;
     const int shift3 = 1;
     const int thres = 1 << 4;
     int sgx2 = 0, sgy2 = 0, sgxgy = 0, sgxdi = 0, sgydi = 0;
-    const int16_t *src0 = _src0 - 1 - MAX_PB_SIZE;
-    const int16_t *src1 = _src1 - 1 - MAX_PB_SIZE;
 
-    for (int y = 0; y < BDOF_GRADIENT_SIZE; y++) {
-        for (int x = 0; x < BDOF_GRADIENT_SIZE; x++) {
-            const int diff = (src0[x] >> shift2) - (src1[x] >> shift2);
-            const int idx = gradient_stride * y + x;
+    for (int y = -1; y < BDOF_MIN_BLOCK_SIZE + 1; y++) {
+        const int dy = y + (pad_top && y < 0) - (pad_bottom && y == BDOF_MIN_BLOCK_SIZE);      // we pad for the first and last row
+        const int16_t *src0 = _src0 + dy * MAX_PB_SIZE;
+        const int16_t *src1 = _src1 + dy * MAX_PB_SIZE;
+
+        for (int x = -1; x < BDOF_MIN_BLOCK_SIZE + 1; x++) {
+            const int dx = x + (pad_left && x < 0) - (pad_right && x == BDOF_MIN_BLOCK_SIZE);  // we pad for the first and last col
+            const int diff = (src0[dx] >> shift2) - (src1[dx] >> shift2);
+            const int idx = BDOF_BLOCK_SIZE * dy + dx;
             const int temph = (gradient_h[0][idx] + gradient_h[1][idx]) >> shift3;
             const int tempv = (gradient_v[0][idx] + gradient_v[1][idx]) >> shift3;
+
             sgx2 += FFABS(temph);
             sgy2 += FFABS(tempv);
             sgxgy += VVC_SIGN(tempv) * temph;
             sgxdi += -VVC_SIGN(temph) * diff;
             sgydi += -VVC_SIGN(tempv) * diff;
         }
-        src0 += MAX_PB_SIZE;
-        src1 += MAX_PB_SIZE;
     }
     *vx = sgx2 > 0 ? av_clip((sgxdi * (1 << 2)) >> av_log2(sgx2) , -thres + 1, thres - 1) : 0;
     *vy = sgy2 > 0 ? av_clip(((sgydi * (1 << 2)) - ((*vx * sgxgy) >> 1)) >> av_log2(sgy2), -thres + 1, thres - 1) : 0;
 }
 
 static void FUNC(apply_bdof_min_block)(pixel* dst, const ptrdiff_t dst_stride, const int16_t *src0, const int16_t *src1,
-    const int16_t **gradient_h, const int16_t **gradient_v, const int vx, const int vy)
+    const int16_t **gh, const int16_t **gv, const int vx, const int vy)
 {
     const int shift4 = 15 - BIT_DEPTH;
     const int offset4 = 1 << (shift4 - 1);
 
-    const int16_t* gh[] = { gradient_h[0] + 1 + BDOF_PADDED_SIZE, gradient_h[1] + 1 + BDOF_PADDED_SIZE };
-    const int16_t* gv[] = { gradient_v[0] + 1 + BDOF_PADDED_SIZE, gradient_v[1] + 1 + BDOF_PADDED_SIZE };
-
     for (int y = 0; y < BDOF_MIN_BLOCK_SIZE; y++) {
         for (int x = 0; x < BDOF_MIN_BLOCK_SIZE; x++) {
-            const int idx = y * BDOF_PADDED_SIZE + x;
+            const int idx = y * BDOF_BLOCK_SIZE + x;
             const int bdof_offset = vx * (gh[0][idx] - gh[1][idx]) + vy * (gv[0][idx] - gv[1][idx]);
             dst[x] = av_clip_pixel((src0[x] + offset4 + src1[x] + bdof_offset) >> shift4);
         }
@@ -445,31 +439,29 @@ static void FUNC(apply_bdof_min_block)(pixel* dst, const ptrdiff_t dst_stride, c
     }
 }
 
-static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, int16_t *_src0, int16_t *_src1,
+static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, const int16_t *_src0, const int16_t *_src1,
     const int block_w, const int block_h)
 {
-    int16_t gradient_h[2][BDOF_PADDED_SIZE * BDOF_PADDED_SIZE];
-    int16_t gradient_v[2][BDOF_PADDED_SIZE * BDOF_PADDED_SIZE];
+    int16_t gradient_h[2][BDOF_BLOCK_SIZE * BDOF_BLOCK_SIZE];
+    int16_t gradient_v[2][BDOF_BLOCK_SIZE * BDOF_BLOCK_SIZE];
     int vx, vy;
     const ptrdiff_t dst_stride = _dst_stride / sizeof(pixel);
     pixel* dst = (pixel*)_dst;
 
-    FUNC(prof_grad_filter)(gradient_h[0], gradient_v[0], BDOF_PADDED_SIZE,
-        _src0, MAX_PB_SIZE, block_w, block_h, 1);
-    pad_int16(_src0, MAX_PB_SIZE, block_w, block_h);
-    FUNC(prof_grad_filter)(gradient_h[1], gradient_v[1], BDOF_PADDED_SIZE,
-        _src1, MAX_PB_SIZE, block_w, block_h, 1);
-    pad_int16(_src1, MAX_PB_SIZE, block_w, block_h);
+    FUNC(prof_grad_filter)(gradient_h[0], gradient_v[0], BDOF_BLOCK_SIZE,
+        _src0, MAX_PB_SIZE, block_w, block_h);
+    FUNC(prof_grad_filter)(gradient_h[1], gradient_v[1], BDOF_BLOCK_SIZE,
+        _src1, MAX_PB_SIZE, block_w, block_h);
 
     for (int y = 0; y < block_h; y += BDOF_MIN_BLOCK_SIZE) {
         for (int x = 0; x < block_w; x += BDOF_MIN_BLOCK_SIZE) {
             const int16_t* src0 = _src0 + y * MAX_PB_SIZE + x;
             const int16_t* src1 = _src1 + y * MAX_PB_SIZE + x;
             pixel *d = dst + x;
-            const int idx = BDOF_PADDED_SIZE * y + x;
+            const int idx = BDOF_BLOCK_SIZE * y + x;
             const int16_t* gh[] = { gradient_h[0] + idx, gradient_h[1] + idx };
             const int16_t* gv[] = { gradient_v[0] + idx, gradient_v[1] + idx };
-            FUNC(derive_bdof_vx_vy)(src0, src1, gh, gv, BDOF_PADDED_SIZE, &vx, &vy);
+            FUNC(derive_bdof_vx_vy)(src0, src1, !x, !y, x + BDOF_MIN_BLOCK_SIZE == block_w, y + BDOF_MIN_BLOCK_SIZE == block_h, gh, gv, &vx, &vy);
             FUNC(apply_bdof_min_block)(d, dst_stride, src0, src1, gh, gv, vx, vy);
         }
         dst += BDOF_MIN_BLOCK_SIZE * dst_stride;
@@ -631,7 +623,6 @@ static void FUNC(ff_vvc_inter_dsp_init)(VVCInterDSPContext *const inter)
     inter->apply_prof_uni       = FUNC(apply_prof_uni);
     inter->apply_prof_uni_w     = FUNC(apply_prof_uni_w);
     inter->apply_bdof           = FUNC(apply_bdof);
-    inter->prof_grad_filter     = FUNC(prof_grad_filter);
     inter->sad                  = vvc_sad;
 }