From patchwork Sat Jun 22 04:21:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nuo Mi
X-Patchwork-Id: 50053
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 22 Jun 2024 12:21:13 +0800
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240622042114.92873-1-nuomi2021@gmail.com>
References: <20240622042114.92873-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240622042114.92873-3-nuomi2021@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/4] x86/vvc_alf: avoid overwriting for non-16 aligned widths
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: benjamin.bross@hhi.fraunhofer.de, Nuo Mi

Previously, the code allowed overwriting up to the end of a 16-aligned
block. This was safe when there were no picture virtual boundaries,
because both CTU sizes and strides are 16-aligned. With picture virtual
boundaries, however, each CTU is divided into four ALF blocks, so the
overwrite can clobber later CTUs.

With picture virtual boundaries, each ALF block is 8-pixel aligned.
For luma, the width is therefore always 8-aligned. For chroma in 4:2:0
format, we also need to handle 4-aligned widths.
---
 libavcodec/x86/vvc/vvc_alf.asm | 85 ++++++++++++++++++++++++++++++----
 1 file changed, 75 insertions(+), 10 deletions(-)

diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
index b35dd9b0e9..f69a69f05f 100644
--- a/libavcodec/x86/vvc/vvc_alf.asm
+++ b/libavcodec/x86/vvc/vvc_alf.asm
@@ -324,18 +324,69 @@ SECTION .text
     %endif
 %endmacro
 
-; STORE_PIXELS(dst, src)
-%macro STORE_PIXELS 2
+; STORE_PIXELS_W16(dst, src)
+%macro STORE_PIXELS_W16 2
     %if ps == 2
-        movu       %1, m%2
+        movu       [%1], m%2
     %else
+        movu       [%1], xm%2
+    %endif
+%endmacro
+
+%macro STORE_PIXELS_W8 2
+    %if ps == 2
+        movu       [%1], xm%2
+    %else
+        movq       [%1], xm%2
+    %endif
+%endmacro
+
+; STORE_PIXELS_W4(dst, src, offset)
+%macro STORE_PIXELS_W4 3
+    %if ps == 2
+        movq       [%1 + %3 * ps], xm%2
+    %else
+        movd       [%1 + %3], xm%2
+    %endif
+%endmacro
+
+%macro STORE_PIXELS_W8LE 3
+    cmp %3, 8
+    jl .w4
+    STORE_PIXELS_W8 %1, %2
+    cmp %3, 12
+    %if ps == 2
+        vpermq      m%2,  m%2, q0302
+    %else
+        vpermq      m%2,  m%2, q0101
+    %endif
+    jl .end
+    STORE_PIXELS_W4 %1, %2, 8
+    jmp .end
+.w4:
+    STORE_PIXELS_W4 %1, %2, 0
+.end:
+%endmacro
+
+; STORE_PIXELS(dst, src, width)
+%macro STORE_PIXELS 3
+    %if ps == 1
         packuswb  m%2, m%2
         vpermq    m%2, m%2, 0x8
-        movu       %1, xm%2
+    %endif
+
+    %ifidn %3, 16
+        STORE_PIXELS_W16 %1, %2
+    %else
+        %if LUMA
+            STORE_PIXELS_W8 %1, %2
+        %else
+            STORE_PIXELS_W8LE %1, %2, %3
+        %endif
     %endif
 %endmacro
 
-%macro FILTER_16x4 0
+%macro FILTER_16x4 1
 %if LUMA
     push clipq
     push strideq
@@ -362,7 +413,7 @@ SECTION .text
     ; clip to pixel
     CLIPW m0, m14, m15
 
-    STORE_PIXELS [dstq], 0
+    STORE_PIXELS dstq, 0, %1
 
     lea srcq, [srcq + src_strideq]
     lea dstq, [dstq + dst_strideq]
@@ -399,7 +450,7 @@ SECTION .text
 ;     const uint8_t *src, ptrdiff_t src_stride, const ptrdiff_t width, cosnt ptr_diff_t height,
 ;     const int16_t *filter, const int16_t *clip, ptrdiff_t stride, ptrdiff_t vb_pos, ptrdiff_t pixel_max);
 ; ******************************
-cglobal vvc_alf_filter_%2_%1bpc, 11, 15, 16, 0-0x28, dst, dst_stride, src, src_stride, width, height, filter, clip, stride, vb_pos, pixel_max, \
+cglobal vvc_alf_filter_%2_%1bpc, 11, 15, 16, 0-0x30, dst, dst_stride, src, src_stride, width, height, filter, clip, stride, vb_pos, pixel_max, \
     offset, x, s5, s6
 %define ps (%1 / 8) ; pixel size
     movd      xm15, pixel_maxd
@@ -409,18 +460,32 @@ cglobal vvc_alf_filter_%2_%1bpc, 11, 15, 16, 0-0x28, dst, dst_stride, src, src_s
 .loop:
     push            srcq
     push            dstq
+    push          widthq
     xor               xq, xq
 
 .loop_w:
+    cmp           widthq, 16
+    jl        .loop_w_end
+
     LOAD_PARAMS
-    FILTER_16x4
+    FILTER_16x4       16
 
     add             srcq, 16 * ps
     add             dstq, 16 * ps
     add               xq, 16
-    cmp               xq, widthq
-    jl           .loop_w
+    sub           widthq, 16
+    jmp          .loop_w
+
+.loop_w_end:
+    cmp           widthq, 0
+    je            .w_end
+
+    LOAD_PARAMS
+    FILTER_16x4   widthq
+
+.w_end:
 
+    pop           widthq
     pop             dstq
     pop             srcq
     lea             srcq, [srcq + 4 * src_strideq]
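
As a reading aid for the hunks above, here is a rough scalar model in C of
how a row is split into full 16-pixel stores plus one narrower tail store.
It is only a sketch: the names model_tail_pixels and model_store_row are
hypothetical and not part of FFmpeg, the filtering itself is replaced by a
plain copy, and it assumes luma tails are exactly 8 pixels wide and chroma
tails are 4, 8 or 12 pixels wide, as the commit message describes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model, not FFmpeg code: number of pixels the tail
 * store writes for a remaining width below 16.
 * Assumption: luma tails are 8 pixels; chroma tails are 4, 8 or 12. */
static int model_tail_pixels(int width, int is_luma)
{
    if (is_luma)
        return 8;   /* mirrors STORE_PIXELS_W8 */
    if (width < 8)
        return 4;   /* mirrors the .w4 path: one 4-pixel store at offset 0 */
    if (width < 12)
        return 8;   /* one 8-pixel store only */
    return 12;      /* 8-pixel store plus a 4-pixel store at offset 8 */
}

/* Model of the .loop_w / .loop_w_end control flow for a single row.
 * ps is the pixel size in bytes (1 or 2). The real code filters four rows
 * at a time; here the "filter" is just a copy so that only the store
 * widths are modelled. Returns the number of pixels written. */
static int model_store_row(uint8_t *dst, const uint8_t *src,
                           int width, int ps, int is_luma)
{
    int written = 0;

    while (width >= 16) {               /* full blocks: FILTER_16x4 16 */
        memcpy(dst, src, (size_t)16 * ps);
        dst     += 16 * ps;
        src     += 16 * ps;
        written += 16;
        width   -= 16;
    }
    if (width != 0) {                   /* tail: FILTER_16x4 widthq */
        int n = model_tail_pixels(width, is_luma);
        memcpy(dst, src, (size_t)n * ps);
        written += n;
    }
    return written;
}
```

In this model a chroma row of width 20 is written as one 16-pixel store
followed by one 4-pixel store, so nothing past pixel 19 is touched. The
pre-patch loop would have issued a second 16-pixel store there.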