From patchwork Tue Oct 11 22:20:23 2022
X-Patchwork-Id: 38701
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 12 Oct 2022 00:20:23 +0200
Subject: [FFmpeg-devel] [PATCH v2] avcodec/startcode: Avoid unaligned accesses

Until now, ff_startcode_find_candidate_c() simply cast a uint8_t* to uint64_t*/uint32_t* to read 64/32 bits at a time if HAVE_FAST_UNALIGNED is true. Yet this ignores the alignment requirements of these types as well as the effective type rules of the C standard. This commit therefore replaces these direct accesses with AV_RN64/32; this also improves readability.

UBSan reported these unaligned accesses, which occurred in 233 FATE tests involving H.264 and VC-1 (they have also been reported in tickets #8138 and #8485); this commit fixes these tests.

The output of GCC with -O3 is unchanged for aarch64, alpha, arm, loongarch, ppc and x64; there is only a slight difference for mips.
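To illustrate the problem outside of FFmpeg, here is a minimal sketch assuming a plain C11 compiler. The helpers load_u64() and load_u32() are hypothetical: they are not how libavutil defines AV_RN64/AV_RN32 (libavutil/intreadwrite.h may use compiler- or architecture-specific code), but memcpy() is the standard-conforming way to express the same unaligned, native-endian load.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* The byte-level reasoning in this patch assumes 8-bit bytes. */
static_assert(CHAR_BIT == 8, "8-bit bytes are assumed");

/*
 * Pattern removed by the patch (undefined behaviour in ISO C):
 *     uint64_t v = *(const uint64_t *)(buf + i);
 * buf + i need not be suitably aligned for uint64_t, and the bytes are
 * read through an lvalue whose type is not their effective type.
 */

/* Hypothetical helper: native-endian 64-bit load from any address.
 * memcpy() has no alignment requirement and copies the object
 * representation, so no effective-type rule is violated. With
 * optimization enabled, compilers typically emit a single load on
 * targets that allow unaligned accesses. */
static inline uint64_t load_u64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper: the 32-bit counterpart. */
static inline uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Used as `uint64_t w = load_u64(buf + i);`, the surrounding arithmetic stays exactly as before; only the way the word is fetched changes, which is also the shape of the patch itself.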
Signed-off-by: Andreas Rheinhardt
---
This is v2 of https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200122145210.6898-1-andreas.rheinhardt@gmail.com/

Here is the mips code before this change:

startcode_old_O3.o:     file format elf64-tradlittlemips

Disassembly of section .text:

0000000000000000 :
   0:  18a00029  blez    a1,a8
   4:  3c08ff7f  lui     a4,0xff7f
   8:  3c07ff01  lui     a3,0xff01
   c:  65087f7f  daddiu  a4,a4,32639
  10:  64e70101  daddiu  a3,a3,257
  14:  00084438  dsll    a4,a4,0x10
  18:  00073c38  dsll    a3,a3,0x10
  1c:  65087f7f  daddiu  a4,a4,32639
  20:  64e70101  daddiu  a3,a3,257
  24:  00084478  dsll    a4,a4,0x11
  28:  00073df8  dsll    a3,a3,0x17
  2c:  00803025  move    a2,a0
  30:  00001025  move    v0,zero
  34:  3508feff  ori     a4,a4,0xfeff
  38:  10000005  b       50
  3c:  34e78080  ori     a3,a3,0x8080
  40:  24420008  addiu   v0,v0,8
  44:  0045182a  slt     v1,v0,a1
  48:  10600015  beqz    v1,a0
  4c:  00000000  nop
  50:  dcc30000  ld      v1,0(a2)
  54:  0068482d  daddu   a5,v1,a4
  58:  44a30000  dmtc1   v1,$f0
  5c:  44a90800  dmtc1   a5,$f1
  60:  4be10002  pandn   $f0,$f0,$f1
  64:  44230000  dmfc1   v1,$f0
  68:  00671824  and     v1,v1,a3
  6c:  1060fff4  beqz    v1,40
  70:  64c60008  daddiu  a2,a2,8
  74:  0045182a  slt     v1,v0,a1
  78:  10600009  beqz    v1,a0
  7c:  0082182d  daddu   v1,a0,v0
  80:  10000005  b       98
  84:  90640000  lbu     a0,0(v1)
  88:  24420001  addiu   v0,v0,1
  8c:  10a20008  beq     a1,v0,b0
  90:  00000000  nop
  94:  90640000  lbu     a0,0(v1)
  98:  1480fffb  bnez    a0,88
  9c:  64630001  daddiu  v1,v1,1
  a0:  03e00008  jr      ra
  a4:  00000000  nop
  a8:  03e00008  jr      ra
  ac:  00001025  move    v0,zero
  b0:  03e00008  jr      ra
  b4:  00a01025  move    v0,a1
	...
And here after this change:

startcode_new_O3.o:     file format elf64-tradlittlemips

Disassembly of section .text:

0000000000000000 :
   0:  18a0002b  blez    a1,b0
   4:  3c08ff7f  lui     a4,0xff7f
   8:  3c07ff01  lui     a3,0xff01
   c:  65087f7f  daddiu  a4,a4,32639
  10:  64e70101  daddiu  a3,a3,257
  14:  00084438  dsll    a4,a4,0x10
  18:  00073c38  dsll    a3,a3,0x10
  1c:  65087f7f  daddiu  a4,a4,32639
  20:  64e70101  daddiu  a3,a3,257
  24:  00084478  dsll    a4,a4,0x11
  28:  00073df8  dsll    a3,a3,0x17
  2c:  00803025  move    a2,a0
  30:  00001025  move    v0,zero
  34:  3508feff  ori     a4,a4,0xfeff
  38:  10000005  b       50
  3c:  34e78080  ori     a3,a3,0x8080
  40:  24420008  addiu   v0,v0,8
  44:  0045182a  slt     v1,v0,a1
  48:  10600017  beqz    v1,a8
  4c:  00000000  nop
  50:  68c30007  ldl     v1,7(a2)
  54:  6cc30000  ldr     v1,0(a2)
  58:  0068482d  daddu   a5,v1,a4
  5c:  44a30000  dmtc1   v1,$f0
  60:  44a90800  dmtc1   a5,$f1
  64:  4be10002  pandn   $f0,$f0,$f1
  68:  44230000  dmfc1   v1,$f0
  6c:  00671824  and     v1,v1,a3
  70:  1060fff3  beqz    v1,40
  74:  64c60008  daddiu  a2,a2,8
  78:  0045182a  slt     v1,v0,a1
  7c:  1060000a  beqz    v1,a8
  80:  0082182d  daddu   v1,a0,v0
  84:  10000006  b       a0
  88:  90640000  lbu     a0,0(v1)
  8c:  00000000  nop
  90:  24420001  addiu   v0,v0,1
  94:  10a20008  beq     a1,v0,b8
  98:  00000000  nop
  9c:  90640000  lbu     a0,0(v1)
  a0:  1480fffb  bnez    a0,90
  a4:  64630001  daddiu  v1,v1,1
  a8:  03e00008  jr      ra
  ac:  00000000  nop
  b0:  03e00008  jr      ra
  b4:  00001025  move    v0,zero
  b8:  03e00008  jr      ra
  bc:  00a01025  move    v0,a1

As one can see, the only substantive difference is that an ld has been replaced by a pair of ldl and ldr (the MIPS unaligned doubleword load); besides that, the new code contains one additional nop (at 0x8c), which shifts the subsequent addresses and branch targets. I don't know the performance implications of this.
 libavcodec/startcode.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c
index 9efdffe8c6..d84f326521 100644
--- a/libavcodec/startcode.c
+++ b/libavcodec/startcode.c
@@ -25,6 +25,7 @@
  * @author Michael Niedermayer
  */
 
+#include "libavutil/intreadwrite.h"
 #include "startcode.h"
 #include "config.h"
 
@@ -38,14 +39,14 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
      */
 #if HAVE_FAST_64BIT
     while (i < size &&
-            !((~*(const uint64_t *)(buf + i) &
-               (*(const uint64_t *)(buf + i) - 0x0101010101010101ULL)) &
+            !((~AV_RN64(buf + i) &
+               (AV_RN64(buf + i) - 0x0101010101010101ULL)) &
                                                 0x8080808080808080ULL))
         i += 8;
 #else
     while (i < size &&
-            !((~*(const uint32_t *)(buf + i) &
-               (*(const uint32_t *)(buf + i) - 0x01010101U)) &
+            !((~AV_RN32(buf + i) &
+               (AV_RN32(buf + i) - 0x01010101U)) &
                                                 0x80808080U))
         i += 4;
 #endif
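The arithmetic kept by the diff is the classic word-at-a-time zero-byte test: (x - 0x01...01) & ~x & 0x80...80 is nonzero if and only if some byte of x is zero. Below is a minimal standalone sketch of the 64-bit path. has_zero_byte64() and find_candidate() are hypothetical names, memcpy() stands in for AV_RN64, and the caller is assumed to provide at least 7 readable bytes after buf[size - 1] (FFmpeg pads its input buffers with AV_INPUT_BUFFER_PADDING_SIZE bytes, which is what makes such overreads acceptable there).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of x is 0x00.
 * Subtracting 0x01 from each byte turns a zero byte into 0xFF (setting
 * its top bit); "& ~x" drops the top bits of bytes that were already
 * >= 0x80; the 0x80 mask keeps only the per-byte top bits. Borrows can
 * only originate at a genuine zero byte, so the zero/nonzero result is
 * exact even though extra bits may be set above that byte. */
static inline uint64_t has_zero_byte64(uint64_t x)
{
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

/* Hypothetical standalone counterpart of ff_startcode_find_candidate_c():
 * returns the index of the first zero byte in buf[0..size), or size.
 * Assumes at least 7 readable padding bytes after buf[size - 1], because
 * whole 8-byte words are loaded while i < size. */
static int find_candidate(const uint8_t *buf, int size)
{
    int i = 0;
    assert(size >= 0);

    /* Skip 8 bytes per iteration while the word contains no zero byte.
     * memcpy() is an unaligned, native-endian load (stand-in for AV_RN64). */
    while (i < size) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof(w));
        if (has_zero_byte64(w))
            break;
        i += 8;
    }
    /* Locate the exact zero byte inside the flagged word, if any. */
    for (; i < size; i++)
        if (!buf[i])
            break;
    /* The word loop may step past size when no zero byte was found;
     * clamp so the result never exceeds size. */
    return i < size ? i : size;
}
```

For example, with a buffer holding 0x47 everywhere except a zero at index 11, find_candidate(buf, 24) skips the first word, stops at the second, and the byte loop then returns 11. Most non-zero data is thus consumed eight bytes per iteration, and the byte loop only walks the word that tripped the test.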