From patchwork Tue Aug 2 00:23:11 2022
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 37081
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 2 Aug 2022 02:23:11 +0200
X-Microsoft-Original-Message-ID: <20220802002312.262441-2-andreas.rheinhardt@outlook.com>
X-Mailer: git-send-email 2.34.1
Subject: [FFmpeg-devel] [PATCH v5 3/4] avcodec/loongarch/h264chroma,
 vc1dsp_lasx: Add wrapper for __lasx_xvldx
Cc: Shiyou Yin, Lu Wang, Hao Chen, Andreas Rheinhardt

__lasx_xvldx does not accept a pointer to const (in fact, no function in
lasxintrin.h does so), although it is not allowed to modify the
pointed-to buffer. Therefore this commit adds a wrapper for it in order
to constify the H264Chroma API in a later commit.

Signed-off-by: Andreas Rheinhardt
---
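As a minimal sketch of the constness problem this wrapper works around
(assuming a LoongArch toolchain that ships lasxintrin.h; the load_row()
helper below is hypothetical, for illustration only):

    #include <stdint.h>
    #include <stddef.h>
    #include <lasxintrin.h>

    /* lasxintrin.h declares __lasx_xvldx() as taking a plain void *,
     * so a pointer-to-const source cannot be passed to it directly. */
    #define LASX_XVLDX(ptr, stride) __lasx_xvldx((void*)ptr, stride)

    static __m256i load_row(const uint8_t *src, ptrdiff_t stride)
    {
        /* __lasx_xvldx(src, stride) would warn about discarding the
         * const qualifier; the macro casts it away even though the
         * load only ever reads from src. */
        return LASX_XVLDX(src, stride);
    }

Casting at a single, commented location keeps every call site
const-clean, which is what allows the H264Chroma prototypes to be
constified in the follow-up commit.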
 libavcodec/loongarch/h264chroma_lasx.c | 94 ++++++++++++++------------
 libavcodec/loongarch/vc1dsp_lasx.c     | 20 +++---
 2 files changed, 61 insertions(+), 53 deletions(-)

diff --git a/libavcodec/loongarch/h264chroma_lasx.c b/libavcodec/loongarch/h264chroma_lasx.c
index 824a78dfc8..bada8bb5ed 100644
--- a/libavcodec/loongarch/h264chroma_lasx.c
+++ b/libavcodec/loongarch/h264chroma_lasx.c
@@ -26,6 +26,10 @@
 #include "libavutil/avassert.h"
 #include "libavutil/loongarch/loongson_intrinsics.h"
 
+/* __lasx_xvldx() in lasxintrin.h does not accept a const void*;
+ * remove the following once it does. */
+#define LASX_XVLDX(ptr, stride) __lasx_xvldx((void*)ptr, stride)
+
 static const uint8_t chroma_mask_arr[64] = {
     0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8,
     0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8,
@@ -51,7 +55,7 @@ static av_always_inline void avc_chroma_hv_8x4_lasx(uint8_t *src, uint8_t *dst,
     __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
              src1, src2, src3, src4);
     DUP2_ARG3(__lasx_xvpermi_q, src2, src1, 0x20, src4, src3, 0x20, src1, src3);
     src0 = __lasx_xvshuf_b(src0, src0, mask);
@@ -91,10 +95,10 @@ static av_always_inline void avc_chroma_hv_8x8_lasx(uint8_t *src, uint8_t *dst,
     __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
              src1, src2, src3, src4);
     src += stride_4x;
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
              src5, src6, src7, src8);
     DUP4_ARG3(__lasx_xvpermi_q, src2, src1, 0x20, src4, src3, 0x20, src6, src5, 0x20,
               src8, src7, 0x20, src1, src3, src5, src7);
@@ -141,8 +145,8 @@ static av_always_inline void avc_chroma_hz_8x4_lasx(uint8_t *src, uint8_t *dst,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0);
-    DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src1, src2);
-    src3 = __lasx_xvldx(src, stride_3x);
+    DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src1, src2);
+    src3 = LASX_XVLDX(src, stride_3x);
     DUP2_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src0, src2);
     DUP2_ARG3(__lasx_xvshuf_b, src0, src0, mask, src2, src2, mask, src0, src2);
     DUP2_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, res0, res1);
@@ -170,11 +174,11 @@ static av_always_inline void avc_chroma_hz_8x8_lasx(uint8_t *src, uint8_t *dst,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
              src1, src2, src3, src4);
     src += stride_4x;
-    DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src5, src6);
-    src7 = __lasx_xvldx(src, stride_3x);
+    DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src5, src6);
+    src7 = LASX_XVLDX(src, stride_3x);
     DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4, 0x20,
               src7, src6, 0x20, src0, src2, src4, src6);
     DUP4_ARG3(__lasx_xvshuf_b, src0, src0, mask, src2, src2, mask, src4, src4, mask,
@@ -212,7 +216,7 @@ static av_always_inline void avc_chroma_hz_nonmult_lasx(uint8_t *src,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
 
     for (row = height >> 2; row--;) {
-        DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+        DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
                   src0, src1, src2, src3);
         src += stride_4x;
         DUP2_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src0, src2);
@@ -228,7 +232,7 @@ static av_always_inline void avc_chroma_hz_nonmult_lasx(uint8_t *src,
 
     if ((height & 3)) {
         src0 = __lasx_xvld(src, 0);
-        src1 = __lasx_xvldx(src, stride);
+        src1 = LASX_XVLDX(src, stride);
         src1 = __lasx_xvpermi_q(src1, src0, 0x20);
         src0 = __lasx_xvshuf_b(src1, src1, mask);
         res0 = __lasx_xvdp2_h_bu(src0, coeff_vec);
@@ -253,7 +257,7 @@ static av_always_inline void avc_chroma_vt_8x4_lasx(uint8_t *src, uint8_t *dst,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
     src0 = __lasx_xvld(src, 0);
     src += stride;
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src1, src2, src3, src4);
     DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2, 0x20,
               src4, src3, 0x20, src0, src1, src2, src3);
@@ -282,10 +286,10 @@ static av_always_inline void avc_chroma_vt_8x8_lasx(uint8_t *src, uint8_t *dst,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
     src0 = __lasx_xvld(src, 0);
     src += stride;
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src1, src2, src3, src4);
     src += stride_4x;
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src5, src6, src7, src8);
     DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2, 0x20,
               src4, src3, 0x20, src0, src1, src2, src3);
@@ -402,7 +406,7 @@ static void avc_chroma_hv_4x2_lasx(uint8_t *src, uint8_t *dst, ptrdiff_t stride,
     __m256i coeff_vt_vec = __lasx_xvpermi_q(coeff_vt_vec1, coeff_vt_vec0, 0x02);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0);
-    DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2, src1, src2);
+    DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2, src1, src2);
     DUP2_ARG3(__lasx_xvshuf_b, src1, src0, mask, src2, src1, mask, src0, src1);
     src0 = __lasx_xvpermi_q(src0, src1, 0x02);
     res_hz = __lasx_xvdp2_h_bu(src0, coeff_hz_vec);
@@ -431,7 +435,7 @@ static void avc_chroma_hv_4x4_lasx(uint8_t *src, uint8_t *dst, ptrdiff_t stride,
     __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3,
               src, stride_4, src1, src2, src3, src4);
     DUP4_ARG3(__lasx_xvshuf_b, src1, src0, mask, src2, src1, mask, src3, src2, mask,
               src4, src3, mask, src0, src1, src2, src3);
@@ -464,10 +468,10 @@ static void avc_chroma_hv_4x8_lasx(uint8_t *src, uint8_t * dst, ptrdiff_t stride,
     __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3,
               src, stride_4, src1, src2, src3, src4);
     src += stride_4;
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3,
               src, stride_4, src5, src6, src7, src8);
     DUP4_ARG3(__lasx_xvshuf_b, src1, src0, mask, src2, src1, mask, src3, src2, mask,
               src4, src3, mask, src0, src1, src2, src3);
@@ -519,7 +523,7 @@ static void avc_chroma_hz_4x2_lasx(uint8_t *src, uint8_t *dst, ptrdiff_t stride,
     __m256i coeff_vec = __lasx_xvilvl_b(coeff_vec0, coeff_vec1);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0);
-    src1 = __lasx_xvldx(src, stride);
+    src1 = LASX_XVLDX(src, stride);
     src0 = __lasx_xvshuf_b(src1, src0, mask);
     res = __lasx_xvdp2_h_bu(src0, coeff_vec);
     res = __lasx_xvslli_h(res, 3);
@@ -540,8 +544,8 @@ static void avc_chroma_hz_4x4_lasx(uint8_t *src, uint8_t *dst, ptrdiff_t stride,
     __m256i coeff_vec = __lasx_xvilvl_b(coeff_vec0, coeff_vec1);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0);
-    DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2, src1, src2);
-    src3 = __lasx_xvldx(src, stride_3);
+    DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2, src1, src2);
+    src3 = LASX_XVLDX(src, stride_3);
     DUP2_ARG3(__lasx_xvshuf_b, src1, src0, mask, src3, src2, mask, src0, src2);
     src0 = __lasx_xvpermi_q(src0, src2, 0x02);
     res = __lasx_xvdp2_h_bu(src0, coeff_vec);
@@ -567,11 +571,11 @@ static void avc_chroma_hz_4x8_lasx(uint8_t *src, uint8_t *dst, ptrdiff_t stride,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3,
               src, stride_4, src1, src2, src3, src4);
     src += stride_4;
-    DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2, src5, src6);
-    src7 = __lasx_xvldx(src, stride_3);
+    DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2, src5, src6);
+    src7 = LASX_XVLDX(src, stride_3);
     DUP4_ARG3(__lasx_xvshuf_b, src1, src0, mask, src3, src2, mask, src5, src4, mask,
               src7, src6, mask, src0, src2, src4, src6);
     DUP2_ARG3(__lasx_xvpermi_q, src0, src2, 0x02, src4, src6, 0x02, src0, src4);
@@ -625,7 +629,7 @@ static void avc_chroma_vt_4x2_lasx(uint8_t *src, uint8_t *dst, ptrdiff_t stride,
     __m256i coeff_vec = __lasx_xvilvl_b(coeff_vec0, coeff_vec1);
 
     src0 = __lasx_xvld(src, 0);
-    DUP2_ARG2(__lasx_xvldx, src, stride, src, stride << 1, src1, src2);
+    DUP2_ARG2(LASX_XVLDX, src, stride, src, stride << 1, src1, src2);
     DUP2_ARG2(__lasx_xvilvl_b, src1, src0, src2, src1, tmp0, tmp1);
     tmp0 = __lasx_xvilvl_d(tmp1, tmp0);
     res = __lasx_xvdp2_h_bu(tmp0, coeff_vec);
@@ -649,7 +653,7 @@ static void avc_chroma_vt_4x4_lasx(uint8_t *src, uint8_t *dst, ptrdiff_t stride,
     __m256i coeff_vec = __lasx_xvilvl_b(coeff_vec0, coeff_vec1);
 
     src0 = __lasx_xvld(src, 0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3,
               src, stride_4, src1, src2, src3, src4);
     DUP4_ARG2(__lasx_xvilvl_b, src1, src0, src2, src1, src3, src2, src4, src3,
               tmp0, tmp1, tmp2, tmp3);
@@ -679,10 +683,10 @@ static void avc_chroma_vt_4x8_lasx(uint8_t *src, uint8_t *dst, ptrdiff_t stride,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
 
     src0 = __lasx_xvld(src, 0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3,
               src, stride_4, src1, src2, src3, src4);
     src += stride_4;
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3,
               src, stride_4, src5, src6, src7, src8);
     DUP4_ARG2(__lasx_xvilvl_b, src1, src0, src2, src1, src3, src2, src4, src3,
               tmp0, tmp1, tmp2, tmp3);
@@ -860,7 +864,7 @@ static av_always_inline void avc_chroma_hv_and_aver_dst_8x4_lasx(uint8_t *src,
     __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1);
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
              src1, src2, src3, src4);
     DUP2_ARG3(__lasx_xvpermi_q, src2, src1, 0x20, src4, src3, 0x20, src1, src3);
     src0 = __lasx_xvshuf_b(src0, src0, mask);
@@ -874,7 +878,7 @@ static av_always_inline void avc_chroma_hv_and_aver_dst_8x4_lasx(uint8_t *src,
     res_vt0 = __lasx_xvmadd_h(res_vt0, res_hz0, coeff_vt_vec1);
     res_vt1 = __lasx_xvmadd_h(res_vt1, res_hz1, coeff_vt_vec1);
     out = __lasx_xvssrarni_bu_h(res_vt1, res_vt0, 6);
-    DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
              tp0, tp1, tp2, tp3);
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
     tp0 = __lasx_xvpermi_q(tp2, tp0, 0x20);
@@ -907,10 +911,10 @@ static av_always_inline void avc_chroma_hv_and_aver_dst_8x8_lasx(uint8_t *src,
 
     DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0);
     src += stride;
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src1, src2, src3, src4);
     src += stride_4x;
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src5, src6, src7, src8);
     DUP4_ARG3(__lasx_xvpermi_q, src2, src1, 0x20, src4, src3, 0x20, src6, src5, 0x20,
               src8, src7, 0x20, src1, src3, src5, src7);
@@ -934,12 +938,12 @@ static av_always_inline void avc_chroma_hv_and_aver_dst_8x8_lasx(uint8_t *src,
     res_vt3 = __lasx_xvmadd_h(res_vt3, res_hz3, coeff_vt_vec1);
     DUP2_ARG3(__lasx_xvssrarni_bu_h, res_vt1, res_vt0, 6, res_vt3, res_vt2, 6,
               out0, out1);
-    DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
              tp0, tp1, tp2, tp3);
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
     dst0 = __lasx_xvpermi_q(tp2, tp0, 0x20);
     dst += stride_4x;
-    DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
              tp0, tp1, tp2, tp3);
     dst -= stride_4x;
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
@@ -973,13 +977,13 @@ static av_always_inline void avc_chroma_hz_and_aver_dst_8x4_lasx(uint8_t *src,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
     mask = __lasx_xvld(chroma_mask_arr, 0);
 
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src0, src1, src2, src3);
     DUP2_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src0, src2);
     DUP2_ARG3(__lasx_xvshuf_b, src0, src0, mask, src2, src2, mask, src0, src2);
     DUP2_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, res0, res1);
     out = __lasx_xvssrarni_bu_h(res1, res0, 6);
-    DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
              tp0, tp1, tp2, tp3);
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
     tp0 = __lasx_xvpermi_q(tp2, tp0, 0x20);
@@ -1008,10 +1012,10 @@ static av_always_inline void avc_chroma_hz_and_aver_dst_8x8_lasx(uint8_t *src,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
     mask = __lasx_xvld(chroma_mask_arr, 0);
 
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src0, src1, src2, src3);
     src += stride_4x;
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src4, src5, src6, src7);
     DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4, 0x20,
               src7, src6, 0x20, src0, src2, src4, src6);
@@ -1020,12 +1024,12 @@ static av_always_inline void avc_chroma_hz_and_aver_dst_8x8_lasx(uint8_t *src,
     DUP4_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, src4,
               coeff_vec, src6, coeff_vec, res0, res1, res2, res3);
     DUP2_ARG3(__lasx_xvssrarni_bu_h, res1, res0, 6, res3, res2, 6, out0, out1);
-    DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
             tp0, tp1, tp2, tp3);
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
     dst0 = __lasx_xvpermi_q(tp2, tp0, 0x20);
     dst += stride_4x;
-    DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
             tp0, tp1, tp2, tp3);
     dst -= stride_4x;
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
@@ -1059,14 +1063,14 @@ static av_always_inline void avc_chroma_vt_and_aver_dst_8x4_lasx(uint8_t *src,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
 
     src0 = __lasx_xvld(src, 0);
-    DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
+    DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x,
              src1, src2, src3, src4);
     DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2, 0x20,
               src4, src3, 0x20, src0, src1, src2, src3);
     DUP2_ARG2(__lasx_xvilvl_b, src1, src0, src3, src2, src0, src2);
     DUP2_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, res0, res1);
     out = __lasx_xvssrarni_bu_h(res1, res0, 6);
-    DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
              tp0, tp1, tp2, tp3);
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
     tp0 = __lasx_xvpermi_q(tp2, tp0, 0x20);
@@ -1095,10 +1099,10 @@ static av_always_inline void avc_chroma_vt_and_aver_dst_8x8_lasx(uint8_t *src,
     coeff_vec = __lasx_xvslli_b(coeff_vec, 3);
     src0 = __lasx_xvld(src, 0);
     src += stride;
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src1, src2, src3, src4);
     src += stride_4x;
-    DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x,
              src5, src6, src7, src8);
     DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2, 0x20,
               src4, src3, 0x20, src0, src1, src2, src3);
@@ -1109,12 +1113,12 @@ static av_always_inline void avc_chroma_vt_and_aver_dst_8x8_lasx(uint8_t *src,
     DUP4_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, src4,
               coeff_vec, src6, coeff_vec, res0, res1, res2, res3);
     DUP2_ARG3(__lasx_xvssrarni_bu_h, res1, res0, 6, res3, res2, 6, out0, out1);
-    DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
             tp0, tp1, tp2, tp3);
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
     dst0 = __lasx_xvpermi_q(tp2, tp0, 0x20);
     dst += stride_4x;
-    DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
+    DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
            tp0, tp1, tp2, tp3);
     dst -= stride_4x;
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
diff --git a/libavcodec/loongarch/vc1dsp_lasx.c b/libavcodec/loongarch/vc1dsp_lasx.c
index 40b8668f2b..63950bc076 100644
--- a/libavcodec/loongarch/vc1dsp_lasx.c
+++ b/libavcodec/loongarch/vc1dsp_lasx.c
@@ -22,6 +22,10 @@
 #include "vc1dsp_loongarch.h"
 #include "libavutil/loongarch/loongson_intrinsics.h"
 
+/* __lasx_xvldx() in lasxintrin.h does not accept a const void*;
+ * remove the following once it does. */
+#define LASX_XVLDX(ptr, stride) __lasx_xvldx((void*)ptr, stride)
+
 void ff_vc1_inv_trans_8x8_lasx(int16_t block[64])
 {
     int32_t con_4 = 4;
@@ -831,20 +835,20 @@ static void put_vc1_mspel_mc_h_lasx(uint8_t *dst, const uint8_t *src,
     const_para1_2 = __lasx_xvreplgr2vr_h(*(para_v + 1));
 
     in0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, stride, _src, stride2, in1, in2);
-    in3 = __lasx_xvldx(_src, stride3);
+    DUP2_ARG2(LASX_XVLDX, _src, stride, _src, stride2, in1, in2);
+    in3 = LASX_XVLDX(_src, stride3);
     _src += stride4;
     in4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, stride, _src, stride2, in5, in6);
-    in7 = __lasx_xvldx(_src, stride3);
+    DUP2_ARG2(LASX_XVLDX, _src, stride, _src, stride2, in5, in6);
+    in7 = LASX_XVLDX(_src, stride3);
     _src += stride4;
     in8 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, stride, _src, stride2, in9, in10);
-    in11 = __lasx_xvldx(_src, stride3);
+    DUP2_ARG2(LASX_XVLDX, _src, stride, _src, stride2, in9, in10);
+    in11 = LASX_XVLDX(_src, stride3);
     _src += stride4;
     in12 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(__lasx_xvldx, _src, stride, _src, stride2, in13, in14);
-    in15 = __lasx_xvldx(_src, stride3);
+    DUP2_ARG2(LASX_XVLDX, _src, stride, _src, stride2, in13, in14);
+    in15 = LASX_XVLDX(_src, stride3);
     DUP4_ARG2(__lasx_xvilvl_b, in2, in0, in3, in1, in6, in4, in7, in5,
               tmp0_m, tmp1_m, tmp2_m, tmp3_m);
     DUP4_ARG2(__lasx_xvilvl_b, in10, in8, in11, in9, in14, in12, in15, in13,