From patchwork Fri Sep 9 12:42:17 2022
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 37787
Subject: [FFmpeg-devel] [PATCH 1/2] Revert "avcodec/loongarch: Add wrapper
 for __lsx_vldx"
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Cc: Andreas Rheinhardt
Date: Fri, 9 Sep 2022 14:42:17 +0200
X-Mailer: git-send-email 2.34.1

This reverts commit 6c9a60ada4256cf5c388d8dc48860e24c15396c0.
The loongarch headers have been fixed, so that this workaround is no
longer necessary.
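[Editor's sketch for context: the reverted commit had added an LSX_VLDX
macro to libavutil/loongarch/loongson_intrinsics.h (the diffstat below
shows its 5-line removal), and this patch switches every call site back
to the bare intrinsic. The wrapper's actual body is not visible in this
patch; a hypothetical sketch of such a forwarding macro, assuming the
old lsxintrin.h could not take the callers' const-qualified source
pointers directly, would be:

    /* Hypothetical sketch only -- the real wrapper lived in
     * loongson_intrinsics.h and may have differed in detail.  It
     * forwards to __lsx_vldx(), the LSX indexed 16-byte vector load
     * from psrc + stride, casting away qualifiers the old header
     * could not accept. */
    #define LSX_VLDX(psrc, stride) __lsx_vldx((void *)(psrc), (stride))

With the headers fixed, the cast is unnecessary and the indirection can
go away, which is what the revert below does.]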
Signed-off-by: Andreas Rheinhardt
---
 libavcodec/loongarch/hevc_lpf_sao_lsx.c   |  52 +++---
 libavcodec/loongarch/hevc_mc_bi_lsx.c     | 140 +++++++-------
 libavcodec/loongarch/hevc_mc_uni_lsx.c    |  76 ++++----
 libavcodec/loongarch/hevc_mc_uniw_lsx.c   |   8 +-
 libavcodec/loongarch/hevcdsp_lsx.c        | 218 +++++++++++-----------
 libavutil/loongarch/loongson_intrinsics.h |   5 -
 6 files changed, 247 insertions(+), 252 deletions(-)

diff --git a/libavcodec/loongarch/hevc_lpf_sao_lsx.c b/libavcodec/loongarch/hevc_lpf_sao_lsx.c
index 1944336876..b5822afd94 100644
--- a/libavcodec/loongarch/hevc_lpf_sao_lsx.c
+++ b/libavcodec/loongarch/hevc_lpf_sao_lsx.c
@@ -1202,17 +1202,17 @@ static void hevc_sao_edge_filter_0degree_16multiple_lsx(uint8_t *dst,
     for (; height; height -= 4) {
         src_minus1 = src - 1;
         src_minus10 = __lsx_vld(src_minus1, 0);
-        DUP2_ARG2(LSX_VLDX, src_minus1, src_stride, src_minus1,
+        DUP2_ARG2(__lsx_vldx, src_minus1, src_stride, src_minus1,
                   src_stride_2x, src_minus11, src_minus12);
-        src_minus13 = LSX_VLDX(src_minus1, src_stride_3x);
+        src_minus13 = __lsx_vldx(src_minus1, src_stride_3x);
         for (v_cnt = 0; v_cnt < width; v_cnt += 16) {
             src_minus1 += 16;
             dst_ptr = dst + v_cnt;
             src10 = __lsx_vld(src_minus1, 0);
-            DUP2_ARG2(LSX_VLDX, src_minus1, src_stride, src_minus1,
+            DUP2_ARG2(__lsx_vldx, src_minus1, src_stride, src_minus1,
                       src_stride_2x, src11, src12);
-            src13 = LSX_VLDX(src_minus1, src_stride_3x);
+            src13 = __lsx_vldx(src_minus1, src_stride_3x);
             DUP4_ARG3(__lsx_vshuf_b, src10, src_minus10, shuf1, src11,
                       src_minus11, shuf1, src12, src_minus12, shuf1,
                       src13, src_minus13, shuf1, src_zero0, src_zero1,
@@ -1359,7 +1359,7 @@ static void hevc_sao_edge_filter_90degree_4width_lsx(uint8_t *dst,
         src_minus11 = src11;
         /* load in advance */
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src10, src11);
         __lsx_vstelm_w(dst0, dst, 0, 0);
@@ -1418,7 +1418,7 @@ static void hevc_sao_edge_filter_90degree_8width_lsx(uint8_t *dst,
     /* load in advance */
     DUP2_ARG2(__lsx_vld, src - src_stride, 0, src, 0, src_minus10, src_minus11);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src10, src11);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src10, src11);
     for (height -= 2; height; height -= 2) {
         src += src_stride_2x;
@@ -1452,7 +1452,7 @@ static void hevc_sao_edge_filter_90degree_8width_lsx(uint8_t *dst,
         src_minus11 = src11;
         /* load in advance */
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src10, src11);
         __lsx_vstelm_d(dst0, dst, 0, 0);
@@ -1529,7 +1529,7 @@ static void hevc_sao_edge_filter_90degree_16multiple_lsx(uint8_t *dst,
               src_minus10, src_minus11);
     for (h_cnt = (height >> 2); h_cnt--;) {
-        DUP4_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP4_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src, src_stride_3x, src, src_stride_4x,
                   src10, src11, src12, src13);
         DUP4_ARG2(__lsx_vseq_b, src_minus11, src_minus10, src_minus11,
@@ -1636,7 +1636,7 @@ static void hevc_sao_edge_filter_45degree_4width_lsx(uint8_t *dst,
     /* load in advance */
     DUP2_ARG2(__lsx_vld, src_orig - src_stride, 0, src_orig, 0,
               src_minus10, src_minus11);
-    DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
               src10, src11);
     for (height -= 2; height; height -= 2) {
@@ -1678,7 +1678,7 @@ static void hevc_sao_edge_filter_45degree_4width_lsx(uint8_t *dst,
         src_minus11 = src11;
         /* load in advance */
-        DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
                   src10, src11);
         __lsx_vstelm_w(dst0, dst, 0, 0);
@@ -1749,7 +1749,7 @@ static void hevc_sao_edge_filter_45degree_8width_lsx(uint8_t *dst,
     /* load in advance */
     DUP2_ARG2(__lsx_vld, src_orig - src_stride, 0, src_orig, 0,
               src_minus10, src_minus11);
-    DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
               src10, src11);
     for (height -= 2; height; height -= 2) {
@@ -1791,7 +1791,7 @@ static void hevc_sao_edge_filter_45degree_8width_lsx(uint8_t *dst,
         src_minus11 = src11;
         /* load in advance */
-        DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
                   src10, src11)
         __lsx_vstelm_d(dst0, dst, 0, 0);
         __lsx_vstelm_d(dst0, dst + dst_stride, 0, 1);
@@ -1834,7 +1834,7 @@ static void hevc_sao_edge_filter_45degree_8width_lsx(uint8_t *dst,
         src_minus11 = src11;
         /* load in advance */
-        DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
                   src10, src11);
         __lsx_vstelm_d(dst0, dst, 0, 0);
@@ -1881,17 +1881,17 @@ static void hevc_sao_edge_filter_45degree_16multiple_lsx(uint8_t *dst,
         src_orig = src - 1;
         dst_orig = dst;
         src_minus11 = __lsx_vld(src_orig, 0);
-        DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
                   src_minus12, src_minus13);
-        src_minus14 = LSX_VLDX(src_orig, src_stride_3x);
+        src_minus14 = __lsx_vldx(src_orig, src_stride_3x);
         for (v_cnt = 0; v_cnt < width; v_cnt += 16) {
             src_minus10 = __lsx_vld(src_orig - src_stride, 0);
             src_orig += 16;
             src10 = __lsx_vld(src_orig, 0);
-            DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig,
+            DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig,
                       src_stride_2x, src11, src12);
-            src13 = LSX_VLDX(src_orig, src_stride_3x);
+            src13 = __lsx_vldx(src_orig, src_stride_3x);
             src_plus13 = __lsx_vld(src + v_cnt + src_stride_4x, 1);
             DUP4_ARG3(__lsx_vshuf_b, src10, src_minus11, shuf1, src11,
@@ -2017,7 +2017,7 @@ static void hevc_sao_edge_filter_135degree_4width_lsx(uint8_t *dst,
     /* load in advance */
     DUP2_ARG2(__lsx_vld, src_orig - src_stride, 0, src_orig, 0,
               src_minus10, src_minus11);
-    DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
              src10, src11);
     for (height -= 2; height; height -= 2) {
@@ -2059,7 +2059,7 @@ static void hevc_sao_edge_filter_135degree_4width_lsx(uint8_t *dst,
         src_minus11 = src11;
         /* load in advance */
-        DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
                   src10, src11);
         __lsx_vstelm_w(dst0, dst, 0, 0);
@@ -2132,7 +2132,7 @@ static void hevc_sao_edge_filter_135degree_8width_lsx(uint8_t *dst,
     /* load in advance */
     DUP2_ARG2(__lsx_vld, src_orig - src_stride, 0, src_orig, 0,
               src_minus10, src_minus11);
-    DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
              src10, src11);
     for (height -= 2; height; height -= 2) {
@@ -2174,7 +2174,7 @@ static void hevc_sao_edge_filter_135degree_8width_lsx(uint8_t *dst,
         src_minus11 = src11;
         /* load in advance */
-        DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
                   src10, src11);
         __lsx_vstelm_d(dst0, dst, 0, 0);
@@ -2257,18 +2257,18 @@ static void hevc_sao_edge_filter_135degree_16multiple_lsx(uint8_t *dst,
         dst_orig = dst;
         src_minus11 = __lsx_vld(src_orig, 0);
-        DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
                   src_plus10, src_plus11);
-        src_plus12 = LSX_VLDX(src_orig, src_stride_3x);
+        src_plus12 = __lsx_vldx(src_orig, src_stride_3x);
         for (v_cnt = 0; v_cnt < width; v_cnt += 16) {
             src_minus10 = __lsx_vld(src_orig - src_stride, 2);
-            src_plus13 = LSX_VLDX(src_orig, src_stride_4x);
+            src_plus13 = __lsx_vldx(src_orig, src_stride_4x);
             src_orig += 16;
             src10 = __lsx_vld(src_orig, 0);
-            DUP2_ARG2(LSX_VLDX, src_orig, src_stride, src_orig, src_stride_2x,
+            DUP2_ARG2(__lsx_vldx, src_orig, src_stride, src_orig, src_stride_2x,
                       src11, src12);
-            src13 =LSX_VLDX(src_orig, src_stride_3x);
+            src13 =__lsx_vldx(src_orig, src_stride_3x);
             DUP4_ARG3(__lsx_vshuf_b, src10, src_minus11, shuf1, src11,
                       src_plus10, shuf1, src12, src_plus11, shuf1, src13,
diff --git a/libavcodec/loongarch/hevc_mc_bi_lsx.c b/libavcodec/loongarch/hevc_mc_bi_lsx.c
index 4e10a8a440..48441c107b 100644
--- a/libavcodec/loongarch/hevc_mc_bi_lsx.c
+++ b/libavcodec/loongarch/hevc_mc_bi_lsx.c
@@ -163,14 +163,14 @@ void hevc_bi_copy_6w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         DUP2_ARG2(__lsx_vilvl_d, reg1, reg0, reg3, reg2, src2, src3);
         src0_ptr += src_stride_4x;
         in0 = __lsx_vld(src1_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr,
+        DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr,
                   src2_stride_2x, in1, in2);
-        in3 = LSX_VLDX(src1_ptr, src2_stride_3x);
+        in3 = __lsx_vldx(src1_ptr, src2_stride_3x);
         src1_ptr += src2_stride_2x;
         in4 = __lsx_vld(src1_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr,
+        DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr,
                   src2_stride_2x, in5, in6);
-        in7 = LSX_VLDX(src1_ptr, src2_stride_3x);
+        in7 = __lsx_vldx(src1_ptr, src2_stride_3x);
         src1_ptr += src2_stride_2x;
         DUP4_ARG2(__lsx_vsllwil_hu_bu, src0, 6, src1, 6, src2, 6, src3, 6,
                   dst0, dst2, dst4, dst6);
@@ -207,7 +207,7 @@ void hevc_bi_copy_6w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         src0 = __lsx_vilvl_d(reg1, reg0);
         src0_ptr += src_stride_2x;
         in0 = __lsx_vld(src1_ptr, 0);
-        in1 = LSX_VLDX(src1_ptr, src2_stride_x);
+        in1 = __lsx_vldx(src1_ptr, src2_stride_x);
         src1_ptr += src2_stride_x;
         dst0 = __lsx_vsllwil_hu_bu(src0, 6);
         dst1 = __lsx_vilvh_b(zero, src0);
@@ -265,14 +265,14 @@ void hevc_bi_copy_8w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         DUP4_ARG2(__lsx_vslli_h, dst1, 6, dst3, 6, dst5, 6, dst7, 6,
                   dst1, dst3, dst5, dst7);
         in0 = __lsx_vld(src1_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr,
+        DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr,
                   src2_stride_2x, in1, in2);
-        in3 = LSX_VLDX(src1_ptr, src2_stride_3x);
+        in3 = __lsx_vldx(src1_ptr, src2_stride_3x);
         src1_ptr += src2_stride_2x;
         in4 = __lsx_vld(src1_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr,
+        DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr,
                   src2_stride_2x, in5, in6);
-        in7 = LSX_VLDX(src1_ptr, src2_stride_3x);
+        in7 = __lsx_vldx(src1_ptr, src2_stride_3x);
         src1_ptr += src2_stride_2x;
         out0 = hevc_bi_rnd_clip(in0, dst0, in1, dst1);
         out1 = hevc_bi_rnd_clip(in2, dst2, in3, dst3);
@@ -294,7 +294,7 @@ void hevc_bi_copy_8w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         reg1 = __lsx_vldrepl_d(src0_ptr + src_stride, 0);
         src0 = __lsx_vilvl_d(reg1, reg0);
         in0 = __lsx_vld(src1_ptr, 0);
-        in1 = LSX_VLDX(src1_ptr, src2_stride_x);
+        in1 = __lsx_vldx(src1_ptr, src2_stride_x);
         dst0 = __lsx_vsllwil_hu_bu(src0, 6);
         dst1 = __lsx_vilvh_b(zero, src0);
         dst1 = __lsx_vslli_h(dst1, 6);
@@ -330,19 +330,19 @@ void hevc_bi_copy_12w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
     for (loop_cnt = 4; loop_cnt--;) {
         src0 = __lsx_vld(src0_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
                   src1, src2);
-        src3 = LSX_VLDX(src0_ptr, src_stride_3x);
+        src3 = __lsx_vldx(src0_ptr, src_stride_3x);
         src0_ptr += src_stride_4x;
         in0 = __lsx_vld(src1_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr,
+        DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr,
                   src2_stride_2x, in1, in2);
-        in3 = LSX_VLDX(src1_ptr, src2_stride_3x);
+        in3 = __lsx_vldx(src1_ptr, src2_stride_3x);
         src1_ptr += src2_stride_2x;
         in4 = __lsx_vld(_src1, 0);
-        DUP2_ARG2(LSX_VLDX, _src1, src2_stride_x, _src1, src2_stride_2x,
+        DUP2_ARG2(__lsx_vldx, _src1, src2_stride_x, _src1, src2_stride_2x,
                   in5, in6);
-        in7 = LSX_VLDX(_src1, src2_stride_3x);
+        in7 = __lsx_vldx(_src1, src2_stride_3x);
         _src1 += src2_stride_2x;
         DUP2_ARG2(__lsx_vilvl_d, in5, in4, in7, in6, in4, in5);
@@ -389,19 +389,19 @@ void hevc_bi_copy_16w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src0 = __lsx_vld(src0_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
                   src1, src2);
-        src3 = LSX_VLDX(src0_ptr, src_stride_3x);
+        src3 = __lsx_vldx(src0_ptr, src_stride_3x);
         src0_ptr += src_stride_4x;
         in0 = __lsx_vld(src1_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr,
+        DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr,
                   src2_stride_2x, in1, in2);
-        in3 = LSX_VLDX(src1_ptr, src2_stride_3x);
+        in3 = __lsx_vldx(src1_ptr, src2_stride_3x);
         src1_ptr += src2_stride_2x;
         in4 = __lsx_vld(_src1, 0);
-        DUP2_ARG2(LSX_VLDX, _src1, src2_stride_x, _src1, src2_stride_2x,
+        DUP2_ARG2(__lsx_vldx, _src1, src2_stride_x, _src1, src2_stride_2x,
                   in5, in6);
-        in7 = LSX_VLDX(_src1, src2_stride_3x);
+        in7 = __lsx_vldx(_src1, src2_stride_3x);
         _src1 += src2_stride_2x;
         DUP4_ARG2(__lsx_vsllwil_hu_bu, src0, 6, src1, 6, src2, 6, src3, 6,
                   dst0_r, dst1_r, dst2_r, dst3_r)
@@ -647,12 +647,12 @@ void hevc_vt_8t_8w_lsx(const uint8_t *src0_ptr, int32_t src_stride, const int16_
               filt0, filt1, filt2, filt3);
     src0 = __lsx_vld(src0_ptr, 0);
-    DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
               src1, src2);
-    src3 = LSX_VLDX(src0_ptr, src_stride_3x);
+    src3 = __lsx_vldx(src0_ptr, src_stride_3x);
     src0_ptr += src_stride_4x;
     src4 = __lsx_vld(src0_ptr, 0);
-    DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
              src5, src6);
     src0_ptr += src_stride_3x;
     DUP4_ARG2(__lsx_vilvl_b, src1, src0, src3, src2, src5, src4, src2, src1,
              src10_r, src32_r, src54_r, src21_r);
@@ -661,14 +661,14 @@ void hevc_vt_8t_8w_lsx(const uint8_t *src0_ptr, int32_t src_stride, const int16_
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src7 = __lsx_vld(src0_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
                   src8, src9);
-        src10 = LSX_VLDX(src0_ptr, src_stride_3x);
+        src10 = __lsx_vldx(src0_ptr, src_stride_3x);
         src0_ptr += src_stride_4x;
         in0 = __lsx_vld(src1_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr, src2_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr, src2_stride_2x,
                   in1, in2);
-        in3 = LSX_VLDX(src1_ptr, src2_stride_3x);
+        in3 = __lsx_vldx(src1_ptr, src2_stride_3x);
         src1_ptr += src2_stride_2x;
         DUP4_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src9, src8, src10,
                   src9, src76_r, src87_r, src98_r, src109_r);
@@ -741,12 +741,12 @@ void hevc_vt_8t_16multx2mult_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         dst_tmp = dst;
         src0 = __lsx_vld(src0_ptr_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr_tmp, src_stride, src0_ptr_tmp,
+        DUP2_ARG2(__lsx_vldx, src0_ptr_tmp, src_stride, src0_ptr_tmp,
                   src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src0_ptr_tmp, src_stride_3x);
+        src3 = __lsx_vldx(src0_ptr_tmp, src_stride_3x);
         src0_ptr_tmp += src_stride_4x;
         src4 = __lsx_vld(src0_ptr_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr_tmp, src_stride, src0_ptr_tmp,
+        DUP2_ARG2(__lsx_vldx, src0_ptr_tmp, src_stride, src0_ptr_tmp,
                   src_stride_2x, src5, src6);
         src0_ptr_tmp += src_stride_3x;
@@ -759,7 +759,7 @@ void hevc_vt_8t_16multx2mult_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         for (loop_cnt = (height >> 1); loop_cnt--;) {
             src7 = __lsx_vld(src0_ptr_tmp, 0);
-            src8 = LSX_VLDX(src0_ptr_tmp, src_stride);
+            src8 = __lsx_vldx(src0_ptr_tmp, src_stride);
             src0_ptr_tmp += src_stride_2x;
             DUP2_ARG2(__lsx_vld, src1_ptr_tmp, 0, src1_ptr_tmp, 16, in0, in2);
             src1_ptr_tmp += src2_stride;
@@ -903,12 +903,12 @@ void hevc_hv_8t_8multx1mult_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         src1_ptr_tmp = src1_ptr;
         src0 = __lsx_vld(src0_ptr_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr_tmp, src_stride, src0_ptr_tmp,
+        DUP2_ARG2(__lsx_vldx, src0_ptr_tmp, src_stride, src0_ptr_tmp,
                   src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src0_ptr_tmp, src_stride_3x);
+        src3 = __lsx_vldx(src0_ptr_tmp, src_stride_3x);
         src0_ptr_tmp += src_stride_4x;
         src4 = __lsx_vld(src0_ptr_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr_tmp, src_stride, src0_ptr_tmp,
+        DUP2_ARG2(__lsx_vldx, src0_ptr_tmp, src_stride, src0_ptr_tmp,
                   src_stride_2x, src5, src6);
         src0_ptr_tmp += src_stride_3x;
@@ -1134,9 +1134,9 @@ static void hevc_hz_4t_24w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         dst += dst_stride_4x;
         in0 = __lsx_vld(src1_ptr_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr_tmp, src2_stride_x, src1_ptr_tmp,
+        DUP2_ARG2(__lsx_vldx, src1_ptr_tmp, src2_stride_x, src1_ptr_tmp,
                   src2_stride_2x, in1, in2);
-        in3 = LSX_VLDX(src1_ptr_tmp, src2_stride_3x);
+        in3 = __lsx_vldx(src1_ptr_tmp, src2_stride_3x);
         src1_ptr_tmp += src2_stride_2x;
         DUP4_ARG3(__lsx_vshuf_b, src1, src1, mask0, src3, src3, mask0, src5,
@@ -1229,7 +1229,7 @@ static void hevc_vt_4t_12w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
     DUP2_ARG2(__lsx_vldrepl_h, filter, 0, filter, 2, filt0, filt1);
     src0 = __lsx_vld(src0_ptr, 0);
-    DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
              src1, src2);
     src0_ptr += src_stride_3x;
     DUP2_ARG2(__lsx_vilvl_b, src1, src0, src2, src1, src10_r, src21_r);
@@ -1238,19 +1238,19 @@ static void hevc_vt_4t_12w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src3 = __lsx_vld(src0_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
                   src4, src5);
-        src6 = LSX_VLDX(src0_ptr, src_stride_3x);
+        src6 = __lsx_vldx(src0_ptr, src_stride_3x);
         src0_ptr += src_stride_4x;
         in0 = __lsx_vld(src1_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr,
+        DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr,
                   src2_stride_2x, in1, in2);
-        in3 = LSX_VLDX(src1_ptr, src2_stride_3x);
+        in3 = __lsx_vldx(src1_ptr, src2_stride_3x);
         src1_ptr += src2_stride_2x;
         in4 = __lsx_vld(_src1, 0);
-        DUP2_ARG2(LSX_VLDX, _src1, src2_stride_x, _src1, src2_stride_2x,
+        DUP2_ARG2(__lsx_vldx, _src1, src2_stride_x, _src1, src2_stride_2x,
                   in5, in6);
-        in7 = LSX_VLDX(_src1, src2_stride_3x);
+        in7 = __lsx_vldx(_src1, src2_stride_3x);
         _src1 += src2_stride_2x;
         DUP2_ARG2(__lsx_vilvl_d, in5, in4, in7, in6, in4, in5);
@@ -1310,7 +1310,7 @@ static void hevc_vt_4t_16w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
     DUP2_ARG2(__lsx_vldrepl_h, filter, 0, filter, 2, filt0, filt1);
     src0 = __lsx_vld(src0_ptr, 0);
-    DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
             src1, src2);
     src0_ptr += src_stride_3x;
     DUP2_ARG2(__lsx_vilvl_b, src1, src0, src2, src1, src10_r, src21_r);
@@ -1318,7 +1318,7 @@ static void hevc_vt_4t_16w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src3 = __lsx_vld(src0_ptr, 0);
-        src4 = LSX_VLDX(src0_ptr, src_stride);
+        src4 = __lsx_vldx(src0_ptr, src_stride);
         src0_ptr += src_stride_2x;
         DUP2_ARG2(__lsx_vld, src1_ptr, 0, src1_ptr, 16, in0, in2);
         src1_ptr += src2_stride;
@@ -1340,7 +1340,7 @@ static void hevc_vt_4t_16w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         dst += dst_stride_2x;
         src5 = __lsx_vld(src0_ptr, 0);
-        src2 = LSX_VLDX(src0_ptr, src_stride);
+        src2 = __lsx_vldx(src0_ptr, src_stride);
         src0_ptr += src_stride_2x;
         DUP2_ARG2(__lsx_vld, src1_ptr, 0, src1_ptr, 16, in0, in2);
         src1_ptr += src2_stride;
@@ -1517,7 +1517,7 @@ static void hevc_hv_4t_6w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
     mask1 = __lsx_vaddi_bu(mask0, 2);
     src0 = __lsx_vld(src0_ptr, 0);
-    DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
             src1, src2);
    src0_ptr += src_stride_3x;
@@ -1535,9 +1535,9 @@ static void hevc_hv_4t_6w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
     DUP2_ARG2(__lsx_vilvh_h, dsth1, dsth0, dsth2, dsth1, tmp1, tmp3);
     src3 = __lsx_vld(src0_ptr, 0);
-    DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
              src4, src5);
-    src6 = LSX_VLDX(src0_ptr, src_stride_3x);
+    src6 = __lsx_vldx(src0_ptr, src_stride_3x);
     src0_ptr += src_stride_4x;
     DUP2_ARG3(__lsx_vshuf_b, src3, src3, mask0, src3, src3, mask1, vec0, vec1);
     DUP2_ARG3(__lsx_vshuf_b, src4, src4, mask0, src4, src4, mask1, vec2, vec3);
@@ -1550,9 +1550,9 @@ static void hevc_hv_4t_6w_lsx(const uint8_t *src0_ptr, int32_t src_stride,
               vec5, filt1, dsth6, vec7, filt1, dsth3, dsth4, dsth5, dsth6);
     src3 = __lsx_vld(src0_ptr, 0);
-    DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
             src4, src5);
-    src6 = LSX_VLDX(src0_ptr, src_stride_3x);
+    src6 = __lsx_vldx(src0_ptr, src_stride_3x);
     DUP2_ARG3(__lsx_vshuf_b, src3, src3, mask0, src3, src3, mask1, vec0, vec1);
     DUP2_ARG3(__lsx_vshuf_b, src4, src4, mask0, src4, src4, mask1, vec2, vec3);
@@ -1700,7 +1700,7 @@ void hevc_hv_4t_8x2_lsx(const uint8_t *src0_ptr, int32_t src_stride, const int16
     mask1 = __lsx_vaddi_bu(mask0, 2);
     src0 = __lsx_vld(src0_ptr, 0);
-    DUP4_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP4_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
              src0_ptr, src_stride_3x, src0_ptr, src_stride_4x,
              src1, src2, src3, src4);
@@ -1777,19 +1777,19 @@ void hevc_hv_4t_8multx4_lsx(const uint8_t *src0_ptr, int32_t src_stride,
     for (cnt = width8mult; cnt--;) {
         src0 = __lsx_vld(src0_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
                   src1, src2);
-        src3 = LSX_VLDX(src0_ptr, src_stride_3x);
+        src3 = __lsx_vldx(src0_ptr, src_stride_3x);
         src0_ptr += src_stride_4x;
         src4 = __lsx_vld(src0_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
                   src5, src6);
         src0_ptr += (8 - src_stride_4x);
         in0 = __lsx_vld(src1_ptr, 0);
-        DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr,
+        DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr,
                   src2_stride_2x, in1, in2);
-        in3 = LSX_VLDX(src1_ptr, src2_stride_3x);
+        in3 = __lsx_vldx(src1_ptr, src2_stride_3x);
         src1_ptr += 8;
         DUP2_ARG3(__lsx_vshuf_b, src0, src0, mask0, src0, src0, mask1,
@@ -1900,22 +1900,22 @@ void hevc_hv_4t_8x6_lsx(const uint8_t *src0_ptr, int32_t src_stride, const int16
     mask1 = __lsx_vaddi_bu(mask0, 2);
     src0 = __lsx_vld(src0_ptr, 0);
-    DUP2_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
             src1, src2);
-    src3 = LSX_VLDX(src0_ptr, src_stride_3x);
+    src3 = __lsx_vldx(src0_ptr, src_stride_3x);
     src0_ptr += src_stride_4x;
     src4 = __lsx_vld(src0_ptr, 0);
-    DUP4_ARG2(LSX_VLDX, src0_ptr, src_stride, src0_ptr, src_stride_2x,
+    DUP4_ARG2(__lsx_vldx, src0_ptr, src_stride, src0_ptr, src_stride_2x,
              src0_ptr, src_stride_3x, src0_ptr, src_stride_4x,
             src5, src6, src7, src8);
     in0 = __lsx_vld(src1_ptr, 0);
-    DUP2_ARG2(LSX_VLDX, src1_ptr, src2_stride_x, src1_ptr, src2_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src1_ptr, src2_stride_x, src1_ptr, src2_stride_2x,
             in1, in2);
-    in3 = LSX_VLDX(src1_ptr, src2_stride_3x);
+    in3 = __lsx_vldx(src1_ptr, src2_stride_3x);
     src1_ptr += src2_stride_2x;
     in4 = __lsx_vld(src1_ptr, 0);
-    in5 = LSX_VLDX(src1_ptr, src2_stride_x);
+    in5 = __lsx_vldx(src1_ptr, src2_stride_x);
     DUP2_ARG3(__lsx_vshuf_b, src0, src0, mask0, src0, src0, mask1, vec0, vec1);
     DUP2_ARG3(__lsx_vshuf_b, src1, src1, mask0, src1, src1, mask1, vec2, vec3);
@@ -2041,7 +2041,7 @@ void hevc_hv_4t_8multx4mult_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         src1_ptr_tmp = src1_ptr;
         src0 = __lsx_vld(src0_ptr_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src0_ptr_tmp, src_stride, src0_ptr_tmp,
+        DUP2_ARG2(__lsx_vldx, src0_ptr_tmp, src_stride, src0_ptr_tmp,
                   src_stride_2x, src1, src2);
         src0_ptr_tmp += src_stride_3x;
@@ -2063,14 +2063,14 @@ void hevc_hv_4t_8multx4mult_lsx(const uint8_t *src0_ptr, int32_t src_stride,
         for (loop_cnt = height >> 2; loop_cnt--;) {
             src3 = __lsx_vld(src0_ptr_tmp, 0);
-            DUP2_ARG2(LSX_VLDX, src0_ptr_tmp, src_stride, src0_ptr_tmp,
+            DUP2_ARG2(__lsx_vldx, src0_ptr_tmp, src_stride, src0_ptr_tmp,
                       src_stride_2x, src4, src5);
-            src6 = LSX_VLDX(src0_ptr_tmp, src_stride_3x);
+            src6 = __lsx_vldx(src0_ptr_tmp, src_stride_3x);
             src0_ptr_tmp += src_stride_4x;
             in0 = __lsx_vld(src1_ptr_tmp, 0);
-            DUP2_ARG2(LSX_VLDX, src1_ptr_tmp, src2_stride_x, src1_ptr_tmp,
+            DUP2_ARG2(__lsx_vldx, src1_ptr_tmp, src2_stride_x, src1_ptr_tmp,
                       src2_stride_2x, in1, in2);
-            in3 = LSX_VLDX(src1_ptr_tmp, src2_stride_3x);
+            in3 = __lsx_vldx(src1_ptr_tmp, src2_stride_3x);
             src1_ptr_tmp += src2_stride_2x;
             DUP4_ARG3(__lsx_vshuf_b, src3, src3, mask0, src3, src3, mask1, src4,
diff --git a/libavcodec/loongarch/hevc_mc_uni_lsx.c b/libavcodec/loongarch/hevc_mc_uni_lsx.c
index de8e79f502..5437dce0e0 100644
--- a/libavcodec/loongarch/hevc_mc_uni_lsx.c
+++ b/libavcodec/loongarch/hevc_mc_uni_lsx.c
@@ -148,11 +148,11 @@ void common_vt_8t_8w_lsx(const uint8_t *src, int32_t src_stride,
               filt0, filt1, filt2, filt3);
     src0 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-    src3 = LSX_VLDX(src, src_stride_3x);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+    src3 = __lsx_vldx(src, src_stride_3x);
     src += src_stride_4x;
     src4 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6);
     src += src_stride_3x;
     DUP4_ARG2(__lsx_vilvl_b, src1, src0, src3, src2, src5, src4, src2, src1,
               src10_r, src32_r, src54_r, src21_r);
@@ -160,8 +160,8 @@ void common_vt_8t_8w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src7 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src8, src9);
-        src10 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src8, src9);
+        src10 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src9, src8, src10,
@@ -228,12 +228,12 @@ void common_vt_8t_16w_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
         dst_tmp = dst;
         src0 = __lsx_vld(src_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
                   src1, src2);
-        src3 = LSX_VLDX(src_tmp, src_stride_3x);
+        src3 = __lsx_vldx(src_tmp, src_stride_3x);
         src_tmp += src_stride_4x;
         src4 = __lsx_vld(src_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
                   src5, src6);
         src_tmp += src_stride_3x;
         DUP4_ARG2(__lsx_vilvl_b, src1, src0, src3, src2, src5, src4, src2, src1,
@@ -245,9 +245,9 @@ void common_vt_8t_16w_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
         for (loop_cnt = (height >> 2); loop_cnt--;) {
             src7 = __lsx_vld(src_tmp, 0);
-            DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+            DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
                       src8, src9);
-            src10 = LSX_VLDX(src_tmp, src_stride_3x);
+            src10 = __lsx_vldx(src_tmp, src_stride_3x);
             src_tmp += src_stride_4x;
             DUP4_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src9, src8, src10,
                       src9, src76_r, src87_r, src98_r, src109_r);
@@ -380,12 +380,12 @@ void hevc_hv_8t_8x2_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
     dst_tmp = dst;
     src0 = __lsx_vld(src_tmp, 0);
-    DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
            src1, src2);
-    src3 = LSX_VLDX(src_tmp, src_stride_3x);
+    src3 = __lsx_vldx(src_tmp, src_stride_3x);
     src_tmp += src_stride_4x;
     src4 = __lsx_vld(src_tmp, 0);
-    DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
            src5, src6);
     src_tmp += src_stride_3x;
@@ -429,7 +429,7 @@ void hevc_hv_8t_8x2_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
     for (loop_cnt = height >> 1; loop_cnt--;) {
         src7 = __lsx_vld(src_tmp, 0);
-        src8 = LSX_VLDX(src_tmp, src_stride);
+        src8 = __lsx_vldx(src_tmp, src_stride);
         src_tmp += src_stride_2x;
         DUP4_ARG3(__lsx_vshuf_b, src7, src7, mask0, src7,
@@ -567,13 +567,13 @@ void common_vt_4t_24w_lsx(const uint8_t *src, int32_t src_stride,
     /* 16 width */
     src0 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
     DUP2_ARG2(__lsx_vilvl_b, src1, src0, src2, src1, src10_r, src21_r);
     DUP2_ARG2(__lsx_vilvh_b, src1, src0, src2, src1, src10_l, src21_l);
     /* 8 width */
     src6 = __lsx_vld(_src, 0);
-    DUP2_ARG2(LSX_VLDX, _src, src_stride, _src, src_stride_2x, src7, src8);
+    DUP2_ARG2(__lsx_vldx, _src, src_stride, _src, src_stride_2x, src7, src8);
     src += src_stride_3x;
     _src += src_stride_3x;
     DUP2_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src76_r, src87_r);
@@ -581,7 +581,7 @@ void common_vt_4t_24w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = 8; loop_cnt--;) {
         /* 16 width */
         DUP2_ARG2(__lsx_vld, src, 0, _src, 0, src3, src9);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, _src, src_stride, src4, src10);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, _src, src_stride, src4, src10);
         DUP2_ARG2(__lsx_vilvl_b, src3, src2, src4, src3, src32_r, src43_r);
         DUP2_ARG2(__lsx_vilvh_b, src3, src2, src4, src3, src32_l, src43_l);
@@ -615,7 +615,7 @@ void common_vt_4t_24w_lsx(const uint8_t *src, int32_t src_stride,
         /* 16 width */
         DUP2_ARG2(__lsx_vld, src, 0, _src, 0, src5, src11);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, _src, src_stride, src2, src8);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, _src, src_stride, src2, src8);
         DUP2_ARG2(__lsx_vilvl_b, src5, src4, src2, src5, src10_r, src21_r);
         DUP2_ARG2(__lsx_vilvh_b, src5, src4, src2, src5, src10_l, src21_l);
@@ -676,14 +676,14 @@ void common_vt_4t_32w_lsx(const uint8_t *src, int32_t src_stride,
     /* 16 width */
     src0 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
     DUP2_ARG2(__lsx_vilvl_b, src1, src0, src2, src1, src10_r, src21_r);
     DUP2_ARG2(__lsx_vilvh_b, src1, src0, src2, src1, src10_l, src21_l);
     /* next 16 width */
     src6 = __lsx_vld(_src, 0);
-    DUP2_ARG2(LSX_VLDX, _src, src_stride, _src, src_stride_2x, src7, src8);
+    DUP2_ARG2(__lsx_vldx, _src, src_stride, _src, src_stride_2x, src7, src8);
     src += src_stride_3x;
     _src += src_stride_3x;
@@ -693,7 +693,7 @@ void common_vt_4t_32w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 1); loop_cnt--;) {
         /* 16 width */
         DUP2_ARG2(__lsx_vld, src, 0, _src, 0, src3, src9);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, _src, src_stride, src4, src10);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, _src, src_stride, src4, src10);
         DUP2_ARG2(__lsx_vilvl_b, src3, src2, src4, src3, src32_r, src43_r);
         DUP2_ARG2(__lsx_vilvh_b, src3, src2, src4, src3, src32_l, src43_l);
@@ -774,7 +774,7 @@ void hevc_hv_4t_8x2_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
     mask1 = __lsx_vaddi_bu(mask0, 2);
     src0 = __lsx_vld(src, 0);
-    DUP4_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src,
+    DUP4_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src,
             src_stride_3x, src, src_stride_4x, src1, src2, src3, src4);
     DUP4_ARG3(__lsx_vshuf_b, src0, src0, mask0, src0, src0, mask1, src1, src1,
@@ -838,11 +838,11 @@ void hevc_hv_4t_8multx4_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst
     for (cnt = width8mult; cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src4 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6);
         src += (8 - src_stride_4x);
         DUP2_ARG3(__lsx_vshuf_b, src0, src0, mask0, src0, src0, mask1,
                   vec0, vec1);
@@ -939,10 +939,10 @@ void hevc_hv_4t_8x6_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
     mask1 = __lsx_vaddi_bu(mask0, 2);
     src0 = __lsx_vld(src, 0);
-    DUP4_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,src,
+    DUP4_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,src,
             src_stride_3x, src, src_stride_4x, src1, src2, src3, src4);
     src += src_stride_4x;
-    DUP4_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,src,
+    DUP4_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,src,
            src_stride_3x, src, src_stride_4x, src5, src6, src7, src8);
     DUP4_ARG3(__lsx_vshuf_b, src0, src0, mask0, src0, src0, mask1, src1, src1,
@@ -1051,7 +1051,7 @@ void hevc_hv_4t_8multx4mult_lsx(const uint8_t *src, int32_t src_stride, uint8_t
         dst_tmp = dst;
         src0 = __lsx_vld(src_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
                   src1, src2);
         src_tmp += src_stride_3x;
@@ -1073,9 +1073,9 @@ void hevc_hv_4t_8multx4mult_lsx(const uint8_t *src, int32_t src_stride, uint8_t
         for (loop_cnt = (height >> 2); loop_cnt--;) {
             src3 = __lsx_vld(src_tmp, 0);
-            DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+            DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
                       src4, src5);
-            src6 = LSX_VLDX(src_tmp, src_stride_3x);
+            src6 = __lsx_vldx(src_tmp, src_stride_3x);
             src_tmp += src_stride_4x;
             DUP4_ARG3(__lsx_vshuf_b, src3, src3, mask0, src3, src3, mask1, src4,
@@ -1186,7 +1186,7 @@ void hevc_hv_4t_12w_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
         dst_tmp = dst;
         src0 = __lsx_vld(src_tmp, 0);
-        DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
                   src1, src2);
         src_tmp += src_stride_3x;
@@ -1205,9 +1205,9 @@ void hevc_hv_4t_12w_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
         for (loop_cnt = 4; loop_cnt--;) {
             src3 = __lsx_vld(src_tmp, 0);
-            DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+            DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
                       src4, src5);
-            src6 = LSX_VLDX(src_tmp, src_stride_3x);
+            src6 = __lsx_vldx(src_tmp, src_stride_3x);
             src_tmp += src_stride_4x;
             DUP4_ARG3(__lsx_vshuf_b, src3, src3, mask0, src3, src3, mask1, src4,
@@ -1261,7 +1261,7 @@ void hevc_hv_4t_12w_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
     mask3 = __lsx_vaddi_bu(mask2, 2);
     src0 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
     src += src_stride_3x;
     DUP2_ARG3(__lsx_vshuf_b, src1, src0, mask2, src1, src0, mask3, vec0, vec1);
     DUP2_ARG3(__lsx_vshuf_b, src2, src1, mask2, src2, src1, mask3, vec2, vec3);
@@ -1276,12 +1276,12 @@ void hevc_hv_4t_12w_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
     for (loop_cnt = 2; loop_cnt--;) {
         src3 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src4, src5);
-        src6 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src4, src5);
+        src6 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src7 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src8, src9);
-        src10 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src8, src9);
+        src10 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG3(__lsx_vshuf_b, src7, src3, mask2, src7, src3, mask3, src8,
                   src4, mask2, src8, src4, mask3, vec0, vec1, vec2, vec3);
diff --git a/libavcodec/loongarch/hevc_mc_uniw_lsx.c b/libavcodec/loongarch/hevc_mc_uniw_lsx.c
index 502bf24e71..c4e79225d3 100644
--- a/libavcodec/loongarch/hevc_mc_uniw_lsx.c
+++ b/libavcodec/loongarch/hevc_mc_uniw_lsx.c
@@ -79,12 +79,12 @@ void hevc_hv_8t_8x2_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
     dst_tmp = dst;
     src0 = __lsx_vld(src_tmp, 0);
-    DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
            src1, src2);
-    src3 = LSX_VLDX(src_tmp, src_stride_3x);
+    src3 = __lsx_vldx(src_tmp, src_stride_3x);
     src_tmp += src_stride_4x;
     src4 = __lsx_vld(src_tmp, 0);
-    DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x,
+    DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x,
            src5, src6);
     src_tmp += src_stride_3x;
@@ -127,7 +127,7 @@ void hevc_hv_8t_8x2_lsx(const uint8_t *src, int32_t src_stride, uint8_t *dst,
     for (loop_cnt = height >> 1; loop_cnt--;) {
         src7 = __lsx_vld(src_tmp, 0);
-        src8 = LSX_VLDX(src_tmp, src_stride);
+        src8 = __lsx_vldx(src_tmp, src_stride);
         src_tmp += src_stride_2x;
         DUP4_ARG3(__lsx_vshuf_b, src7, src7, mask0, src7, src7, mask1, src7,
                   src7, mask2, src7, src7, mask3, vec0, vec1, vec2, vec3);
diff --git a/libavcodec/loongarch/hevcdsp_lsx.c b/libavcodec/loongarch/hevcdsp_lsx.c
index 86fc5f06c0..85843dd111 100644
--- a/libavcodec/loongarch/hevcdsp_lsx.c
+++ b/libavcodec/loongarch/hevcdsp_lsx.c
@@ -48,14 +48,14 @@ static void hevc_copy_4w_lsx(const uint8_t *src, int32_t src_stride,
     __m128i in0, in1, in2, in3;
     for (; loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src4 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src5, src6);
-        src7 = LSX_VLDX(src, src_stride_3x);
+        src7 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG2(__lsx_vilvl_w, src1, src0, src3, src2, src5, src4, src7, src6,
@@ -98,12 +98,12 @@ static void hevc_copy_6w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 3); loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src4 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6);
-        src7 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6);
+        src7 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG2(__lsx_vsllwil_hu_bu, src0, 6, src1, 6, src2, 6, src3, 6,
@@ -163,14 +163,14 @@ static void hevc_copy_8w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 3); loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src4 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src5, src6);
-        src7 = LSX_VLDX(src, src_stride_3x);
+        src7 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG2(__lsx_vsllwil_hu_bu, src0, 6, src1, 6, src2, 6, src3, 6,
@@ -215,12 +215,12 @@ static void hevc_copy_12w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 3); loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src4 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6);
-        src7 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6);
+        src7 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG2(__lsx_vsllwil_hu_bu, src0, 6, src1, 6, src2, 6, src3, 6,
@@ -288,14 +288,14 @@ static void hevc_copy_16w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 3); loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src4 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src5, src6);
-        src7 = LSX_VLDX(src, src_stride_3x);
+        src7 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG2(__lsx_vilvh_b, zero, src0, zero, src1, zero, src2, zero, src3,
                   in0_l, in1_l, in2_l, in3_l);
@@ -333,8 +333,8 @@ static void hevc_copy_16w_lsx(const uint8_t *src, int32_t src_stride,
     }
     if (res) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+        src3 = __lsx_vldx(src, src_stride_3x);
         DUP4_ARG2(__lsx_vilvh_b, zero, src0, zero, src1, zero, src2, zero, src3,
                   in0_l, in1_l, in2_l, in3_l);
@@ -373,13 +373,13 @@ static void hevc_copy_24w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src4 = __lsx_vld(_src, 0);
-        DUP2_ARG2(LSX_VLDX, _src, src_stride, _src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, _src, src_stride, _src, src_stride_2x,
                   src5, src6);
-        src7 = LSX_VLDX(_src, src_stride_3x);
+        src7 = __lsx_vldx(_src, src_stride_3x);
         _src += src_stride_4x;
         DUP4_ARG2(__lsx_vilvh_b, zero, src0, zero, src1, zero, src2, zero,
@@ -423,13 +423,13 @@ static void hevc_copy_32w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src2, src4);
-        src6 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src2, src4);
+        src6 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src1 = __lsx_vld(_src, 0);
-        DUP2_ARG2(LSX_VLDX, _src, src_stride, _src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, _src, src_stride, _src, src_stride_2x,
                   src3, src5);
-        src7 = LSX_VLDX(_src, src_stride_3x);
+        src7 = __lsx_vldx(_src, src_stride_3x);
         _src += src_stride_4x;
         DUP4_ARG2(__lsx_vilvh_b, zero, src0, zero, src1, zero, src2, zero,
@@ -623,12 +623,12 @@ static void hevc_hz_8t_4w_lsx(const uint8_t *src, int32_t src_stride,
     for (;loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src4 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6);
-        src7 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6);
+        src7 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG3(__lsx_vshuf_b, src1, src0, mask0, src1, src0, mask1, src1,
                   src0, mask2, src1, src0, mask3, vec0, vec1, vec2, vec3);
@@ -668,7 +668,7 @@ static void hevc_hz_8t_4w_lsx(const uint8_t *src, int32_t src_stride,
     }
     for (;res--;) {
         src0 = __lsx_vld(src, 0);
-        src1 = LSX_VLDX(src, src_stride);
+        src1 = __lsx_vldx(src, src_stride);
         DUP4_ARG3(__lsx_vshuf_b, src1, src0, mask0, src1, src0, mask1, src1,
                   src0, mask2, src1, src0, mask3, vec0, vec1, vec2, vec3);
         dst0 = __lsx_vdp2_h_bu_b(vec0, filt0);
@@ -709,8 +709,8 @@ static void hevc_hz_8t_8w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG3(__lsx_vshuf_b, src0, src0, mask0, src0,
@@ -774,12 +774,12 @@ static void hevc_hz_8t_12w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = 4; loop_cnt--;) {
         src0 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-        src3 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+        src3 = __lsx_vldx(src, src_stride_3x);
         src4 = __lsx_vld(_src, 0);
-        DUP2_ARG2(LSX_VLDX, _src, src_stride, _src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, _src, src_stride, _src, src_stride_2x,
                   src5, src6);
-        src7 = LSX_VLDX(_src, src_stride_3x);
+        src7 = __lsx_vldx(_src, src_stride_3x);
         src += src_stride_4x;
         _src += src_stride_4x;
@@ -1216,11 +1216,11 @@ static void hevc_vt_8t_4w_lsx(const uint8_t *src, int32_t src_stride,
               filt0, filt1, filt2, filt3);
     src0 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-    src3 = LSX_VLDX(src, src_stride_3x);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+    src3 = __lsx_vldx(src, src_stride_3x);
     src += src_stride_4x;
     src4 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6);
     src += src_stride_3x;
     DUP4_ARG2(__lsx_vilvl_b, src1, src0, src3, src2, src5, src4, src2, src1,
               src10_r, src32_r, src54_r, src21_r);
@@ -1231,13 +1231,13 @@ static void hevc_vt_8t_4w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 3); loop_cnt--;) {
         src7 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src8, src9);
-        src10 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src8, src9);
+        src10 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         src11 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x,
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x,
                   src12, src13);
-        src14 = LSX_VLDX(src, src_stride_3x);
+        src14 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src9, src8, src10, src9,
@@ -1289,7 +1289,7 @@ static void hevc_vt_8t_4w_lsx(const uint8_t *src, int32_t src_stride,
     }
     for (;res--;) {
         src7 = __lsx_vld(src, 0);
-        src8 = LSX_VLDX(src, src_stride);
+        src8 = __lsx_vldx(src, src_stride);
         DUP2_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src76_r, src87_r);
         src += src_stride_2x;
         src8776 = __lsx_vilvl_d(src87_r, src76_r);
@@ -1334,11 +1334,11 @@ static void hevc_vt_8t_8w_lsx(const uint8_t *src, int32_t src_stride,
               filt0, filt1, filt2, filt3);
     src0 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-    src3 = LSX_VLDX(src, src_stride_3x);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+    src3 = __lsx_vldx(src, src_stride_3x);
     src += src_stride_4x;
     src4 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6);
     src += src_stride_3x;
     DUP4_ARG2(__lsx_vilvl_b, src1, src0, src3, src2, src5, src4, src2, src1,
               src10_r, src32_r, src54_r, src21_r);
@@ -1346,8 +1346,8 @@ static void hevc_vt_8t_8w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src7 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src8, src9);
-        src10 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src8, src9);
+        src10 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src9, src8, src10,
                   src9, src76_r, src87_r, src98_r, src109_r);
@@ -1408,11 +1408,11 @@ static void hevc_vt_8t_12w_lsx(const uint8_t *src, int32_t src_stride,
     DUP4_ARG2(__lsx_vldrepl_h, filter, 0, filter, 2, filter, 4, filter, 6,
              filt0, filt1, filt2, filt3);
     src0 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2);
-    src3 = LSX_VLDX(src, src_stride_3x);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2);
+    src3 = __lsx_vldx(src, src_stride_3x);
     src += src_stride_4x;
     src4 = __lsx_vld(src, 0);
-    DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6);
+    DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6);
     src += src_stride_3x;
     DUP4_ARG2(__lsx_vilvl_b, src1, src0, src3, src2, src5, src4, src2, src1,
               src10_r, src32_r, src54_r, src21_r);
@@ -1426,8 +1426,8 @@ static void hevc_vt_8t_12w_lsx(const uint8_t *src, int32_t src_stride,
     for (loop_cnt = (height >> 2); loop_cnt--;) {
         src7 = __lsx_vld(src, 0);
-        DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src8, src9);
-        src10 = LSX_VLDX(src, src_stride_3x);
+        DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src8, src9);
+        src10 = __lsx_vldx(src, src_stride_3x);
         src += src_stride_4x;
         DUP4_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src9, src8, src10, src9,
src76_r, src87_r, src98_r, src109_r); @@ -1520,12 +1520,12 @@ static void hevc_vt_8t_16multx4mult_lsx(const uint8_t *src, dst_tmp = dst; src0 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src1, src2); - src3 = LSX_VLDX(src_tmp, src_stride_3x); + src3 = __lsx_vldx(src_tmp, src_stride_3x); src_tmp += src_stride_4x; src4 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src5, src6); src_tmp += src_stride_3x; DUP4_ARG2(__lsx_vilvl_b, src1, src0, src3, src2, src5, src4, src2, src1, @@ -1537,9 +1537,9 @@ static void hevc_vt_8t_16multx4mult_lsx(const uint8_t *src, for (loop_cnt = (height >> 2); loop_cnt--;) { src7 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src8, src9); - src10 = LSX_VLDX(src_tmp, src_stride_3x); + src10 = __lsx_vldx(src_tmp, src_stride_3x); src_tmp += src_stride_4x; DUP4_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src9, src8, src10, src9, src76_r, src87_r, src98_r, src109_r); @@ -1689,11 +1689,11 @@ static void hevc_hv_8t_4w_lsx(const uint8_t *src, int32_t src_stride, mask3 = __lsx_vaddi_bu(mask0, 6); src0 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2); - src3 = LSX_VLDX(src, src_stride_3x); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2); + src3 = __lsx_vldx(src, src_stride_3x); src += src_stride_4x; src4 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6); src += src_stride_3x; DUP4_ARG3(__lsx_vshuf_b, src3, src0, mask0, src3, src0, mask1, src3, src0, @@ -1729,8 +1729,8 @@ static void hevc_hv_8t_4w_lsx(const uint8_t *src, int32_t src_stride, for (loop_cnt = height >> 2; loop_cnt--;) { src7 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src8, src9); - src10 = LSX_VLDX(src, src_stride_3x); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src8, src9); + src10 = __lsx_vldx(src, src_stride_3x); src += src_stride_4x; DUP4_ARG3(__lsx_vshuf_b, src9, src7, mask0, src9, src7, mask1, src9, src7, @@ -1830,12 +1830,12 @@ static void hevc_hv_8t_8multx1mult_lsx(const uint8_t *src, src_tmp = src; dst_tmp = dst; src0 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src1, src2); - src3 = LSX_VLDX(src_tmp, src_stride_3x); + src3 = __lsx_vldx(src_tmp, src_stride_3x); src_tmp += src_stride_4x; src4 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src5, src6); src_tmp += src_stride_3x; @@ -1978,12 +1978,12 @@ static void hevc_hv_8t_12w_lsx(const uint8_t *src, int32_t src_stride, dst_tmp = dst; src0 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src1, src2); - src3 = LSX_VLDX(src_tmp, src_stride_3x); + src3 = __lsx_vldx(src_tmp, src_stride_3x); src_tmp += src_stride_4x; src4 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, 
src_stride, src_tmp, src_stride_2x, src5, src6); src_tmp += src_stride_3x; @@ -2077,11 +2077,11 @@ static void hevc_hv_8t_12w_lsx(const uint8_t *src, int32_t src_stride, mask7 = __lsx_vaddi_bu(mask4, 6); src0 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2); - src3 = LSX_VLDX(src, src_stride_3x); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2); + src3 = __lsx_vldx(src, src_stride_3x); src += src_stride_4x; src4 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6); src += src_stride_3x; DUP4_ARG3(__lsx_vshuf_b, src3, src0, mask4, src3, src0, mask5, src3, src0, @@ -2118,8 +2118,8 @@ static void hevc_hv_8t_12w_lsx(const uint8_t *src, int32_t src_stride, for (loop_cnt = height >> 2; loop_cnt--;) { src7 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src8, src9); - src10 = LSX_VLDX(src, src_stride_3x); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src8, src9); + src10 = __lsx_vldx(src, src_stride_3x); src += src_stride_4x; DUP4_ARG3(__lsx_vshuf_b, src9, src7, mask4, src9, src7, mask5, src9, @@ -2285,14 +2285,14 @@ static void hevc_vt_4t_16w_lsx(const uint8_t *src, DUP2_ARG2(__lsx_vldrepl_h, filter, 0, filter, 2, filt0, filt1); src0 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2); src += src_stride_3x; DUP2_ARG2(__lsx_vilvl_b, src1, src0, src2, src1, src10_r, src21_r); DUP2_ARG2(__lsx_vilvh_b, src1, src0, src2, src1, src10_l, src21_l); for (loop_cnt = (height >> 2); loop_cnt--;) { src3 = __lsx_vld(src, 0); - src4 = LSX_VLDX(src, src_stride); + src4 = __lsx_vldx(src, src_stride); src += src_stride_2x; DUP2_ARG2(__lsx_vilvl_b, src3, src2, src4, src3, src32_r, src43_r); DUP2_ARG2(__lsx_vilvh_b, src3, src2, src4, src3, src32_l, src43_l); @@ -2309,7 +2309,7 @@ static void hevc_vt_4t_16w_lsx(const uint8_t *src, dst += dst_stride; src5 = __lsx_vld(src, 0); - src2 = LSX_VLDX(src, src_stride); + src2 = __lsx_vldx(src, src_stride); src += src_stride_2x; DUP2_ARG2(__lsx_vilvl_b, src5, src4, src2, src5, src10_r, src21_r); DUP2_ARG2(__lsx_vilvh_b, src5, src4, src2, src5, src10_l, src21_l); @@ -2353,19 +2353,19 @@ static void hevc_vt_4t_24w_lsx(const uint8_t *src, DUP2_ARG2(__lsx_vldrepl_h, filter, 0, filter, 2, filt0, filt1); src0 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2); DUP2_ARG2(__lsx_vilvl_b, src1, src0, src2, src1, src10_r, src21_r); DUP2_ARG2(__lsx_vilvh_b, src1, src0, src2, src1, src10_l, src21_l); src6 = __lsx_vld(_src, 0); - DUP2_ARG2(LSX_VLDX, _src, src_stride, _src, src_stride_2x, src7, src8); + DUP2_ARG2(__lsx_vldx, _src, src_stride, _src, src_stride_2x, src7, src8); src += src_stride_3x; _src += src_stride_3x; DUP2_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src76_r, src87_r); for (loop_cnt = (height >> 2); loop_cnt--;) { DUP2_ARG2(__lsx_vld, src, 0, _src, 0, src3, src9); - DUP2_ARG2(LSX_VLDX, src, src_stride, _src, src_stride, src4, src10); + DUP2_ARG2(__lsx_vldx, src, src_stride, _src, src_stride, src4, src10); src += src_stride_2x; _src += src_stride_2x; DUP2_ARG2(__lsx_vilvl_b, src3, src2, src4, src3, src32_r, src43_r); @@ -2392,7 +2392,7 @@ static void hevc_vt_4t_24w_lsx(const uint8_t *src, dst += dst_stride; 
DUP2_ARG2(__lsx_vld, src, 0, _src, 0, src5, src11); - DUP2_ARG2(LSX_VLDX, src, src_stride, _src, src_stride, src2, src8); + DUP2_ARG2(__lsx_vldx, src, src_stride, _src, src_stride, src2, src8); src += src_stride_2x; _src += src_stride_2x; DUP2_ARG2(__lsx_vilvl_b, src5, src4, src2, src5, src10_r, src21_r); @@ -2448,12 +2448,12 @@ static void hevc_vt_4t_32w_lsx(const uint8_t *src, DUP2_ARG2(__lsx_vldrepl_h, filter, 0, filter, 2, filt0, filt1); src0 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2); DUP2_ARG2(__lsx_vilvl_b, src1, src0, src2, src1, src10_r, src21_r); DUP2_ARG2(__lsx_vilvh_b, src1, src0, src2, src1, src10_l, src21_l); src6 = __lsx_vld(_src, 0); - DUP2_ARG2(LSX_VLDX, _src, src_stride, _src, src_stride_2x, src7, src8); + DUP2_ARG2(__lsx_vldx, _src, src_stride, _src, src_stride_2x, src7, src8); src += src_stride_3x; _src += src_stride_3x; DUP2_ARG2(__lsx_vilvl_b, src7, src6, src8, src7, src76_r, src87_r); @@ -2461,7 +2461,7 @@ static void hevc_vt_4t_32w_lsx(const uint8_t *src, for (loop_cnt = (height >> 2); loop_cnt--;) { DUP2_ARG2(__lsx_vld, src, 0, _src, 0, src3, src9); - DUP2_ARG2(LSX_VLDX, src, src_stride, _src, src_stride, src4, src10); + DUP2_ARG2(__lsx_vldx, src, src_stride, _src, src_stride, src4, src10); src += src_stride_2x; _src += src_stride_2x; DUP2_ARG2(__lsx_vilvl_b, src3, src2, src4, src3, src32_r, src43_r); @@ -2493,7 +2493,7 @@ static void hevc_vt_4t_32w_lsx(const uint8_t *src, dst += dst_stride; DUP2_ARG2(__lsx_vld, src, 0, _src, 0, src5, src11); - DUP2_ARG2(LSX_VLDX, src, src_stride, _src, src_stride, src2, src8); + DUP2_ARG2(__lsx_vldx, src, src_stride, _src, src_stride, src2, src8); src += src_stride_2x; _src += src_stride_2x; DUP2_ARG2(__lsx_vilvl_b, src5, src4, src2, src5, src10_r, src21_r); @@ -2560,9 +2560,9 @@ static void hevc_hv_4t_8x2_lsx(const uint8_t *src, mask1 = __lsx_vaddi_bu(mask0, 2); src0 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2); - src3 = LSX_VLDX(src, src_stride_3x); - src4 = LSX_VLDX(src, src_stride_4x); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2); + src3 = __lsx_vldx(src, src_stride_3x); + src4 = __lsx_vldx(src, src_stride_4x); DUP2_ARG3(__lsx_vshuf_b, src0, src0, mask0, src0, src0, mask1, vec0, vec1); DUP2_ARG3(__lsx_vshuf_b, src1, src1, mask0, src1, src1, mask1, vec2, vec3); @@ -2627,10 +2627,10 @@ static void hevc_hv_4t_8multx4_lsx(const uint8_t *src, int32_t src_stride, for (cnt = width8mult; cnt--;) { src0 = __lsx_vld(src, 0); - DUP4_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src, + DUP4_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src, src_stride_3x, src, src_stride_4x, src1, src2, src3, src4); src += src_stride_4x; - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src5, src6); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src5, src6); src += (8 - src_stride_4x); DUP2_ARG3(__lsx_vshuf_b, src0, src0, mask0, src0, src0, mask1, @@ -2730,10 +2730,10 @@ static void hevc_hv_4t_8x6_lsx(const uint8_t *src, mask1 = __lsx_vaddi_bu(mask0, 2); src0 = __lsx_vld(src, 0); - DUP4_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src, + DUP4_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src, src_stride_3x, src, src_stride_4x, src1, src2, src3, src4); src += src_stride_4x; - DUP4_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src, + DUP4_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src, 
src_stride_3x, src, src_stride_4x, src5, src6, src7, src8); DUP4_ARG3(__lsx_vshuf_b, src0, src0, mask0, src0, src0, mask1, src1, src1, @@ -2847,7 +2847,7 @@ static void hevc_hv_4t_8multx4mult_lsx(const uint8_t *src, dst_tmp = dst; src0 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src1, src2); src_tmp += src_stride_3x; @@ -2869,9 +2869,9 @@ static void hevc_hv_4t_8multx4mult_lsx(const uint8_t *src, for (loop_cnt = height >> 2; loop_cnt--;) { src3 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src4, src5); - src6 = LSX_VLDX(src_tmp, src_stride_3x); + src6 = __lsx_vldx(src_tmp, src_stride_3x); src_tmp += src_stride_4x; DUP2_ARG3(__lsx_vshuf_b, src3, src3, mask0, src3, src3, mask1, @@ -2997,7 +2997,7 @@ static void hevc_hv_4t_12w_lsx(const uint8_t *src, dst_tmp = dst; src0 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src1, src2); src_tmp += src_stride_3x; @@ -3016,9 +3016,9 @@ static void hevc_hv_4t_12w_lsx(const uint8_t *src, for (loop_cnt = 4; loop_cnt--;) { src3 = __lsx_vld(src_tmp, 0); - DUP2_ARG2(LSX_VLDX, src_tmp, src_stride, src_tmp, src_stride_2x, + DUP2_ARG2(__lsx_vldx, src_tmp, src_stride, src_tmp, src_stride_2x, src4, src5); - src6 = LSX_VLDX(src_tmp, src_stride_3x); + src6 = __lsx_vldx(src_tmp, src_stride_3x); src_tmp += src_stride_4x; DUP2_ARG3(__lsx_vshuf_b, src3, src3, mask0, src3, src3, mask1, @@ -3077,7 +3077,7 @@ static void hevc_hv_4t_12w_lsx(const uint8_t *src, mask3 = __lsx_vaddi_bu(mask2, 2); src0 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src1, src2); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src1, src2); src += src_stride_3x; DUP2_ARG3(__lsx_vshuf_b, src1, src0, mask2, src1, src0, mask3, vec0, vec1); DUP2_ARG3(__lsx_vshuf_b, src2, src1, mask2, src2, src1, mask3, vec2, vec3); @@ -3090,12 +3090,12 @@ static void hevc_hv_4t_12w_lsx(const uint8_t *src, for (loop_cnt = 2; loop_cnt--;) { src3 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src4, src5); - src6 = LSX_VLDX(src, src_stride_3x); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src4, src5); + src6 = __lsx_vldx(src, src_stride_3x); src += src_stride_4x; src7 = __lsx_vld(src, 0); - DUP2_ARG2(LSX_VLDX, src, src_stride, src, src_stride_2x, src8, src9); - src10 = LSX_VLDX(src, src_stride_3x); + DUP2_ARG2(__lsx_vldx, src, src_stride, src, src_stride_2x, src8, src9); + src10 = __lsx_vldx(src, src_stride_3x); src += src_stride_4x; DUP2_ARG3(__lsx_vshuf_b, src7, src3, mask2, src7, src3, mask3, vec0, vec1); diff --git a/libavutil/loongarch/loongson_intrinsics.h b/libavutil/loongarch/loongson_intrinsics.h index 6425551255..090adab266 100644 --- a/libavutil/loongarch/loongson_intrinsics.h +++ b/libavutil/loongarch/loongson_intrinsics.h @@ -89,11 +89,6 @@ #ifdef __loongarch_sx #include - -/* __lsx_vldx() from lsxintrin.h does not accept a const void*; - * remove the following once it does. 
 */
-#define LSX_VLDX(cptr, stride) __lsx_vldx((void*)(cptr), (stride))
-
 /*
  * =============================================================================
  * Description : Dot product & addition of byte vector elements
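For context, the wrapper deleted in the hunk above existed only to paper over a const-correctness gap in the toolchain header: the intrinsic's prototype took a plain void *, so every const source pointer had to be cast at the call site. Below is a minimal sketch of that pattern and of why a fixed prototype makes the macro redundant; load_x() and load_x_fixed() are hypothetical stand-ins for the real intrinsics:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the old intrinsic prototype: a load that
 * takes a non-const pointer even though it never writes through it. */
static uint64_t load_x(void *addr, ptrdiff_t offset)
{
    uint64_t v;
    memcpy(&v, (const uint8_t *)addr + offset, sizeof(v));
    return v;
}

/* The kind of workaround the hunk above deletes: one macro that casts
 * away const at every call site. */
#define LOAD_X(cptr, stride) load_x((void *)(cptr), (stride))

/* With a const-correct prototype (as the fixed lsxintrin.h now has for
 * __lsx_vldx), callers pass const data directly and the macro dies. */
static uint64_t load_x_fixed(const void *addr, ptrdiff_t offset)
{
    uint64_t v;
    memcpy(&v, (const uint8_t *)addr + offset, sizeof(v));
    return v;
}

uint64_t demo(const uint8_t *src, ptrdiff_t stride)
{
    uint64_t a = LOAD_X(src, stride);       /* old style: cast hidden in macro */
    uint64_t b = load_x_fixed(src, stride); /* new style: direct call */
    return a ^ b;                           /* identical loads: always 0 */
}

Since the cast lived in exactly one macro, fixing the header prototype made the direct intrinsic a drop-in replacement, which is all this patch does.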
From patchwork Fri Sep 9 12:43:15 2022
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 37788
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 9 Sep 2022 14:43:15 +0200
X-Mailer: git-send-email 2.34.1
Subject: [FFmpeg-devel] [PATCH 2/2] Revert "avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx"

This reverts commit 2c8dc7e953e532752500e8145aa1ceee908bda2f.
The loongarch headers have been fixed, so that this wrapper is no
longer necessary.
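The DUP2_ARG2/DUP4_ARG2 helpers that appear on nearly every changed line simply fan one two-argument operation out over several input tuples, with the operation passed as the first macro argument; that is why both reverts are mechanical substitutions of the wrapper name by the bare intrinsic. A minimal compilable sketch of the pattern follows, using a scalar load_x() stand-in; the real helpers live in loongson_intrinsics.h and operate on vector types, and this do/while formulation is an assumption, not the exact upstream definition:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fan-out helper in the spirit of loongson_intrinsics.h: apply one
 * two-argument instruction _INS to two argument pairs. */
#define DUP2_ARG2(_INS, _IN0, _IN1, _IN2, _IN3, _OUT0, _OUT1) \
    do {                                                      \
        _OUT0 = _INS(_IN0, _IN1);                             \
        _OUT1 = _INS(_IN2, _IN3);                             \
    } while (0)

/* Scalar stand-in for an indexed vector load: 8 bytes at base + offset. */
static uint64_t load_x(const uint8_t *base, ptrdiff_t offset)
{
    uint64_t v;
    memcpy(&v, base + offset, sizeof(v));
    return v;
}

int main(void)
{
    uint8_t frame[64];
    ptrdiff_t stride = 16;
    uint64_t row1, row2;

    for (int i = 0; i < 64; i++)
        frame[i] = (uint8_t)i;

    /* Same shape as the patched code:
     * DUP2_ARG2(__lsx_vldx, src, stride, src, stride_2x, src1, src2); */
    DUP2_ARG2(load_x, frame, stride, frame, 2 * stride, row1, row2);

    printf("row1=%016llx row2=%016llx\n",
           (unsigned long long)row1, (unsigned long long)row2);
    return 0;
}

Because the helper forwards its first argument purely syntactically, a function-like macro such as LASX_XVLDX and the intrinsic itself are interchangeable there, which keeps the revert free of behavior changes.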
Signed-off-by: Andreas Rheinhardt --- libavcodec/loongarch/h264chroma_lasx.c | 90 +++++++++++------------ libavcodec/loongarch/vc1dsp_lasx.c | 16 ++-- libavutil/loongarch/loongson_intrinsics.h | 5 -- 3 files changed, 53 insertions(+), 58 deletions(-) diff --git a/libavcodec/loongarch/h264chroma_lasx.c b/libavcodec/loongarch/h264chroma_lasx.c index 5e611997f4..1c0e002bdf 100644 --- a/libavcodec/loongarch/h264chroma_lasx.c +++ b/libavcodec/loongarch/h264chroma_lasx.c @@ -51,7 +51,7 @@ static av_always_inline void avc_chroma_hv_8x4_lasx(const uint8_t *src, uint8_t __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0); - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, src1, src2, src3, src4); DUP2_ARG3(__lasx_xvpermi_q, src2, src1, 0x20, src4, src3, 0x20, src1, src3); src0 = __lasx_xvshuf_b(src0, src0, mask); @@ -91,10 +91,10 @@ static av_always_inline void avc_chroma_hv_8x8_lasx(const uint8_t *src, uint8_t __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0); - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, src1, src2, src3, src4); src += stride_4x; - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, src5, src6, src7, src8); DUP4_ARG3(__lasx_xvpermi_q, src2, src1, 0x20, src4, src3, 0x20, src6, src5, 0x20, src8, src7, 0x20, src1, src3, src5, src7); @@ -141,8 +141,8 @@ static av_always_inline void avc_chroma_hz_8x4_lasx(const uint8_t *src, uint8_t coeff_vec = __lasx_xvslli_b(coeff_vec, 3); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0); - DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src1, src2); - src3 = LASX_XVLDX(src, stride_3x); + DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src1, src2); + src3 = __lasx_xvldx(src, stride_3x); DUP2_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src0, src2); DUP2_ARG3(__lasx_xvshuf_b, src0, src0, mask, src2, src2, mask, src0, src2); DUP2_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, res0, res1); @@ -170,11 +170,11 @@ static av_always_inline void avc_chroma_hz_8x8_lasx(const uint8_t *src, uint8_t coeff_vec = __lasx_xvslli_b(coeff_vec, 3); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0); - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, src1, src2, src3, src4); src += stride_4x; - DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src5, src6); - src7 = LASX_XVLDX(src, stride_3x); + DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src5, src6); + src7 = __lasx_xvldx(src, stride_3x); DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4, 0x20, src7, src6, 0x20, src0, src2, src4, src6); DUP4_ARG3(__lasx_xvshuf_b, src0, src0, mask, src2, src2, mask, src4, src4, mask, @@ -212,7 +212,7 @@ static av_always_inline void avc_chroma_hz_nonmult_lasx(const uint8_t *src, coeff_vec = __lasx_xvslli_b(coeff_vec, 3); for (row = height >> 2; row--;) { - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, 
stride_2x, src, stride_3x, src0, src1, src2, src3); src += stride_4x; DUP2_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src0, src2); @@ -228,7 +228,7 @@ static av_always_inline void avc_chroma_hz_nonmult_lasx(const uint8_t *src, if ((height & 3)) { src0 = __lasx_xvld(src, 0); - src1 = LASX_XVLDX(src, stride); + src1 = __lasx_xvldx(src, stride); src1 = __lasx_xvpermi_q(src1, src0, 0x20); src0 = __lasx_xvshuf_b(src1, src1, mask); res0 = __lasx_xvdp2_h_bu(src0, coeff_vec); @@ -253,7 +253,7 @@ static av_always_inline void avc_chroma_vt_8x4_lasx(const uint8_t *src, uint8_t coeff_vec = __lasx_xvslli_b(coeff_vec, 3); src0 = __lasx_xvld(src, 0); src += stride; - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x, src1, src2, src3, src4); DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2, 0x20, src4, src3, 0x20, src0, src1, src2, src3); @@ -282,10 +282,10 @@ static av_always_inline void avc_chroma_vt_8x8_lasx(const uint8_t *src, uint8_t coeff_vec = __lasx_xvslli_b(coeff_vec, 3); src0 = __lasx_xvld(src, 0); src += stride; - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x, src1, src2, src3, src4); src += stride_4x; - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x, src5, src6, src7, src8); DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2, 0x20, src4, src3, 0x20, src0, src1, src2, src3); @@ -402,7 +402,7 @@ static void avc_chroma_hv_4x2_lasx(const uint8_t *src, uint8_t *dst, ptrdiff_t s __m256i coeff_vt_vec = __lasx_xvpermi_q(coeff_vt_vec1, coeff_vt_vec0, 0x02); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0); - DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2, src1, src2); + DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2, src1, src2); DUP2_ARG3(__lasx_xvshuf_b, src1, src0, mask, src2, src1, mask, src0, src1); src0 = __lasx_xvpermi_q(src0, src1, 0x02); res_hz = __lasx_xvdp2_h_bu(src0, coeff_hz_vec); @@ -431,7 +431,7 @@ static void avc_chroma_hv_4x4_lasx(const uint8_t *src, uint8_t *dst, ptrdiff_t s __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0); - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3, src, stride_4, src1, src2, src3, src4); DUP4_ARG3(__lasx_xvshuf_b, src1, src0, mask, src2, src1, mask, src3, src2, mask, src4, src3, mask, src0, src1, src2, src3); @@ -464,10 +464,10 @@ static void avc_chroma_hv_4x8_lasx(const uint8_t *src, uint8_t * dst, ptrdiff_t __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0); - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3, src, stride_4, src1, src2, src3, src4); src += stride_4; - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3, src, stride_4, src5, src6, src7, src8); DUP4_ARG3(__lasx_xvshuf_b, src1, src0, mask, src2, src1, mask, src3, src2, mask, src4, src3, mask, src0, src1, src2, src3); @@ -519,7 +519,7 @@ static void avc_chroma_hz_4x2_lasx(const uint8_t *src, uint8_t *dst, ptrdiff_t 
s __m256i coeff_vec = __lasx_xvilvl_b(coeff_vec0, coeff_vec1); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0); - src1 = LASX_XVLDX(src, stride); + src1 = __lasx_xvldx(src, stride); src0 = __lasx_xvshuf_b(src1, src0, mask); res = __lasx_xvdp2_h_bu(src0, coeff_vec); res = __lasx_xvslli_h(res, 3); @@ -540,8 +540,8 @@ static void avc_chroma_hz_4x4_lasx(const uint8_t *src, uint8_t *dst, ptrdiff_t s __m256i coeff_vec = __lasx_xvilvl_b(coeff_vec0, coeff_vec1); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0); - DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2, src1, src2); - src3 = LASX_XVLDX(src, stride_3); + DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2, src1, src2); + src3 = __lasx_xvldx(src, stride_3); DUP2_ARG3(__lasx_xvshuf_b, src1, src0, mask, src3, src2, mask, src0, src2); src0 = __lasx_xvpermi_q(src0, src2, 0x02); res = __lasx_xvdp2_h_bu(src0, coeff_vec); @@ -567,11 +567,11 @@ static void avc_chroma_hz_4x8_lasx(const uint8_t *src, uint8_t *dst, ptrdiff_t s coeff_vec = __lasx_xvslli_b(coeff_vec, 3); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 32, src, 0, mask, src0); - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3, src, stride_4, src1, src2, src3, src4); src += stride_4; - DUP2_ARG2(LASX_XVLDX, src, stride, src, stride_2, src5, src6); - src7 = LASX_XVLDX(src, stride_3); + DUP2_ARG2(__lasx_xvldx, src, stride, src, stride_2, src5, src6); + src7 = __lasx_xvldx(src, stride_3); DUP4_ARG3(__lasx_xvshuf_b, src1, src0, mask, src3, src2, mask, src5, src4, mask, src7, src6, mask, src0, src2, src4, src6); DUP2_ARG3(__lasx_xvpermi_q, src0, src2, 0x02, src4, src6, 0x02, src0, src4); @@ -625,7 +625,7 @@ static void avc_chroma_vt_4x2_lasx(const uint8_t *src, uint8_t *dst, ptrdiff_t s __m256i coeff_vec = __lasx_xvilvl_b(coeff_vec0, coeff_vec1); src0 = __lasx_xvld(src, 0); - DUP2_ARG2(LASX_XVLDX, src, stride, src, stride << 1, src1, src2); + DUP2_ARG2(__lasx_xvldx, src, stride, src, stride << 1, src1, src2); DUP2_ARG2(__lasx_xvilvl_b, src1, src0, src2, src1, tmp0, tmp1); tmp0 = __lasx_xvilvl_d(tmp1, tmp0); res = __lasx_xvdp2_h_bu(tmp0, coeff_vec); @@ -649,7 +649,7 @@ static void avc_chroma_vt_4x4_lasx(const uint8_t *src, uint8_t *dst, ptrdiff_t s __m256i coeff_vec = __lasx_xvilvl_b(coeff_vec0, coeff_vec1); src0 = __lasx_xvld(src, 0); - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3, src, stride_4, src1, src2, src3, src4); DUP4_ARG2(__lasx_xvilvl_b, src1, src0, src2, src1, src3, src2, src4, src3, tmp0, tmp1, tmp2, tmp3); @@ -679,10 +679,10 @@ static void avc_chroma_vt_4x8_lasx(const uint8_t *src, uint8_t *dst, ptrdiff_t s coeff_vec = __lasx_xvslli_b(coeff_vec, 3); src0 = __lasx_xvld(src, 0); - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3, src, stride_4, src1, src2, src3, src4); src += stride_4; - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2, src, stride_3, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2, src, stride_3, src, stride_4, src5, src6, src7, src8); DUP4_ARG2(__lasx_xvilvl_b, src1, src0, src2, src1, src3, src2, src4, src3, tmp0, tmp1, tmp2, tmp3); @@ -860,7 +860,7 @@ static av_always_inline void avc_chroma_hv_and_aver_dst_8x4_lasx(const uint8_t * __m256i coeff_vt_vec1 = __lasx_xvreplgr2vr_h(coef_ver1); DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0); - DUP4_ARG2(LASX_XVLDX, src, 
stride, src, stride_2x, src, stride_3x, src, stride_4x, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, src1, src2, src3, src4); DUP2_ARG3(__lasx_xvpermi_q, src2, src1, 0x20, src4, src3, 0x20, src1, src3); src0 = __lasx_xvshuf_b(src0, src0, mask); @@ -874,7 +874,7 @@ static av_always_inline void avc_chroma_hv_and_aver_dst_8x4_lasx(const uint8_t * res_vt0 = __lasx_xvmadd_h(res_vt0, res_hz0, coeff_vt_vec1); res_vt1 = __lasx_xvmadd_h(res_vt1, res_hz1, coeff_vt_vec1); out = __lasx_xvssrarni_bu_h(res_vt1, res_vt0, 6); - DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, + DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, tp0, tp1, tp2, tp3); DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2); tp0 = __lasx_xvpermi_q(tp2, tp0, 0x20); @@ -907,10 +907,10 @@ static av_always_inline void avc_chroma_hv_and_aver_dst_8x8_lasx(const uint8_t * DUP2_ARG2(__lasx_xvld, chroma_mask_arr, 0, src, 0, mask, src0); src += stride; - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x, src1, src2, src3, src4); src += stride_4x; - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x, src5, src6, src7, src8); DUP4_ARG3(__lasx_xvpermi_q, src2, src1, 0x20, src4, src3, 0x20, src6, src5, 0x20, src8, src7, 0x20, src1, src3, src5, src7); @@ -934,12 +934,12 @@ static av_always_inline void avc_chroma_hv_and_aver_dst_8x8_lasx(const uint8_t * res_vt3 = __lasx_xvmadd_h(res_vt3, res_hz3, coeff_vt_vec1); DUP2_ARG3(__lasx_xvssrarni_bu_h, res_vt1, res_vt0, 6, res_vt3, res_vt2, 6, out0, out1); - DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, + DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, tp0, tp1, tp2, tp3); DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2); dst0 = __lasx_xvpermi_q(tp2, tp0, 0x20); dst += stride_4x; - DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, + DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, tp0, tp1, tp2, tp3); dst -= stride_4x; DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2); @@ -973,13 +973,13 @@ static av_always_inline void avc_chroma_hz_and_aver_dst_8x4_lasx(const uint8_t * coeff_vec = __lasx_xvslli_b(coeff_vec, 3); mask = __lasx_xvld(chroma_mask_arr, 0); - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x, src0, src1, src2, src3); DUP2_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src0, src2); DUP2_ARG3(__lasx_xvshuf_b, src0, src0, mask, src2, src2, mask, src0, src2); DUP2_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, res0, res1); out = __lasx_xvssrarni_bu_h(res1, res0, 6); - DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, + DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, tp0, tp1, tp2, tp3); DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2); tp0 = __lasx_xvpermi_q(tp2, tp0, 0x20); @@ -1008,10 +1008,10 @@ static av_always_inline void avc_chroma_hz_and_aver_dst_8x8_lasx(const uint8_t * coeff_vec = __lasx_xvslli_b(coeff_vec, 3); mask = __lasx_xvld(chroma_mask_arr, 0); - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, 
stride_2x, src, stride_3x, src0, src1, src2, src3); src += stride_4x; - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x, src4, src5, src6, src7); DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src3, src2, 0x20, src5, src4, 0x20, src7, src6, 0x20, src0, src2, src4, src6); @@ -1020,12 +1020,12 @@ static av_always_inline void avc_chroma_hz_and_aver_dst_8x8_lasx(const uint8_t * DUP4_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, src4, coeff_vec, src6, coeff_vec, res0, res1, res2, res3); DUP2_ARG3(__lasx_xvssrarni_bu_h, res1, res0, 6, res3, res2, 6, out0, out1); - DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, + DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, tp0, tp1, tp2, tp3); DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2); dst0 = __lasx_xvpermi_q(tp2, tp0, 0x20); dst += stride_4x; - DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, + DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, tp0, tp1, tp2, tp3); dst -= stride_4x; DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2); @@ -1059,14 +1059,14 @@ static av_always_inline void avc_chroma_vt_and_aver_dst_8x4_lasx(const uint8_t * coeff_vec = __lasx_xvslli_b(coeff_vec, 3); src0 = __lasx_xvld(src, 0); - DUP4_ARG2(LASX_XVLDX, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, + DUP4_ARG2(__lasx_xvldx, src, stride, src, stride_2x, src, stride_3x, src, stride_4x, src1, src2, src3, src4); DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2, 0x20, src4, src3, 0x20, src0, src1, src2, src3); DUP2_ARG2(__lasx_xvilvl_b, src1, src0, src3, src2, src0, src2); DUP2_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, res0, res1); out = __lasx_xvssrarni_bu_h(res1, res0, 6); - DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, + DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, tp0, tp1, tp2, tp3); DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2); tp0 = __lasx_xvpermi_q(tp2, tp0, 0x20); @@ -1095,10 +1095,10 @@ static av_always_inline void avc_chroma_vt_and_aver_dst_8x8_lasx(const uint8_t * coeff_vec = __lasx_xvslli_b(coeff_vec, 3); src0 = __lasx_xvld(src, 0); src += stride; - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x, src1, src2, src3, src4); src += stride_4x; - DUP4_ARG2(LASX_XVLDX, src, 0, src, stride, src, stride_2x, src, stride_3x, + DUP4_ARG2(__lasx_xvldx, src, 0, src, stride, src, stride_2x, src, stride_3x, src5, src6, src7, src8); DUP4_ARG3(__lasx_xvpermi_q, src1, src0, 0x20, src2, src1, 0x20, src3, src2, 0x20, src4, src3, 0x20, src0, src1, src2, src3); @@ -1109,12 +1109,12 @@ static av_always_inline void avc_chroma_vt_and_aver_dst_8x8_lasx(const uint8_t * DUP4_ARG2(__lasx_xvdp2_h_bu, src0, coeff_vec, src2, coeff_vec, src4, coeff_vec, src6, coeff_vec, res0, res1, res2, res3); DUP2_ARG3(__lasx_xvssrarni_bu_h, res1, res0, 6, res3, res2, 6, out0, out1); - DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, + DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, tp0, tp1, tp2, tp3); DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2); dst0 = __lasx_xvpermi_q(tp2, tp0, 0x20); dst += stride_4x; - DUP4_ARG2(LASX_XVLDX, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x, + 
DUP4_ARG2(__lasx_xvldx, dst, 0, dst, stride, dst, stride_2x, dst, stride_3x,
               tp0, tp1, tp2, tp3);
     dst -= stride_4x;
     DUP2_ARG2(__lasx_xvilvl_d, tp2, tp0, tp3, tp1, tp0, tp2);
diff --git a/libavcodec/loongarch/vc1dsp_lasx.c b/libavcodec/loongarch/vc1dsp_lasx.c
index 12f68ee028..848fe4afb3 100644
--- a/libavcodec/loongarch/vc1dsp_lasx.c
+++ b/libavcodec/loongarch/vc1dsp_lasx.c
@@ -831,20 +831,20 @@ static void put_vc1_mspel_mc_h_lasx(uint8_t *dst, const uint8_t *src,
     const_para1_2 = __lasx_xvreplgr2vr_h(*(para_v + 1));
     in0 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(LASX_XVLDX, _src, stride, _src, stride2, in1, in2);
-    in3 = LASX_XVLDX(_src, stride3);
+    DUP2_ARG2(__lasx_xvldx, _src, stride, _src, stride2, in1, in2);
+    in3 = __lasx_xvldx(_src, stride3);
     _src += stride4;
     in4 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(LASX_XVLDX, _src, stride, _src, stride2, in5, in6);
-    in7 = LASX_XVLDX(_src, stride3);
+    DUP2_ARG2(__lasx_xvldx, _src, stride, _src, stride2, in5, in6);
+    in7 = __lasx_xvldx(_src, stride3);
     _src += stride4;
     in8 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(LASX_XVLDX, _src, stride, _src, stride2, in9, in10);
-    in11 = LASX_XVLDX(_src, stride3);
+    DUP2_ARG2(__lasx_xvldx, _src, stride, _src, stride2, in9, in10);
+    in11 = __lasx_xvldx(_src, stride3);
     _src += stride4;
     in12 = __lasx_xvld(_src, 0);
-    DUP2_ARG2(LASX_XVLDX, _src, stride, _src, stride2, in13, in14);
-    in15 = LASX_XVLDX(_src, stride3);
+    DUP2_ARG2(__lasx_xvldx, _src, stride, _src, stride2, in13, in14);
+    in15 = __lasx_xvldx(_src, stride3);
     DUP4_ARG2(__lasx_xvilvl_b, in2, in0, in3, in1, in6, in4, in7, in5,
               tmp0_m, tmp1_m, tmp2_m, tmp3_m);
     DUP4_ARG2(__lasx_xvilvl_b, in10, in8, in11, in9, in14, in12, in15, in13,
diff --git a/libavutil/loongarch/loongson_intrinsics.h b/libavutil/loongarch/loongson_intrinsics.h
index 090adab266..eb256863c8 100644
--- a/libavutil/loongarch/loongson_intrinsics.h
+++ b/libavutil/loongarch/loongson_intrinsics.h
@@ -716,11 +716,6 @@ static inline __m128i __lsx_vclip255_w(__m128i _in) {
 #ifdef __loongarch_asx
 #include <lasxintrin.h>
-
-/* __lasx_xvldx() in lasxintrin.h does not accept a const void*;
- * remove the following once it does. */
-#define LASX_XVLDX(ptr, stride) __lasx_xvldx((void*)ptr, stride)
-
 /*
  * =============================================================================
  * Description : Dot product of byte vector elements
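Throughout both patches the load sites follow the same addressing idiom: one plain load at offset 0 plus indexed loads at precomputed stride multiples, advancing the base pointer four rows per iteration. A scalar sketch of that loop shape, with a hypothetical row_load() in place of __lsx_vld/__lsx_vldx:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for the vector row loads: gather 8
 * bytes at p + off into one 64-bit value. */
static uint64_t row_load(const uint8_t *p, ptrdiff_t off)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[off + i] << (8 * i);
    return v;
}

/* out must hold 4 * (height / 4) entries. */
void process_rows(const uint8_t *src, ptrdiff_t src_stride,
                  int height, uint64_t *out)
{
    /* Precompute stride multiples once, as the LSX/LASX code does. */
    ptrdiff_t src_stride_2x = src_stride << 1;
    ptrdiff_t src_stride_3x = src_stride_2x + src_stride;
    ptrdiff_t src_stride_4x = src_stride << 2;

    for (int loop_cnt = height >> 2; loop_cnt--;) {
        *out++ = row_load(src, 0);             /* __lsx_vld(src, 0)   */
        *out++ = row_load(src, src_stride);    /* __lsx_vldx variants */
        *out++ = row_load(src, src_stride_2x);
        *out++ = row_load(src, src_stride_3x);
        src += src_stride_4x;                  /* advance four rows   */
    }
}

Keeping the multiples in registers instead of rescaling per load is what makes the indexed-load intrinsic attractive here, and it is untouched by the revert.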