From patchwork Fri Feb 19 05:28:31 2021
X-Patchwork-Submitter: Jiaxun Yang
X-Patchwork-Id: 25777
From: Jiaxun Yang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 13:28:31 +0800
Message-Id: <20210219052834.533558-2-jiaxun.yang@flygoat.com>
In-Reply-To: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
References: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
Subject: [FFmpeg-devel] [PATCH 1/4] avutil/mips: Use MMI_{L,S}QC1 macro in
 {SAVE,RECOVER}_REG
Cc: yinshiyou-hf@loongson.cn, guxiwei-hf@loongson.cn, Jiaxun Yang

{SAVE,RECOVER}_REG
will be available for Loongson2 again; also add a comment explaining the magic.

Signed-off-by: Jiaxun Yang
---
 libavutil/mips/mmiutils.h | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/libavutil/mips/mmiutils.h b/libavutil/mips/mmiutils.h
index 8f692e86c5..fb85a4dd1b 100644
--- a/libavutil/mips/mmiutils.h
+++ b/libavutil/mips/mmiutils.h
@@ -202,25 +202,27 @@
 #endif /* HAVE_LOONGSON2 */
 
 /**
- * backup register
+ * Backup saved registers
+ * We're not using compiler's clobber list as it's not smart enough
+ * to take advantage of quad word load/store.
  */
 #define BACKUP_REG \
     LOCAL_ALIGNED_16(double, temp_backup_reg, [8]); \
     if (_MIPS_SIM == _ABI64) \
         __asm__ volatile ( \
-            "gssqc1 $f25, $f24, 0x00(%[temp]) \n\t" \
-            "gssqc1 $f27, $f26, 0x10(%[temp]) \n\t" \
-            "gssqc1 $f29, $f28, 0x20(%[temp]) \n\t" \
-            "gssqc1 $f31, $f30, 0x30(%[temp]) \n\t" \
+            MMI_SQC1($f25, $f24, %[temp], 0x00) \
+            MMI_SQC1($f27, $f26, %[temp], 0x10) \
+            MMI_SQC1($f29, $f28, %[temp], 0x20) \
+            MMI_SQC1($f31, $f30, %[temp], 0x30) \
             : \
             : [temp]"r"(temp_backup_reg) \
             : "memory" \
         ); \
     else \
         __asm__ volatile ( \
-            "gssqc1 $f22, $f20, 0x00(%[temp]) \n\t" \
-            "gssqc1 $f26, $f24, 0x10(%[temp]) \n\t" \
-            "gssqc1 $f30, $f28, 0x20(%[temp]) \n\t" \
+            MMI_SQC1($f22, $f20, %[temp], 0x00) \
+            MMI_SQC1($f26, $f24, %[temp], 0x10) \
+            MMI_SQC1($f30, $f28, %[temp], 0x20) \
             : \
             : [temp]"r"(temp_backup_reg) \
             : "memory" \
@@ -232,19 +234,19 @@
 #define RECOVER_REG \
     if (_MIPS_SIM == _ABI64) \
         __asm__ volatile ( \
-            "gslqc1 $f25, $f24, 0x00(%[temp]) \n\t" \
-            "gslqc1 $f27, $f26, 0x10(%[temp]) \n\t" \
-            "gslqc1 $f29, $f28, 0x20(%[temp]) \n\t" \
-            "gslqc1 $f31, $f30, 0x30(%[temp]) \n\t" \
+            MMI_LQC1($f25, $f24, %[temp], 0x00) \
+            MMI_LQC1($f27, $f26, %[temp], 0x10) \
+            MMI_LQC1($f29, $f28, %[temp], 0x20) \
+            MMI_LQC1($f31, $f30, %[temp], 0x30) \
             : \
             : [temp]"r"(temp_backup_reg) \
             : "memory" \
         ); \
     else \
         __asm__ volatile ( \
-            "gslqc1 $f22, $f20, 0x00(%[temp]) \n\t" \
-            "gslqc1 $f26, $f24, 0x10(%[temp]) \n\t" \
-            "gslqc1 $f30, $f28, 0x20(%[temp]) \n\t" \
+            MMI_LQC1($f22, $f20, %[temp], 0x00) \
+            MMI_LQC1($f26, $f24, %[temp], 0x10) \
+            MMI_LQC1($f30, $f28, %[temp], 0x20) \
             : \
             : [temp]"r"(temp_backup_reg) \
             : "memory" \

From patchwork Fri Feb 19 05:28:32 2021
X-Patchwork-Submitter: Jiaxun Yang
X-Patchwork-Id: 25778
From: Jiaxun Yang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 13:28:32 +0800
Message-Id: <20210219052834.533558-3-jiaxun.yang@flygoat.com>
In-Reply-To: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
References: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
Subject: [FFmpeg-devel] [PATCH 2/4] avutil/mips: Extract load/store-with-shift
 C1 pair macro
Cc: yinshiyou-hf@loongson.cn, guxiwei-hf@loongson.cn, Jiaxun Yang

We're doing some fancy hacks with the load/store-with-shift C1 instructions
besides unaligned load/store. Create a macro for the l/r pair so we can use
it in these places.

Signed-off-by: Jiaxun Yang
---
 libavutil/mips/mmiutils.h | 49 ++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/libavutil/mips/mmiutils.h b/libavutil/mips/mmiutils.h
index fb85a4dd1b..3994085057 100644
--- a/libavutil/mips/mmiutils.h
+++ b/libavutil/mips/mmiutils.h
@@ -55,8 +55,9 @@
 #define MMI_LWC1(fp, addr, bias) \
     "lwc1 "#fp", "#bias"("#addr") \n\t"
 
-#define MMI_ULWC1(fp, addr, bias) \
-    "ulw %[low32], "#bias"("#addr") \n\t" \
+#define MMI_LWLRC1(fp, addr, bias, off) \
+    "lwl %[low32], "#bias"+"#off"("#addr") \n\t" \
+    "lwr %[low32], "#bias"("#addr") \n\t" \
     "mtc1 %[low32], "#fp" \n\t"
 
 #define MMI_LWXC1(fp, addr, stride, bias) \
@@ -66,9 +67,10 @@
 #define MMI_SWC1(fp, addr, bias) \
     "swc1 "#fp", "#bias"("#addr") \n\t"
 
-#define MMI_USWC1(fp, addr, bias) \
+#define MMI_SWLRC1(fp, addr, bias, off) \
     "mfc1 %[low32], "#fp" \n\t" \
-    "usw %[low32], "#bias"("#addr") \n\t"
+    "swl %[low32], "#bias"+"#off"("#addr") \n\t" \
+    "swr %[low32], "#bias"("#addr") \n\t"
 
 #define MMI_SWXC1(fp, addr, stride, bias) \
     PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \
@@ -77,8 +79,9 @@
 #define MMI_LDC1(fp, addr, bias) \
     "ldc1 "#fp", "#bias"("#addr") \n\t"
 
-#define MMI_ULDC1(fp, addr, bias) \
-    "uld %[all64], "#bias"("#addr") \n\t" \
+#define MMI_LDLRC1(fp, addr, bias, off) \
+    "ldl %[all64], "#bias"+"#off"("#addr") \n\t" \
+    "ldr %[all64], "#bias"("#addr") \n\t" \
     "dmtc1 %[all64], "#fp" \n\t"
 
 #define MMI_LDXC1(fp, addr, stride, bias) \
@@ -88,9 +91,10 @@
 #define MMI_SDC1(fp, addr, bias) \
     "sdc1 "#fp",
"#bias"("#addr") \n\t" -#define MMI_USDC1(fp, addr, bias) \ +#define MMI_SDLRC1(fp, addr, bias, off) \ "dmfc1 %[all64], "#fp" \n\t" \ - "usd %[all64], "#bias"("#addr") \n\t" + "sdl %[all64], "#bias"+"#off"("#addr") \n\t" \ + "sdr %[all64], "#bias"("#addr") \n\t" #define MMI_SDXC1(fp, addr, stride, bias) \ PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ @@ -139,17 +143,18 @@ #define DECLARE_VAR_LOW32 int32_t low32 #define RESTRICT_ASM_LOW32 [low32]"=&r"(low32), -#define MMI_ULWC1(fp, addr, bias) \ - "ulw %[low32], "#bias"("#addr") \n\t" \ - "mtc1 %[low32], "#fp" \n\t" +#define MMI_LWLRC1(fp, addr, bias, off) \ + "lwl %[low32], "#bias"+"#off"("#addr") \n\t" \ + "lwr %[low32], "#bias"("#addr") \n\t" \ + "mtc1 %[low32], "#fp" \n\t" #else /* _MIPS_SIM != _ABIO32 */ #define DECLARE_VAR_LOW32 #define RESTRICT_ASM_LOW32 -#define MMI_ULWC1(fp, addr, bias) \ - "gslwlc1 "#fp", 3+"#bias"("#addr") \n\t" \ +#define MMI_LWLRC1(fp, addr, bias, off) \ + "gslwlc1 "#fp", "#off"+"#bias"("#addr") \n\t" \ "gslwrc1 "#fp", "#bias"("#addr") \n\t" #endif /* _MIPS_SIM != _ABIO32 */ @@ -160,8 +165,8 @@ #define MMI_SWC1(fp, addr, bias) \ "swc1 "#fp", "#bias"("#addr") \n\t" -#define MMI_USWC1(fp, addr, bias) \ - "gsswlc1 "#fp", 3+"#bias"("#addr") \n\t" \ +#define MMI_SWLRC1(fp, addr, bias, off) \ + "gsswlc1 "#fp", "#off"+"#bias"("#addr") \n\t" \ "gsswrc1 "#fp", "#bias"("#addr") \n\t" #define MMI_SWXC1(fp, addr, stride, bias) \ @@ -170,8 +175,8 @@ #define MMI_LDC1(fp, addr, bias) \ "ldc1 "#fp", "#bias"("#addr") \n\t" -#define MMI_ULDC1(fp, addr, bias) \ - "gsldlc1 "#fp", 7+"#bias"("#addr") \n\t" \ +#define MMI_LDLRC1(fp, addr, bias, off) \ + "gsldlc1 "#fp", "#off"+"#bias"("#addr") \n\t" \ "gsldrc1 "#fp", "#bias"("#addr") \n\t" #define MMI_LDXC1(fp, addr, stride, bias) \ @@ -180,8 +185,8 @@ #define MMI_SDC1(fp, addr, bias) \ "sdc1 "#fp", "#bias"("#addr") \n\t" -#define MMI_USDC1(fp, addr, bias) \ - "gssdlc1 "#fp", 7+"#bias"("#addr") \n\t" \ +#define MMI_SDLRC1(fp, addr, bias, off) \ + "gssdlc1 
"#fp", "#off"+"#bias"("#addr") \n\t" \
     "gssdrc1 "#fp", "#bias"("#addr") \n\t"
 
 #define MMI_SDXC1(fp, addr, stride, bias) \
@@ -201,6 +206,12 @@
 
 #endif /* HAVE_LOONGSON2 */
 
+#define MMI_ULWC1(fp, addr, bias) MMI_LWLRC1(fp, addr, bias, 3)
+#define MMI_USWC1(fp, addr, bias) MMI_SWLRC1(fp, addr, bias, 3)
+
+#define MMI_ULDC1(fp, addr, bias) MMI_LDLRC1(fp, addr, bias, 7)
+#define MMI_USDC1(fp, addr, bias) MMI_SDLRC1(fp, addr, bias, 7)
+
 /**
  * Backup saved registers
  * We're not using compiler's clobber list as it's not smart enough

From patchwork Fri Feb 19 05:28:33 2021
X-Patchwork-Submitter: Jiaxun Yang
X-Patchwork-Id: 25779
From: Jiaxun Yang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 13:28:33 +0800
Message-Id: <20210219052834.533558-4-jiaxun.yang@flygoat.com>
In-Reply-To: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
References:
 <20210219052834.533558-1-jiaxun.yang@flygoat.com>
Subject: [FFmpeg-devel] [PATCH 3/4] avcodec/mips: Use MMI macros to replace
 Loongson3 instructions
Cc: yinshiyou-hf@loongson.cn, guxiwei-hf@loongson.cn, Jiaxun Yang

Loongson3's extension instructions (prefixed with gs) are widely used in our
MMI codebase. However, these instructions are not available on Loongson-2E/F,
while MMI code should work on those processors. We previously introduced the
mmiutils macros to provide backward compatibility, but newly committed code
didn't follow that. In this patch I reviewed the codebase and converted all
of these instructions into MMI macros to get Loongson2 supported again.
Signed-off-by: Jiaxun Yang --- libavcodec/mips/h264chroma_mmi.c | 26 +++- libavcodec/mips/h264dsp_mmi.c | 8 +- libavcodec/mips/hevcdsp_mmi.c | 251 ++++++++++++------------------ libavcodec/mips/hpeldsp_mmi.c | 1 + libavcodec/mips/simple_idct_mmi.c | 49 +++--- libavcodec/mips/vp3dsp_idct_mmi.c | 11 +- libavcodec/mips/vp8dsp_mmi.c | 100 +++++------- libavcodec/mips/vp9_mc_mmi.c | 128 ++++++--------- 8 files changed, 245 insertions(+), 329 deletions(-) diff --git a/libavcodec/mips/h264chroma_mmi.c b/libavcodec/mips/h264chroma_mmi.c index 739dd7d4d6..b6ea1ba3b1 100644 --- a/libavcodec/mips/h264chroma_mmi.c +++ b/libavcodec/mips/h264chroma_mmi.c @@ -32,6 +32,7 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int A = 64, B, C, D, E; double ftmp[12]; uint64_t tmp[1]; + DECLARE_VAR_ALL64; if (!(x || y)) { /* x=0, y=0, A=64 */ @@ -57,7 +58,8 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp3], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) @@ -152,7 +154,8 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp3], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -203,7 +206,8 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp1], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + 
[ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -272,7 +276,8 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp2], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -293,6 +298,7 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int A = 64, B, C, D, E; double ftmp[10]; uint64_t tmp[1]; + DECLARE_VAR_ALL64; if(!(x || y)){ /* x=0, y=0, A=64 */ @@ -314,7 +320,8 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "addi %[h], %[h], -0x02 \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) @@ -378,7 +385,8 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp1], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -429,7 +437,8 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp1], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), 
[ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -479,7 +488,8 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp1], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), diff --git a/libavcodec/mips/h264dsp_mmi.c b/libavcodec/mips/h264dsp_mmi.c index 173e191c77..cb78d5a2f8 100644 --- a/libavcodec/mips/h264dsp_mmi.c +++ b/libavcodec/mips/h264dsp_mmi.c @@ -39,8 +39,8 @@ void ff_h264_add_pixels4_8_mmi(uint8_t *dst, int16_t *src, int stride) MMI_LDC1(%[ftmp3], %[src], 0x10) MMI_LDC1(%[ftmp4], %[src], 0x18) /* memset(src, 0, 32); */ - "gssqc1 %[ftmp0], %[ftmp0], 0x00(%[src]) \n\t" - "gssqc1 %[ftmp0], %[ftmp0], 0x10(%[src]) \n\t" + MMI_SQC1(%[ftmp0], %[ftmp0], %[src], 0x00) + MMI_SQC1(%[ftmp0], %[ftmp0], %[src], 0x10) MMI_ULWC1(%[ftmp5], %[dst0], 0x00) MMI_ULWC1(%[ftmp6], %[dst1], 0x00) MMI_ULWC1(%[ftmp7], %[dst2], 0x00) @@ -89,8 +89,8 @@ void ff_h264_idct_add_8_mmi(uint8_t *dst, int16_t *block, int stride) MMI_LDC1(%[ftmp3], %[block], 0x18) /* memset(block, 0, 32) */ "xor %[ftmp4], %[ftmp4], %[ftmp4] \n\t" - "gssqc1 %[ftmp4], %[ftmp4], 0x00(%[block]) \n\t" - "gssqc1 %[ftmp4], %[ftmp4], 0x10(%[block]) \n\t" + MMI_SQC1(%[ftmp4], %[ftmp4], %[block], 0x00) + MMI_SQC1(%[ftmp4], %[ftmp4], %[block], 0x10) "dli %[tmp0], 0x01 \n\t" "mtc1 %[tmp0], %[ftmp8] \n\t" "dli %[tmp0], 0x06 \n\t" diff --git a/libavcodec/mips/hevcdsp_mmi.c b/libavcodec/mips/hevcdsp_mmi.c index aa83e1f9ad..29e8c885bd 100644 --- a/libavcodec/mips/hevcdsp_mmi.c +++ b/libavcodec/mips/hevcdsp_mmi.c @@ -35,6 +35,7 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, uint8_t 
*_src, \
   uint64_t ftmp[15]; \
   uint64_t rtmp[1]; \
   const int8_t *filter = ff_hevc_qpel_filters[mx - 1]; \
+  DECLARE_VAR_ALL64; \
 \
   x = x_step; \
   y = height; \
@@ -50,14 +51,10 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, uint8_t *_src, \
 \
   "1: \n\t" \
   "2: \n\t" \
-  "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \
-  "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \
-  "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \
-  "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \
-  "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \
-  "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \
-  "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \
-  "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \
+  MMI_ULDC1(%[ftmp3], %[src], 0x00) \
+  MMI_ULDC1(%[ftmp4], %[src], 0x01) \
+  MMI_ULDC1(%[ftmp5], %[src], 0x02) \
+  MMI_ULDC1(%[ftmp6], %[src], 0x03) \
   "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \
   "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \
   "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" \
@@ -83,8 +80,7 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, uint8_t *_src, \
   "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \
   "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
   "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
-  "gssdlc1 %[ftmp3], 0x07(%[dst]) \n\t" \
-  "gssdrc1 %[ftmp3], 0x00(%[dst]) \n\t" \
+  MMI_USDC1(%[ftmp3], %[dst], 0x00) \
 \
   "daddi %[x], %[x], -0x01 \n\t" \
   PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
@@ -98,7 +94,8 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, uint8_t *_src, \
   PTR_ADDU "%[src], %[src], %[stride] \n\t" \
   PTR_ADDIU "%[dst], %[dst], 0x80 \n\t" \
   "bnez %[y], 1b \n\t" \
-  : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \
+  : RESTRICT_ASM_ALL64 \
+    [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \
    [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \
    [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \
    [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \
@@ -134,6 +131,7 @@ void ff_hevc_put_hevc_qpel_hv##w##_8_mmi(int16_t *dst, uint8_t *_src, \
   int16_t *tmp = tmp_array; \
   uint64_t ftmp[15]; \
   uint64_t rtmp[1]; \
+  DECLARE_VAR_ALL64; \
 \
   src -=
(QPEL_EXTRA_BEFORE * srcstride + 3); \
   filter = ff_hevc_qpel_filters[mx - 1]; \
@@ -151,14 +149,10 @@
 \
   "1: \n\t" \
   "2: \n\t" \
-  "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \
-  "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \
-  "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \
-  "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \
-  "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \
-  "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \
-  "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \
-  "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \
+  MMI_ULDC1(%[ftmp3], %[src], 0x00) \
+  MMI_ULDC1(%[ftmp4], %[src], 0x01) \
+  MMI_ULDC1(%[ftmp5], %[src], 0x02) \
+  MMI_ULDC1(%[ftmp6], %[src], 0x03) \
   "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \
   "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \
   "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" \
@@ -184,8 +178,7 @@
   "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \
   "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
   "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
-  "gssdlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \
-  "gssdrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \
+  MMI_USDC1(%[ftmp3], %[tmp], 0x00) \
 \
   "daddi %[x], %[x], -0x01 \n\t" \
   PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
@@ -199,7 +192,8 @@
   PTR_ADDU "%[src], %[src], %[stride] \n\t" \
   PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \
   "bnez %[y], 1b \n\t" \
-  : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \
+  : RESTRICT_ASM_ALL64 \
+    [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \
    [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \
    [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \
    [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \
@@ -228,29 +222,21 @@
 \
   "1: \n\t" \
   "2: \n\t" \
-  "gsldlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \
-  "gsldrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \
+  MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \
   PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t"
\ - "gsldlc1 %[ftmp4], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp6], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp6], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp6], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp7], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp7], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp7], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp8], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp8], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp8], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp9], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp9], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp9], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp10], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp10], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp10], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], -0x380 \n\t" \ TRANSPOSE_4H(%[ftmp3], %[ftmp4], %[ftmp5], %[ftmp6], \ %[ftmp11], %[ftmp12], %[ftmp13], %[ftmp14]) \ @@ -275,8 +261,7 @@ void ff_hevc_put_hevc_qpel_hv##w##_8_mmi(int16_t *dst, uint8_t *_src, \ "paddw %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "psraw %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \ "packsswh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gssdlc1 %[ftmp3], 0x07(%[dst]) \n\t" \ - "gssdrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USDC1(%[ftmp3], %[dst], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[dst], %[dst], 0x08 \n\t" \ @@ -290,7 +275,8 @@ void ff_hevc_put_hevc_qpel_hv##w##_8_mmi(int16_t *dst, uint8_t *_src, \ PTR_ADDIU "%[dst], %[dst], 0x80 \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ 
[ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -333,6 +319,8 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ uint64_t rtmp[1]; \ int shift = 7; \ int offset = 64; \ + DECLARE_VAR_ALL64; \ + DECLARE_VAR_LOW32; \ \ x = width >> 2; \ y = height; \ @@ -351,14 +339,10 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ "1: \n\t" \ "li %[x], " #x_step " \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \ - "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \ - "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \ - "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \ - "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \ - "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \ - "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[src], 0x00) \ + MMI_ULDC1(%[ftmp4], %[src], 0x01) \ + MMI_ULDC1(%[ftmp5], %[src], 0x02) \ + MMI_ULDC1(%[ftmp6], %[src], 0x03) \ "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \ "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \ "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" \ @@ -385,8 +369,7 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ "paddh %[ftmp3], %[ftmp3], %[offset] \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[src2]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[src2]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[src2], 0x00) \ "li %[rtmp0], 0x10 \n\t" \ "dmtc1 %[rtmp0], %[ftmp8] \n\t" \ "punpcklhw %[ftmp5], %[ftmp0], %[ftmp3] \n\t" \ @@ -405,8 +388,7 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ "pcmpgth %[ftmp7], %[ftmp5], %[ftmp0] \n\t" \ "and %[ftmp3], %[ftmp5], %[ftmp7] \n\t" \ "packushb %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ - "gsswlc1 %[ftmp3], 0x03(%[dst]) \n\t" \ - "gsswrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USWC1(%[ftmp3], %[dst], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x04 \n\t" \ @@ -422,7 
+404,8 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" \ PTR_ADDIU "%[src2], %[src2], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 RESTRICT_ASM_LOW32 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -467,6 +450,8 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ uint64_t rtmp[1]; \ int shift = 7; \ int offset = 64; \ + DECLARE_VAR_ALL64; \ + DECLARE_VAR_LOW32; \ \ src -= (QPEL_EXTRA_BEFORE * srcstride + 3); \ filter = ff_hevc_qpel_filters[mx - 1]; \ @@ -484,14 +469,10 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ \ "1: \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \ - "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \ - "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \ - "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \ - "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \ - "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \ - "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[src], 0x00) \ + MMI_ULDC1(%[ftmp4], %[src], 0x01) \ + MMI_ULDC1(%[ftmp5], %[src], 0x02) \ + MMI_ULDC1(%[ftmp6], %[src], 0x03) \ "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \ "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \ "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" \ @@ -517,8 +498,7 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \ "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gssdlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gssdrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_USDC1(%[ftmp3], %[tmp], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x04 \n\t" \ @@ -532,7 +512,8 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[src], %[src], 
%[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -563,29 +544,21 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "1: \n\t" \ "li %[x], " #x_step " \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp6], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp6], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp6], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp7], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp7], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp7], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp8], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp8], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp8], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp9], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp9], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp9], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp10], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp10], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp10], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], -0x380 \n\t" \ TRANSPOSE_4H(%[ftmp3], %[ftmp4], %[ftmp5], %[ftmp6], \ %[ftmp11], %[ftmp12], %[ftmp13], %[ftmp14]) \ @@ -610,8 +583,7 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "paddw %[ftmp5], %[ftmp5], 
%[ftmp6] \n\t" \ "psraw %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \ "packsswh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[src2]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[src2]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[src2], 0x00) \ "xor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" \ "li %[rtmp0], 0x10 \n\t" \ "dmtc1 %[rtmp0], %[ftmp8] \n\t" \ @@ -633,8 +605,7 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "pcmpgth %[ftmp7], %[ftmp5], %[ftmp7] \n\t" \ "and %[ftmp3], %[ftmp5], %[ftmp7] \n\t" \ "packushb %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ - "gsswlc1 %[ftmp3], 0x03(%[dst]) \n\t" \ - "gsswrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USWC1(%[ftmp3], %[dst], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src2], %[src2], 0x08 \n\t" \ @@ -650,7 +621,8 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 RESTRICT_ASM_LOW32 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -696,6 +668,8 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ uint64_t rtmp[1]; \ int shift = 7; \ int offset = 64; \ + DECLARE_VAR_ALL64; \ + DECLARE_VAR_LOW32; \ \ src -= (EPEL_EXTRA_BEFORE * srcstride + 1); \ x = width >> 2; \ @@ -710,14 +684,10 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ \ "1: \n\t" \ "2: \n\t" \ - "gslwlc1 %[ftmp2], 0x03(%[src]) \n\t" \ - "gslwrc1 %[ftmp2], 0x00(%[src]) \n\t" \ - "gslwlc1 %[ftmp3], 0x04(%[src]) \n\t" \ - "gslwrc1 %[ftmp3], 0x01(%[src]) \n\t" \ - "gslwlc1 %[ftmp4], 0x05(%[src]) \n\t" \ - "gslwrc1 %[ftmp4], 0x02(%[src]) \n\t" \ - "gslwlc1 %[ftmp5], 0x06(%[src]) \n\t" \ - "gslwrc1 %[ftmp5], 0x03(%[src]) \n\t" \ + MMI_ULWC1(%[ftmp2], %[src], 0x00) \ + MMI_ULWC1(%[ftmp3], %[src], 0x01) \ + MMI_ULWC1(%[ftmp4], %[src], 0x02) \ + MMI_ULWC1(%[ftmp5], %[src], 0x03) \ "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" \ "pmullh %[ftmp2], %[ftmp2], %[ftmp1] \n\t" \ "punpcklbh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" \ @@ -731,8 +701,7 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "paddh %[ftmp2], %[ftmp2], %[ftmp3] \n\t" \ "paddh %[ftmp4], %[ftmp4], %[ftmp5] \n\t" \ "paddh %[ftmp2], %[ftmp2], %[ftmp4] \n\t" \ - "gssdlc1 %[ftmp2], 0x07(%[tmp]) \n\t" \ - "gssdrc1 %[ftmp2], 0x00(%[tmp]) \n\t" \ + MMI_USDC1(%[ftmp2], %[tmp], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x04 \n\t" \ @@ -746,7 +715,8 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[src], %[src], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -776,17 +746,13 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "1: \n\t" \ "li %[x], " #x_step " \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp6], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp6], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp6], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], -0x180 \n\t" \ TRANSPOSE_4H(%[ftmp3], %[ftmp4], %[ftmp5], %[ftmp6], \ %[ftmp7], %[ftmp8], %[ftmp9], %[ftmp10]) \ @@ -801,8 +767,7 @@ void
ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "paddw %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "psraw %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \ "packsswh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[src2]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[src2]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[src2], 0x00) \ "li %[rtmp0], 0x10 \n\t" \ "dmtc1 %[rtmp0], %[ftmp8] \n\t" \ "punpcklhw %[ftmp5], %[ftmp2], %[ftmp3] \n\t" \ @@ -823,8 +788,7 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "pcmpgth %[ftmp7], %[ftmp5], %[ftmp2] \n\t" \ "and %[ftmp3], %[ftmp5], %[ftmp7] \n\t" \ "packushb %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ - "gsswlc1 %[ftmp3], 0x03(%[dst]) \n\t" \ - "gsswrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USWC1(%[ftmp3], %[dst], 0x0) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src2], %[src2], 0x08 \n\t" \ @@ -840,7 +804,8 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_LOW32 RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -878,6 +843,7 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ uint64_t ftmp[12]; \ uint64_t rtmp[1]; \ int shift = 7; \ + DECLARE_VAR_ALL64; \ \ y = height; \ x = width >> 3; \ @@ -894,12 +860,9 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ \ "1: \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[src]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[src]) \n\t" \ - "gsldlc1 %[ftmp2], 0x07(%[src2]) \n\t" \ - "gsldrc1 %[ftmp2], 0x00(%[src2]) \n\t" \ - "gsldlc1 %[ftmp3], 0x0f(%[src2]) \n\t" \ - "gsldrc1 %[ftmp3], 0x08(%[src2]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[src], 0x00) \ + MMI_ULDC1(%[ftmp2], %[src2], 0x00) \ + MMI_ULDC1(%[ftmp3], %[src2], 0x08) \
"punpcklbh %[ftmp4], %[ftmp5], %[ftmp0] \n\t" \ "punpckhbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \ "psllh %[ftmp4], %[ftmp4], %[ftmp1] \n\t" \ @@ -933,8 +896,7 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ "and %[ftmp2], %[ftmp2], %[ftmp3] \n\t" \ "and %[ftmp4], %[ftmp4], %[ftmp5] \n\t" \ "packushb %[ftmp2], %[ftmp2], %[ftmp4] \n\t" \ - "gssdlc1 %[ftmp2], 0x07(%[dst]) \n\t" \ - "gssdrc1 %[ftmp2], 0x00(%[dst]) \n\t" \ + MMI_USDC1(%[ftmp2], %[dst], 0x0) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x08 \n\t" \ @@ -951,7 +913,8 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[dststride] \n\t" \ PTR_ADDIU "%[src2], %[src2], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -993,6 +956,8 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ uint64_t rtmp[1]; \ int shift = 6; \ int offset = 32; \ + DECLARE_VAR_ALL64; \ + DECLARE_VAR_LOW32; \ \ src -= (QPEL_EXTRA_BEFORE * srcstride + 3); \ filter = ff_hevc_qpel_filters[mx - 1]; \ @@ -1010,14 +975,10 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ \ "1: \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \ - "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \ - "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \ - "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \ - "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \ - "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \ - "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[src], 0x00) \ + MMI_ULDC1(%[ftmp4], %[src], 0x01) \ + MMI_ULDC1(%[ftmp5], %[src], 0x02) \ + MMI_ULDC1(%[ftmp6], %[src], 0x03) \ "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \ "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \ "pmullh 
%[ftmp7], %[ftmp7], %[ftmp1] \n\t" \ @@ -1043,8 +1004,7 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \ "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gssdlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gssdrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_USDC1(%[ftmp3], %[tmp], 0x0) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x04 \n\t" \ @@ -1058,7 +1018,8 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[src], %[src], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -1090,29 +1051,21 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ "1: \n\t" \ "li %[x], " #x_step " \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp6], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp6], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp6], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp7], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp7], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp7], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp8], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp8], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp8], %[tmp], 0x00) \ PTR_ADDIU 
"%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp9], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp9], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp9], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp10], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp10], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp10], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], -0x380 \n\t" \ TRANSPOSE_4H(%[ftmp3], %[ftmp4], %[ftmp5], %[ftmp6], \ %[ftmp11], %[ftmp12], %[ftmp13], %[ftmp14]) \ @@ -1143,8 +1096,7 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ "pcmpgth %[ftmp7], %[ftmp3], %[ftmp7] \n\t" \ "and %[ftmp3], %[ftmp3], %[ftmp7] \n\t" \ "packushb %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ - "gsswlc1 %[ftmp3], 0x03(%[dst]) \n\t" \ - "gsswrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USWC1(%[ftmp3], %[dst], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x08 \n\t" \ @@ -1157,7 +1109,8 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 RESTRICT_ASM_LOW32 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ diff --git a/libavcodec/mips/hpeldsp_mmi.c b/libavcodec/mips/hpeldsp_mmi.c index e69b2bd980..ce51815ff4 100644 --- a/libavcodec/mips/hpeldsp_mmi.c +++ b/libavcodec/mips/hpeldsp_mmi.c @@ -307,6 +307,7 @@ inline void ff_put_pixels4_l2_8_mmi(uint8_t *dst, const uint8_t *src1, double ftmp[4]; mips_reg addr[5]; DECLARE_VAR_LOW32; + DECLARE_VAR_ADDRT; __asm__ volatile ( "1: \n\t" diff --git a/libavcodec/mips/simple_idct_mmi.c b/libavcodec/mips/simple_idct_mmi.c index 73d797ffbc..ca29e2ea4b 100644 --- a/libavcodec/mips/simple_idct_mmi.c +++ b/libavcodec/mips/simple_idct_mmi.c @@ -55,6 +55,8 @@ DECLARE_ALIGNED(16, const int16_t, W_arr)[46] = 
{ void ff_simple_idct_8_mmi(int16_t *block) { + DECLARE_VAR_ALL64; + BACKUP_REG __asm__ volatile ( @@ -141,20 +143,20 @@ void ff_simple_idct_8_mmi(int16_t *block) /* idctRowCondDC row0~8 */ /* load W */ - "gslqc1 $f19, $f18, 0x00(%[w_arr]) \n\t" - "gslqc1 $f21, $f20, 0x10(%[w_arr]) \n\t" - "gslqc1 $f23, $f22, 0x20(%[w_arr]) \n\t" - "gslqc1 $f25, $f24, 0x30(%[w_arr]) \n\t" - "gslqc1 $f17, $f16, 0x40(%[w_arr]) \n\t" + MMI_LQC1($f19, $f18, %[w_arr], 0x00) + MMI_LQC1($f21, $f20, %[w_arr], 0x10) + MMI_LQC1($f23, $f22, %[w_arr], 0x20) + MMI_LQC1($f25, $f24, %[w_arr], 0x30) + MMI_LQC1($f17, $f16, %[w_arr], 0x40) /* load source in block */ - "gslqc1 $f1, $f0, 0x00(%[block]) \n\t" - "gslqc1 $f3, $f2, 0x10(%[block]) \n\t" - "gslqc1 $f5, $f4, 0x20(%[block]) \n\t" - "gslqc1 $f7, $f6, 0x30(%[block]) \n\t" - "gslqc1 $f9, $f8, 0x40(%[block]) \n\t" - "gslqc1 $f11, $f10, 0x50(%[block]) \n\t" - "gslqc1 $f13, $f12, 0x60(%[block]) \n\t" - "gslqc1 $f15, $f14, 0x70(%[block]) \n\t" + MMI_LQC1($f1, $f0, %[block], 0x00) + MMI_LQC1($f3, $f2, %[block], 0x10) + MMI_LQC1($f5, $f4, %[block], 0x20) + MMI_LQC1($f7, $f6, %[block], 0x30) + MMI_LQC1($f9, $f8, %[block], 0x40) + MMI_LQC1($f11, $f10, %[block], 0x50) + MMI_LQC1($f13, $f12, %[block], 0x60) + MMI_LQC1($f15, $f14, %[block], 0x70) /* $9: mask ; $f17: ROW_SHIFT */ "dmfc1 $9, $f17 \n\t" @@ -252,8 +254,7 @@ void ff_simple_idct_8_mmi(int16_t *block) /* idctSparseCol col0~3 */ /* $f17: ff_p16_32; $f16: COL_SHIFT-16 */ - "gsldlc1 $f17, 0x57(%[w_arr]) \n\t" - "gsldrc1 $f17, 0x50(%[w_arr]) \n\t" + MMI_ULDC1($f17, %[w_arr], 0x50) "li $10, 4 \n\t" "dmtc1 $10, $f16 \n\t" "paddh $f0, $f0, $f17 \n\t" @@ -394,16 +395,16 @@ void ff_simple_idct_8_mmi(int16_t *block) "punpcklwd $f11, $f27, $f29 \n\t" "punpckhwd $f15, $f27, $f29 \n\t" /* Store */ - "gssqc1 $f1, $f0, 0x00(%[block]) \n\t" - "gssqc1 $f5, $f4, 0x10(%[block]) \n\t" - "gssqc1 $f9, $f8, 0x20(%[block]) \n\t" - "gssqc1 $f13, $f12, 0x30(%[block]) \n\t" - "gssqc1 $f3, $f2, 0x40(%[block]) \n\t" - 
"gssqc1 $f7, $f6, 0x50(%[block]) \n\t" - "gssqc1 $f11, $f10, 0x60(%[block]) \n\t" - "gssqc1 $f15, $f14, 0x70(%[block]) \n\t" + MMI_SQC1($f1, $f0, %[block], 0x00) + MMI_SQC1($f5, $f4, %[block], 0x10) + MMI_SQC1($f9, $f8, %[block], 0x20) + MMI_SQC1($f13, $f12, %[block], 0x30) + MMI_SQC1($f3, $f2, %[block], 0x40) + MMI_SQC1($f7, $f6, %[block], 0x50) + MMI_SQC1($f11, $f10, %[block], 0x60) + MMI_SQC1($f15, $f14, %[block], 0x70) - : [block]"+&r"(block) + : RESTRICT_ASM_ALL64 [block]"+&r"(block) : [w_arr]"r"(W_arr) : "memory" ); diff --git a/libavcodec/mips/vp3dsp_idct_mmi.c b/libavcodec/mips/vp3dsp_idct_mmi.c index c5c4cf3127..cc1e5bf595 100644 --- a/libavcodec/mips/vp3dsp_idct_mmi.c +++ b/libavcodec/mips/vp3dsp_idct_mmi.c @@ -722,6 +722,8 @@ void ff_put_no_rnd_pixels_l2_mmi(uint8_t *dst, const uint8_t *src1, if (h == 8) { double ftmp[6]; uint64_t tmp[2]; + DECLARE_VAR_ALL64; + __asm__ volatile ( "li %[tmp0], 0x08 \n\t" "li %[tmp1], 0xfefefefe \n\t" @@ -730,10 +732,8 @@ void ff_put_no_rnd_pixels_l2_mmi(uint8_t *dst, const uint8_t *src1, "li %[tmp1], 0x01 \n\t" "dmtc1 %[tmp1], %[ftmp5] \n\t" "1: \n\t" - "gsldlc1 %[ftmp1], 0x07(%[src1]) \n\t" - "gsldrc1 %[ftmp1], 0x00(%[src1]) \n\t" - "gsldlc1 %[ftmp2], 0x07(%[src2]) \n\t" - "gsldrc1 %[ftmp2], 0x00(%[src2]) \n\t" + MMI_ULDC1(%[ftmp1], %[src1], 0x0) + MMI_ULDC1(%[ftmp2], %[src2], 0x0) "xor %[ftmp3], %[ftmp1], %[ftmp2] \n\t" "and %[ftmp3], %[ftmp3], %[ftmp4] \n\t" "psrlw %[ftmp3], %[ftmp3], %[ftmp5] \n\t" @@ -745,7 +745,8 @@ void ff_put_no_rnd_pixels_l2_mmi(uint8_t *dst, const uint8_t *src1, PTR_ADDU "%[dst], %[dst], %[stride] \n\t" PTR_ADDIU "%[tmp0], %[tmp0], -0x01 \n\t" "bnez %[tmp0], 1b \n\t" - : [dst]"+&r"(dst), [src1]"+&r"(src1), [src2]"+&r"(src2), + : RESTRICT_ASM_ALL64 + [dst]"+&r"(dst), [src1]"+&r"(src1), [src2]"+&r"(src2), [ftmp1]"=&f"(ftmp[0]), [ftmp2]"=&f"(ftmp[1]), [ftmp3]"=&f"(ftmp[2]), [ftmp4]"=&f"(ftmp[3]), [ftmp5]"=&f"(ftmp[4]), [ftmp6]"=&f"(ftmp[5]), [tmp0]"=&r"(tmp[0]), [tmp1]"=&r"(tmp[1]) diff --git 
a/libavcodec/mips/vp8dsp_mmi.c b/libavcodec/mips/vp8dsp_mmi.c index bd80aa1445..f76c8625f0 100644 --- a/libavcodec/mips/vp8dsp_mmi.c +++ b/libavcodec/mips/vp8dsp_mmi.c @@ -789,51 +789,40 @@ static av_always_inline void vp8_v_loop_filter8_mmi(uint8_t *dst, DECLARE_DOUBLE_1; DECLARE_DOUBLE_2; DECLARE_UINT32_T; + DECLARE_VAR_ALL64; + __asm__ volatile( /* Get data from dst */ - "gsldlc1 %[q0], 0x07(%[dst]) \n\t" - "gsldrc1 %[q0], 0x00(%[dst]) \n\t" + MMI_ULDC1(%[q0], %[dst], 0x0) PTR_SUBU "%[tmp0], %[dst], %[stride] \n\t" - "gsldlc1 %[p0], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[p0], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[p0], %[tmp0], 0x0) PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p1], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[p1], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[p1], %[tmp0], 0x0) PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p2], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[p2], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[p2], %[tmp0], 0x0) PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p3], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[p3], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[p3], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[dst], %[stride] \n\t" - "gsldlc1 %[q1], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[q1], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[q1], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q2], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[q2], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[q2], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q3], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[q3], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[q3], %[tmp0], 0x0) MMI_VP8_LOOP_FILTER /* Move to dst */ - "gssdlc1 %[q0], 0x07(%[dst]) \n\t" - "gssdrc1 %[q0], 0x00(%[dst]) \n\t" + MMI_USDC1(%[q0], %[dst], 0x0) PTR_SUBU "%[tmp0], %[dst], %[stride] \n\t" - "gssdlc1 %[p0], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[p0], 0x00(%[tmp0]) \n\t" + MMI_USDC1(%[p0], %[tmp0], 0x0) PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gssdlc1 %[p1], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[p1], 0x00(%[tmp0]) \n\t" + MMI_USDC1(%[p1], %[tmp0], 0x0) 
PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gssdlc1 %[p2], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[p2], 0x00(%[tmp0]) \n\t" + MMI_USDC1(%[p2], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[dst], %[stride] \n\t" - "gssdlc1 %[q1], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[q1], 0x00(%[tmp0]) \n\t" + MMI_USDC1(%[q1], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gssdlc1 %[q2], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[q2], 0x00(%[tmp0]) \n\t" - : [p3]"=&f"(ftmp[0]), [p2]"=&f"(ftmp[1]), + MMI_USDC1(%[q2], %[tmp0], 0x0) + : RESTRICT_ASM_ALL64 + [p3]"=&f"(ftmp[0]), [p2]"=&f"(ftmp[1]), [p1]"=&f"(ftmp[2]), [p0]"=&f"(ftmp[3]), [q0]"=&f"(ftmp[4]), [q1]"=&f"(ftmp[5]), [q2]"=&f"(ftmp[6]), [q3]"=&f"(ftmp[7]), @@ -874,31 +863,25 @@ static av_always_inline void vp8_h_loop_filter8_mmi(uint8_t *dst, DECLARE_DOUBLE_1; DECLARE_DOUBLE_2; DECLARE_UINT32_T; + DECLARE_VAR_ALL64; + __asm__ volatile( /* Get data from dst */ - "gsldlc1 %[p3], 0x03(%[dst]) \n\t" - "gsldrc1 %[p3], -0x04(%[dst]) \n\t" + MMI_ULDC1(%[p3], %[dst], -0x04) PTR_ADDU "%[tmp0], %[dst], %[stride] \n\t" - "gsldlc1 %[p2], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[p2], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[p2], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p1], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[p1], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[p1], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p0], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[p0], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[p0], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q0], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[q0], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[q0], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q1], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[q1], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[q1], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q2], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[q2], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[q2], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q3], 
0x03(%[tmp0]) \n\t" - "gsldrc1 %[q3], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[q3], %[tmp0], -0x04) /* Matrix transpose */ TRANSPOSE_8B(%[p3], %[p2], %[p1], %[p0], %[q0], %[q1], %[q2], %[q3], @@ -909,30 +892,23 @@ static av_always_inline void vp8_h_loop_filter8_mmi(uint8_t *dst, %[q0], %[q1], %[q2], %[q3], %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4]) /* Move to dst */ - "gssdlc1 %[p3], 0x03(%[dst]) \n\t" - "gssdrc1 %[p3], -0x04(%[dst]) \n\t" + MMI_USDC1(%[p3], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[p2], 0x03(%[dst]) \n\t" - "gssdrc1 %[p2], -0x04(%[dst]) \n\t" + MMI_USDC1(%[p2], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[p1], 0x03(%[dst]) \n\t" - "gssdrc1 %[p1], -0x04(%[dst]) \n\t" + MMI_USDC1(%[p1], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[p0], 0x03(%[dst]) \n\t" - "gssdrc1 %[p0], -0x04(%[dst]) \n\t" + MMI_USDC1(%[p0], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[q0], 0x03(%[dst]) \n\t" - "gssdrc1 %[q0], -0x04(%[dst]) \n\t" + MMI_USDC1(%[q0], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[q1], 0x03(%[dst]) \n\t" - "gssdrc1 %[q1], -0x04(%[dst]) \n\t" + MMI_USDC1(%[q1], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[q2], 0x03(%[dst]) \n\t" - "gssdrc1 %[q2], -0x04(%[dst]) \n\t" + MMI_USDC1(%[q2], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[q3], 0x03(%[dst]) \n\t" - "gssdrc1 %[q3], -0x04(%[dst]) \n\t" - : [p3]"=&f"(ftmp[0]), [p2]"=&f"(ftmp[1]), + MMI_USDC1(%[q3], %[dst], -0x04) + : RESTRICT_ASM_ALL64 + [p3]"=&f"(ftmp[0]), [p2]"=&f"(ftmp[1]), [p1]"=&f"(ftmp[2]), [p0]"=&f"(ftmp[3]), [q0]"=&f"(ftmp[4]), [q1]"=&f"(ftmp[5]), [q2]"=&f"(ftmp[6]), [q3]"=&f"(ftmp[7]), diff --git a/libavcodec/mips/vp9_mc_mmi.c b/libavcodec/mips/vp9_mc_mmi.c index e7a83875b9..57825fb967 100644 --- a/libavcodec/mips/vp9_mc_mmi.c +++ b/libavcodec/mips/vp9_mc_mmi.c @@ -77,29 +77,24 @@ static void convolve_horiz_mmi(const uint8_t 
*src, int32_t src_stride, { double ftmp[15]; uint32_t tmp[2]; + DECLARE_VAR_ALL64; src -= 3; src_stride -= w; dst_stride -= w; __asm__ volatile ( "move %[tmp1], %[width] \n\t" "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "gsldlc1 %[filter1], 0x03(%[filter]) \n\t" - "gsldrc1 %[filter1], 0x00(%[filter]) \n\t" - "gsldlc1 %[filter2], 0x0b(%[filter]) \n\t" - "gsldrc1 %[filter2], 0x08(%[filter]) \n\t" + MMI_LDLRC1(%[filter1], %[filter], 0x00, 0x03) + MMI_LDLRC1(%[filter2], %[filter], 0x08, 0x03) "li %[tmp0], 0x07 \n\t" "dmtc1 %[tmp0], %[ftmp13] \n\t" "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" "1: \n\t" /* Get 8 data per row */ - "gsldlc1 %[ftmp5], 0x07(%[src]) \n\t" - "gsldrc1 %[ftmp5], 0x00(%[src]) \n\t" - "gsldlc1 %[ftmp7], 0x08(%[src]) \n\t" - "gsldrc1 %[ftmp7], 0x01(%[src]) \n\t" - "gsldlc1 %[ftmp9], 0x09(%[src]) \n\t" - "gsldrc1 %[ftmp9], 0x02(%[src]) \n\t" - "gsldlc1 %[ftmp11], 0x0A(%[src]) \n\t" - "gsldrc1 %[ftmp11], 0x03(%[src]) \n\t" + MMI_ULDC1(%[ftmp5], %[src], 0x00) + MMI_ULDC1(%[ftmp7], %[src], 0x01) + MMI_ULDC1(%[ftmp9], %[src], 0x02) + MMI_ULDC1(%[ftmp11], %[src], 0x03) "punpcklbh %[ftmp4], %[ftmp5], %[ftmp0] \n\t" "punpckhbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" "punpcklbh %[ftmp6], %[ftmp7], %[ftmp0] \n\t" @@ -127,7 +122,8 @@ static void convolve_horiz_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), [filter1]"=&f"(ftmp[2]), [filter2]"=&f"(ftmp[3]), [ftmp0]"=&f"(ftmp[4]), [ftmp4]"=&f"(ftmp[5]), [ftmp5]"=&f"(ftmp[6]), [ftmp6]"=&f"(ftmp[7]), @@ -153,15 +149,14 @@ static void convolve_vert_mmi(const uint8_t *src, int32_t src_stride, double ftmp[17]; uint32_t tmp[1]; ptrdiff_t addr = src_stride; + DECLARE_VAR_ALL64; src_stride -= w; dst_stride -= w; __asm__ volatile ( "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "gsldlc1 %[ftmp4], 
0x03(%[filter]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[filter]) \n\t" - "gsldlc1 %[ftmp5], 0x0b(%[filter]) \n\t" - "gsldrc1 %[ftmp5], 0x08(%[filter]) \n\t" + MMI_LDLRC1(%[ftmp4], %[filter], 0x00, 0x03) + MMI_LDLRC1(%[ftmp5], %[filter], 0x08, 0x03) "punpcklwd %[filter10], %[ftmp4], %[ftmp4] \n\t" "punpckhwd %[filter32], %[ftmp4], %[ftmp4] \n\t" "punpcklwd %[filter54], %[ftmp5], %[ftmp5] \n\t" @@ -171,29 +166,21 @@ static void convolve_vert_mmi(const uint8_t *src, int32_t src_stride, "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" "1: \n\t" /* Get 8 data per column */ - "gsldlc1 %[ftmp4], 0x07(%[src]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[src]) \n\t" + MMI_ULDC1(%[ftmp4], %[src], 0x0) PTR_ADDU "%[tmp0], %[src], %[addr] \n\t" - "gsldlc1 %[ftmp5], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp5], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp5], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp6], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp6], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp6], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp7], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp7], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp7], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp8], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp8], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp8], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp9], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp9], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp9], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp10], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp10], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp10], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp11], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp11], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp11], %[tmp0], 0x0) "punpcklbh %[ftmp4], %[ftmp4], %[ftmp0] \n\t" "punpcklbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" "punpcklbh %[ftmp6], %[ftmp6], %[ftmp0] \n\t" @@ -221,7 +208,8 @@ static void 
convolve_vert_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), [filter10]"=&f"(ftmp[2]), [filter32]"=&f"(ftmp[3]), [filter54]"=&f"(ftmp[4]), [filter76]"=&f"(ftmp[5]), [ftmp0]"=&f"(ftmp[6]), [ftmp4]"=&f"(ftmp[7]), @@ -247,6 +235,7 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride, { double ftmp[15]; uint32_t tmp[2]; + DECLARE_VAR_ALL64; src -= 3; src_stride -= w; dst_stride -= w; @@ -254,23 +243,17 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride, __asm__ volatile ( "move %[tmp1], %[width] \n\t" "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "gsldlc1 %[filter1], 0x03(%[filter]) \n\t" - "gsldrc1 %[filter1], 0x00(%[filter]) \n\t" - "gsldlc1 %[filter2], 0x0b(%[filter]) \n\t" - "gsldrc1 %[filter2], 0x08(%[filter]) \n\t" + MMI_LDLRC1(%[filter1], %[filter], 0x00, 0x03) + MMI_LDLRC1(%[filter2], %[filter], 0x08, 0x03) "li %[tmp0], 0x07 \n\t" "dmtc1 %[tmp0], %[ftmp13] \n\t" "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" "1: \n\t" /* Get 8 data per row */ - "gsldlc1 %[ftmp5], 0x07(%[src]) \n\t" - "gsldrc1 %[ftmp5], 0x00(%[src]) \n\t" - "gsldlc1 %[ftmp7], 0x08(%[src]) \n\t" - "gsldrc1 %[ftmp7], 0x01(%[src]) \n\t" - "gsldlc1 %[ftmp9], 0x09(%[src]) \n\t" - "gsldrc1 %[ftmp9], 0x02(%[src]) \n\t" - "gsldlc1 %[ftmp11], 0x0A(%[src]) \n\t" - "gsldrc1 %[ftmp11], 0x03(%[src]) \n\t" + MMI_ULDC1(%[ftmp5], %[src], 0x00) + MMI_ULDC1(%[ftmp7], %[src], 0x01) + MMI_ULDC1(%[ftmp9], %[src], 0x02) + MMI_ULDC1(%[ftmp11], %[src], 0x03) "punpcklbh %[ftmp4], %[ftmp5], %[ftmp0] \n\t" "punpckhbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" "punpcklbh %[ftmp6], %[ftmp7], %[ftmp0] \n\t" @@ -289,8 +272,7 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride, "packsswh %[srcl], %[srcl], %[srch] \n\t" "packushb 
%[ftmp12], %[srcl], %[ftmp0] \n\t" "punpcklbh %[ftmp12], %[ftmp12], %[ftmp0] \n\t" - "gsldlc1 %[ftmp4], 0x07(%[dst]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[dst]) \n\t" + MMI_ULDC1(%[ftmp4], %[dst], 0x0) "punpcklbh %[ftmp4], %[ftmp4], %[ftmp0] \n\t" "paddh %[ftmp12], %[ftmp12], %[ftmp4] \n\t" "li %[tmp0], 0x10001 \n\t" @@ -309,7 +291,8 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), [filter1]"=&f"(ftmp[2]), [filter2]"=&f"(ftmp[3]), [ftmp0]"=&f"(ftmp[4]), [ftmp4]"=&f"(ftmp[5]), [ftmp5]"=&f"(ftmp[6]), [ftmp6]"=&f"(ftmp[7]), @@ -335,15 +318,14 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride, double ftmp[17]; uint32_t tmp[1]; ptrdiff_t addr = src_stride; + DECLARE_VAR_ALL64; src_stride -= w; dst_stride -= w; __asm__ volatile ( "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "gsldlc1 %[ftmp4], 0x03(%[filter]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[filter]) \n\t" - "gsldlc1 %[ftmp5], 0x0b(%[filter]) \n\t" - "gsldrc1 %[ftmp5], 0x08(%[filter]) \n\t" + MMI_LDLRC1(%[ftmp4], %[filter], 0x00, 0x03) + MMI_LDLRC1(%[ftmp5], %[filter], 0x08, 0x03) "punpcklwd %[filter10], %[ftmp4], %[ftmp4] \n\t" "punpckhwd %[filter32], %[ftmp4], %[ftmp4] \n\t" "punpcklwd %[filter54], %[ftmp5], %[ftmp5] \n\t" @@ -353,29 +335,21 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride, "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" "1: \n\t" /* Get 8 data per column */ - "gsldlc1 %[ftmp4], 0x07(%[src]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[src]) \n\t" + MMI_ULDC1(%[ftmp4], %[src], 0x0) PTR_ADDU "%[tmp0], %[src], %[addr] \n\t" - "gsldlc1 %[ftmp5], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp5], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp5], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp6], 
0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp6], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp6], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp7], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp7], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp7], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp8], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp8], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp8], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp9], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp9], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp9], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp10], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp10], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp10], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp11], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp11], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp11], %[tmp0], 0x0) "punpcklbh %[ftmp4], %[ftmp4], %[ftmp0] \n\t" "punpcklbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" "punpcklbh %[ftmp6], %[ftmp6], %[ftmp0] \n\t" @@ -394,8 +368,7 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride, "packsswh %[srcl], %[srcl], %[srch] \n\t" "packushb %[ftmp12], %[srcl], %[ftmp0] \n\t" "punpcklbh %[ftmp12], %[ftmp12], %[ftmp0] \n\t" - "gsldlc1 %[ftmp4], 0x07(%[dst]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[dst]) \n\t" + MMI_ULDC1(%[ftmp4], %[dst], 0x00) "punpcklbh %[ftmp4], %[ftmp4], %[ftmp0] \n\t" "paddh %[ftmp12], %[ftmp12], %[ftmp4] \n\t" "li %[tmp0], 0x10001 \n\t" @@ -414,7 +387,8 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), [filter10]"=&f"(ftmp[2]), [filter32]"=&f"(ftmp[3]), [filter54]"=&f"(ftmp[4]), [filter76]"=&f"(ftmp[5]), [ftmp0]"=&f"(ftmp[6]), [ftmp4]"=&f"(ftmp[7]), @@ -439,6 +413,7 
@@ static void convolve_avg_mmi(const uint8_t *src, int32_t src_stride, { double ftmp[4]; uint32_t tmp[2]; + DECLARE_VAR_ALL64; src_stride -= w; dst_stride -= w; @@ -449,10 +424,8 @@ static void convolve_avg_mmi(const uint8_t *src, int32_t src_stride, "dmtc1 %[tmp0], %[ftmp3] \n\t" "punpcklhw %[ftmp3], %[ftmp3], %[ftmp3] \n\t" "1: \n\t" - "gslwlc1 %[ftmp1], 0x07(%[src]) \n\t" - "gslwrc1 %[ftmp1], 0x00(%[src]) \n\t" - "gslwlc1 %[ftmp2], 0x07(%[dst]) \n\t" - "gslwrc1 %[ftmp2], 0x00(%[dst]) \n\t" + MMI_ULDC1(%[ftmp1], %[src], 0x00) + MMI_ULDC1(%[ftmp2], %[dst], 0x00) "punpcklbh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "paddh %[ftmp1], %[ftmp1], %[ftmp2] \n\t" @@ -469,7 +442,8 @@ static void convolve_avg_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[src], %[src], %[src_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [tmp0]"=&r"(tmp[0]), [tmp1]"=&r"(tmp[1]), [src]"+&r"(src), [dst]"+&r"(dst),
From patchwork Fri Feb 19 05:28:34 2021
X-Patchwork-Submitter: Jiaxun Yang
X-Patchwork-Id: 25780
From: Jiaxun Yang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 13:28:34 +0800
Message-Id: <20210219052834.533558-5-jiaxun.yang@flygoat.com>
In-Reply-To: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
Subject: [FFmpeg-devel] [PATCH 4/4] avutil/mips: Use $at as MMI macro temporary register
Cc: yinshiyou-hf@loongson.cn, guxiwei-hf@loongson.cn, Jiaxun Yang

Some functions exceeded the 30-operand limit on inline assembly when using the LOONGSON2 version of the MMI macros. We can avoid that by taking $at, the register reserved for the assembler, as the temporary register. As none of the instructions used in these macros are pseudo-instructions, it is safe to utilize $at here.

Signed-off-by: Jiaxun Yang
---
libavutil/mips/mmiutils.h | 115 +++++++++++++++++++++++--------------- 1 file changed, 69 insertions(+), 46 deletions(-) diff --git a/libavutil/mips/mmiutils.h b/libavutil/mips/mmiutils.h index 3994085057..7b7b405ddf 100644 --- a/libavutil/mips/mmiutils.h +++ b/libavutil/mips/mmiutils.h @@ -27,78 +27,107 @@ #include "config.h" #include "libavutil/mips/asmdefs.h" -#if HAVE_LOONGSON2 +/* + * These were used to define temporary registers for MMI macros + * however now we're using $at.
They're theoretically unnecessary + * but just leave them here to avoid mess. + */ +#define DECLARE_VAR_LOW32 +#define RESTRICT_ASM_LOW32 +#define DECLARE_VAR_ALL64 +#define RESTRICT_ASM_ALL64 +#define DECLARE_VAR_ADDRT +#define RESTRICT_ASM_ADDRT -#define DECLARE_VAR_LOW32 int32_t low32 -#define RESTRICT_ASM_LOW32 [low32]"=&r"(low32), -#define DECLARE_VAR_ALL64 int64_t all64 -#define RESTRICT_ASM_ALL64 [all64]"=&r"(all64), -#define DECLARE_VAR_ADDRT mips_reg addrt -#define RESTRICT_ASM_ADDRT [addrt]"=&r"(addrt), +#if HAVE_LOONGSON2 #define MMI_LWX(reg, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - "lw "#reg", "#bias"(%[addrt]) \n\t" + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + "lw "#reg", "#bias"($at) \n\t" \ + ".set at \n\t" #define MMI_SWX(reg, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - "sw "#reg", "#bias"(%[addrt]) \n\t" + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + "sw "#reg", "#bias"($at) \n\t" \ + ".set at \n\t" #define MMI_LDX(reg, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - "ld "#reg", "#bias"(%[addrt]) \n\t" + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + "ld "#reg", "#bias"($at) \n\t" \ + ".set at \n\t" #define MMI_SDX(reg, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - "sd "#reg", "#bias"(%[addrt]) \n\t" + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + "sd "#reg", "#bias"($at) \n\t" \ + ".set at \n\t" #define MMI_LWC1(fp, addr, bias) \ "lwc1 "#fp", "#bias"("#addr") \n\t" #define MMI_LWLRC1(fp, addr, bias, off) \ - "lwl %[low32], "#bias"+"#off"("#addr") \n\t" \ - "lwr %[low32], "#bias"("#addr") \n\t" \ - "mtc1 %[low32], "#fp" \n\t" + ".set noat \n\t" \ + "lwl $at, "#bias"+"#off"("#addr") \n\t" \ + "lwr $at, "#bias"("#addr") \n\t" \ + "mtc1 $at, "#fp" \n\t" \ + ".set at \n\t" #define MMI_LWXC1(fp, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ 
- MMI_LWC1(fp, %[addrt], bias) + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + MMI_LWC1(fp, $at, bias) \ + ".set at \n\t" #define MMI_SWC1(fp, addr, bias) \ "swc1 "#fp", "#bias"("#addr") \n\t" #define MMI_SWLRC1(fp, addr, bias, off) \ - "mfc1 %[low32], "#fp" \n\t" \ - "swl %[low32], "#bias"+"#off"("#addr") \n\t" \ - "swr %[low32], "#bias"("#addr") \n\t" + ".set noat \n\t" \ + "mfc1 $at, "#fp" \n\t" \ + "swl $at, "#bias"+"#off"("#addr") \n\t" \ + "swr $at, "#bias"("#addr") \n\t" \ + ".set at \n\t" #define MMI_SWXC1(fp, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - MMI_SWC1(fp, %[addrt], bias) + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + MMI_SWC1(fp, $at, bias) \ + ".set at \n\t" #define MMI_LDC1(fp, addr, bias) \ "ldc1 "#fp", "#bias"("#addr") \n\t" #define MMI_LDLRC1(fp, addr, bias, off) \ - "ldl %[all64], "#bias"+"#off"("#addr") \n\t" \ - "ldr %[all64], "#bias"("#addr") \n\t" \ - "dmtc1 %[all64], "#fp" \n\t" + ".set noat \n\t" \ + "ldl $at, "#bias"+"#off"("#addr") \n\t" \ + "ldr $at, "#bias"("#addr") \n\t" \ + "dmtc1 $at, "#fp" \n\t" \ + ".set at \n\t" #define MMI_LDXC1(fp, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - MMI_LDC1(fp, %[addrt], bias) + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + MMI_LDC1(fp, $at, bias) \ + ".set at \n\t" #define MMI_SDC1(fp, addr, bias) \ "sdc1 "#fp", "#bias"("#addr") \n\t" #define MMI_SDLRC1(fp, addr, bias, off) \ - "dmfc1 %[all64], "#fp" \n\t" \ - "sdl %[all64], "#bias"+"#off"("#addr") \n\t" \ - "sdr %[all64], "#bias"("#addr") \n\t" + ".set noat \n\t" \ + "dmfc1 $at, "#fp" \n\t" \ + "sdl $at, "#bias"+"#off"("#addr") \n\t" \ + "sdr $at, "#bias"("#addr") \n\t" \ + ".set at \n\t" #define MMI_SDXC1(fp, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - MMI_SDC1(fp, %[addrt], bias) + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + MMI_SDC1(fp, $at, bias) \ + ".set at \n\t" #define 
MMI_LQ(reg1, reg2, addr, bias) \ "ld "#reg1", "#bias"("#addr") \n\t" \ @@ -118,11 +147,6 @@ #elif HAVE_LOONGSON3 /* !HAVE_LOONGSON2 */ -#define DECLARE_VAR_ALL64 -#define RESTRICT_ASM_ALL64 -#define DECLARE_VAR_ADDRT -#define RESTRICT_ASM_ADDRT - #define MMI_LWX(reg, addr, stride, bias) \ "gslwx "#reg", "#bias"("#addr", "#stride") \n\t" @@ -140,13 +164,12 @@ #if _MIPS_SIM == _ABIO32 /* workaround for 3A2000 gslwlc1 bug */ -#define DECLARE_VAR_LOW32 int32_t low32 -#define RESTRICT_ASM_LOW32 [low32]"=&r"(low32), - #define MMI_LWLRC1(fp, addr, bias, off) \ - "lwl %[low32], "#bias"+"#off"("#addr") \n\t" \ - "lwr %[low32], "#bias"("#addr") \n\t" \ - "mtc1 %[low32], "#fp" \n\t" + ".set noat \n\t" \ + "lwl $at, "#bias"+"#off"("#addr") \n\t" \ + "lwr $at, "#bias"("#addr") \n\t" \ + "mtc1 $at, "#fp" \n\t" \ + ".set at \n\t" #else /* _MIPS_SIM != _ABIO32 */