From patchwork Fri Feb 19 05:28:31 2021
X-Patchwork-Submitter: Jiaxun Yang
X-Patchwork-Id: 25777
From: Jiaxun Yang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 13:28:31 +0800
Message-Id: <20210219052834.533558-2-jiaxun.yang@flygoat.com>
In-Reply-To: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
References: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
Subject: [FFmpeg-devel] [PATCH 1/4] avutil/mips: Use MMI_{L,S}QC1 macro in
 {SAVE,RECOVER}_REG
Cc: yinshiyou-hf@loongson.cn, guxiwei-hf@loongson.cn, Jiaxun Yang

{SAVE,RECOVER}_REG
will be available for Loongson2 again; also add a comment explaining the magic.

Signed-off-by: Jiaxun Yang
---
 libavutil/mips/mmiutils.h | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/libavutil/mips/mmiutils.h b/libavutil/mips/mmiutils.h
index 8f692e86c5..fb85a4dd1b 100644
--- a/libavutil/mips/mmiutils.h
+++ b/libavutil/mips/mmiutils.h
@@ -202,25 +202,27 @@
 #endif /* HAVE_LOONGSON2 */
 
 /**
- * backup register
+ * Backup saved registers
+ * We're not using compiler's clobber list as it's not smart enough
+ * to take advantage of quad word load/store.
  */
 #define BACKUP_REG \
     LOCAL_ALIGNED_16(double, temp_backup_reg, [8]); \
     if (_MIPS_SIM == _ABI64) \
         __asm__ volatile ( \
-            "gssqc1 $f25, $f24, 0x00(%[temp]) \n\t" \
-            "gssqc1 $f27, $f26, 0x10(%[temp]) \n\t" \
-            "gssqc1 $f29, $f28, 0x20(%[temp]) \n\t" \
-            "gssqc1 $f31, $f30, 0x30(%[temp]) \n\t" \
+            MMI_SQC1($f25, $f24, %[temp], 0x00) \
+            MMI_SQC1($f27, $f26, %[temp], 0x10) \
+            MMI_SQC1($f29, $f28, %[temp], 0x20) \
+            MMI_SQC1($f31, $f30, %[temp], 0x30) \
             : \
             : [temp]"r"(temp_backup_reg) \
             : "memory" \
         ); \
     else \
         __asm__ volatile ( \
-            "gssqc1 $f22, $f20, 0x00(%[temp]) \n\t" \
-            "gssqc1 $f26, $f24, 0x10(%[temp]) \n\t" \
-            "gssqc1 $f30, $f28, 0x20(%[temp]) \n\t" \
+            MMI_SQC1($f22, $f20, %[temp], 0x00) \
+            MMI_SQC1($f26, $f24, %[temp], 0x10) \
+            MMI_SQC1($f30, $f28, %[temp], 0x20) \
             : \
             : [temp]"r"(temp_backup_reg) \
             : "memory" \
@@ -232,19 +234,19 @@
 #define RECOVER_REG \
     if (_MIPS_SIM == _ABI64) \
         __asm__ volatile ( \
-            "gslqc1 $f25, $f24, 0x00(%[temp]) \n\t" \
-            "gslqc1 $f27, $f26, 0x10(%[temp]) \n\t" \
-            "gslqc1 $f29, $f28, 0x20(%[temp]) \n\t" \
-            "gslqc1 $f31, $f30, 0x30(%[temp]) \n\t" \
+            MMI_LQC1($f25, $f24, %[temp], 0x00) \
+            MMI_LQC1($f27, $f26, %[temp], 0x10) \
+            MMI_LQC1($f29, $f28, %[temp], 0x20) \
+            MMI_LQC1($f31, $f30, %[temp], 0x30) \
             : \
             : [temp]"r"(temp_backup_reg) \
             : "memory" \
         ); \
     else \
         __asm__ volatile ( \
-            "gslqc1 $f22, $f20, 0x00(%[temp]) \n\t" \
-            "gslqc1 $f26, $f24, 0x10(%[temp]) \n\t" \
-            "gslqc1 $f30, $f28, 0x20(%[temp]) \n\t" \
+            MMI_LQC1($f22, $f20, %[temp], 0x00) \
+            MMI_LQC1($f26, $f24, %[temp], 0x10) \
+            MMI_LQC1($f30, $f28, %[temp], 0x20) \
             : \
             : [temp]"r"(temp_backup_reg) \
             : "memory" \

From patchwork Fri Feb 19 05:28:32 2021
X-Patchwork-Submitter: Jiaxun Yang
X-Patchwork-Id: 25778
From: Jiaxun Yang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 13:28:32 +0800
Message-Id: <20210219052834.533558-3-jiaxun.yang@flygoat.com>
In-Reply-To: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
References: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
Subject: [FFmpeg-devel] [PATCH 2/4] avutil/mips: Extract load/store-with-shift
 C1 pair macro
Cc: yinshiyou-hf@loongson.cn, guxiwei-hf@loongson.cn, Jiaxun Yang

We're doing some fancy hacks with the load/store-with-shift C1 instructions
besides unaligned load/store. Create a macro for the l/r pair so we can use
it in these places.

Signed-off-by: Jiaxun Yang
---
 libavutil/mips/mmiutils.h | 49 ++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/libavutil/mips/mmiutils.h b/libavutil/mips/mmiutils.h
index fb85a4dd1b..3994085057 100644
--- a/libavutil/mips/mmiutils.h
+++ b/libavutil/mips/mmiutils.h
@@ -55,8 +55,9 @@
 #define MMI_LWC1(fp, addr, bias) \
     "lwc1 "#fp", "#bias"("#addr") \n\t"
 
-#define MMI_ULWC1(fp, addr, bias) \
-    "ulw %[low32], "#bias"("#addr") \n\t" \
+#define MMI_LWLRC1(fp, addr, bias, off) \
+    "lwl %[low32], "#bias"+"#off"("#addr") \n\t" \
+    "lwr %[low32], "#bias"("#addr") \n\t" \
     "mtc1 %[low32], "#fp" \n\t"
 
 #define MMI_LWXC1(fp, addr, stride, bias) \
@@ -66,9 +67,10 @@
 #define MMI_SWC1(fp, addr, bias) \
     "swc1 "#fp", "#bias"("#addr") \n\t"
 
-#define MMI_USWC1(fp, addr, bias) \
+#define MMI_SWLRC1(fp, addr, bias, off) \
     "mfc1 %[low32], "#fp" \n\t" \
-    "usw %[low32], "#bias"("#addr") \n\t"
+    "swl %[low32], "#bias"+"#off"("#addr") \n\t" \
+    "swr %[low32], "#bias"("#addr") \n\t"
 
 #define MMI_SWXC1(fp, addr, stride, bias) \
     PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \
@@ -77,8 +79,9 @@
 #define MMI_LDC1(fp, addr, bias) \
     "ldc1 "#fp", "#bias"("#addr") \n\t"
 
-#define MMI_ULDC1(fp, addr, bias) \
-    "uld %[all64], "#bias"("#addr") \n\t" \
+#define MMI_LDLRC1(fp, addr, bias, off) \
+    "ldl %[all64], "#bias"+"#off"("#addr") \n\t" \
+    "ldr %[all64], "#bias"("#addr") \n\t" \
     "dmtc1 %[all64], "#fp" \n\t"
 
 #define MMI_LDXC1(fp, addr, stride, bias) \
@@ -88,9 +91,10 @@
 #define MMI_SDC1(fp, addr, bias) \
     "sdc1 "#fp",
"#bias"("#addr") \n\t" -#define MMI_USDC1(fp, addr, bias) \ +#define MMI_SDLRC1(fp, addr, bias, off) \ "dmfc1 %[all64], "#fp" \n\t" \ - "usd %[all64], "#bias"("#addr") \n\t" + "sdl %[all64], "#bias"+"#off"("#addr") \n\t" \ + "sdr %[all64], "#bias"("#addr") \n\t" #define MMI_SDXC1(fp, addr, stride, bias) \ PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ @@ -139,17 +143,18 @@ #define DECLARE_VAR_LOW32 int32_t low32 #define RESTRICT_ASM_LOW32 [low32]"=&r"(low32), -#define MMI_ULWC1(fp, addr, bias) \ - "ulw %[low32], "#bias"("#addr") \n\t" \ - "mtc1 %[low32], "#fp" \n\t" +#define MMI_LWLRC1(fp, addr, bias, off) \ + "lwl %[low32], "#bias"+"#off"("#addr") \n\t" \ + "lwr %[low32], "#bias"("#addr") \n\t" \ + "mtc1 %[low32], "#fp" \n\t" #else /* _MIPS_SIM != _ABIO32 */ #define DECLARE_VAR_LOW32 #define RESTRICT_ASM_LOW32 -#define MMI_ULWC1(fp, addr, bias) \ - "gslwlc1 "#fp", 3+"#bias"("#addr") \n\t" \ +#define MMI_LWLRC1(fp, addr, bias, off) \ + "gslwlc1 "#fp", "#off"+"#bias"("#addr") \n\t" \ "gslwrc1 "#fp", "#bias"("#addr") \n\t" #endif /* _MIPS_SIM != _ABIO32 */ @@ -160,8 +165,8 @@ #define MMI_SWC1(fp, addr, bias) \ "swc1 "#fp", "#bias"("#addr") \n\t" -#define MMI_USWC1(fp, addr, bias) \ - "gsswlc1 "#fp", 3+"#bias"("#addr") \n\t" \ +#define MMI_SWLRC1(fp, addr, bias, off) \ + "gsswlc1 "#fp", "#off"+"#bias"("#addr") \n\t" \ "gsswrc1 "#fp", "#bias"("#addr") \n\t" #define MMI_SWXC1(fp, addr, stride, bias) \ @@ -170,8 +175,8 @@ #define MMI_LDC1(fp, addr, bias) \ "ldc1 "#fp", "#bias"("#addr") \n\t" -#define MMI_ULDC1(fp, addr, bias) \ - "gsldlc1 "#fp", 7+"#bias"("#addr") \n\t" \ +#define MMI_LDLRC1(fp, addr, bias, off) \ + "gsldlc1 "#fp", "#off"+"#bias"("#addr") \n\t" \ "gsldrc1 "#fp", "#bias"("#addr") \n\t" #define MMI_LDXC1(fp, addr, stride, bias) \ @@ -180,8 +185,8 @@ #define MMI_SDC1(fp, addr, bias) \ "sdc1 "#fp", "#bias"("#addr") \n\t" -#define MMI_USDC1(fp, addr, bias) \ - "gssdlc1 "#fp", 7+"#bias"("#addr") \n\t" \ +#define MMI_SDLRC1(fp, addr, bias, off) \ + "gssdlc1 
"#fp", "#off"+"#bias"("#addr") \n\t" \
     "gssdrc1 "#fp", "#bias"("#addr") \n\t"
 
 #define MMI_SDXC1(fp, addr, stride, bias) \
@@ -201,6 +206,12 @@
 
 #endif /* HAVE_LOONGSON2 */
 
+#define MMI_ULWC1(fp, addr, bias) MMI_LWLRC1(fp, addr, bias, 3)
+#define MMI_USWC1(fp, addr, bias) MMI_SWLRC1(fp, addr, bias, 3)
+
+#define MMI_ULDC1(fp, addr, bias) MMI_LDLRC1(fp, addr, bias, 7)
+#define MMI_USDC1(fp, addr, bias) MMI_SDLRC1(fp, addr, bias, 7)
+
 /**
  * Backup saved registers
  * We're not using compiler's clobber list as it's not smart enough

From patchwork Fri Feb 19 05:28:33 2021
X-Patchwork-Submitter: Jiaxun Yang
X-Patchwork-Id: 25779
From: Jiaxun Yang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 13:28:33 +0800
Message-Id: <20210219052834.533558-4-jiaxun.yang@flygoat.com>
In-Reply-To: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
References:
 <20210219052834.533558-1-jiaxun.yang@flygoat.com>
Subject: [FFmpeg-devel] [PATCH 3/4] avcodec/mips: Use MMI macros to replace
 Loongson3 instructions
Cc: yinshiyou-hf@loongson.cn, guxiwei-hf@loongson.cn, Jiaxun Yang

Loongson3's extension instructions (prefixed with gs) are widely used in our
MMI codebase. However, these instructions are not available on Loongson-2E/F,
while MMI code should work on those processors. We previously introduced the
mmiutils macros to provide backward compatibility, but newly committed code
didn't follow that. In this patch I reviewed the codebase and converted all
of these instructions into MMI macros to get Loongson2 supported again.
Signed-off-by: Jiaxun Yang --- libavcodec/mips/h264chroma_mmi.c | 26 +++- libavcodec/mips/h264dsp_mmi.c | 8 +- libavcodec/mips/hevcdsp_mmi.c | 251 ++++++++++++------------------ libavcodec/mips/hpeldsp_mmi.c | 1 + libavcodec/mips/simple_idct_mmi.c | 49 +++--- libavcodec/mips/vp3dsp_idct_mmi.c | 11 +- libavcodec/mips/vp8dsp_mmi.c | 100 +++++------- libavcodec/mips/vp9_mc_mmi.c | 128 ++++++--------- 8 files changed, 245 insertions(+), 329 deletions(-) diff --git a/libavcodec/mips/h264chroma_mmi.c b/libavcodec/mips/h264chroma_mmi.c index 739dd7d4d6..b6ea1ba3b1 100644 --- a/libavcodec/mips/h264chroma_mmi.c +++ b/libavcodec/mips/h264chroma_mmi.c @@ -32,6 +32,7 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int A = 64, B, C, D, E; double ftmp[12]; uint64_t tmp[1]; + DECLARE_VAR_ALL64; if (!(x || y)) { /* x=0, y=0, A=64 */ @@ -57,7 +58,8 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp3], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) @@ -152,7 +154,8 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp3], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -203,7 +206,8 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp1], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + 
[ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -272,7 +276,8 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp2], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -293,6 +298,7 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int A = 64, B, C, D, E; double ftmp[10]; uint64_t tmp[1]; + DECLARE_VAR_ALL64; if(!(x || y)){ /* x=0, y=0, A=64 */ @@ -314,7 +320,8 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "addi %[h], %[h], -0x02 \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) @@ -378,7 +385,8 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp1], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -429,7 +437,8 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp1], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), 
[ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), @@ -479,7 +488,8 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, MMI_SDC1(%[ftmp1], %[dst], 0x00) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" "bnez %[h], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), diff --git a/libavcodec/mips/h264dsp_mmi.c b/libavcodec/mips/h264dsp_mmi.c index 173e191c77..cb78d5a2f8 100644 --- a/libavcodec/mips/h264dsp_mmi.c +++ b/libavcodec/mips/h264dsp_mmi.c @@ -39,8 +39,8 @@ void ff_h264_add_pixels4_8_mmi(uint8_t *dst, int16_t *src, int stride) MMI_LDC1(%[ftmp3], %[src], 0x10) MMI_LDC1(%[ftmp4], %[src], 0x18) /* memset(src, 0, 32); */ - "gssqc1 %[ftmp0], %[ftmp0], 0x00(%[src]) \n\t" - "gssqc1 %[ftmp0], %[ftmp0], 0x10(%[src]) \n\t" + MMI_SQC1(%[ftmp0], %[ftmp0], %[src], 0x00) + MMI_SQC1(%[ftmp0], %[ftmp0], %[src], 0x10) MMI_ULWC1(%[ftmp5], %[dst0], 0x00) MMI_ULWC1(%[ftmp6], %[dst1], 0x00) MMI_ULWC1(%[ftmp7], %[dst2], 0x00) @@ -89,8 +89,8 @@ void ff_h264_idct_add_8_mmi(uint8_t *dst, int16_t *block, int stride) MMI_LDC1(%[ftmp3], %[block], 0x18) /* memset(block, 0, 32) */ "xor %[ftmp4], %[ftmp4], %[ftmp4] \n\t" - "gssqc1 %[ftmp4], %[ftmp4], 0x00(%[block]) \n\t" - "gssqc1 %[ftmp4], %[ftmp4], 0x10(%[block]) \n\t" + MMI_SQC1(%[ftmp4], %[ftmp4], %[block], 0x00) + MMI_SQC1(%[ftmp4], %[ftmp4], %[block], 0x10) "dli %[tmp0], 0x01 \n\t" "mtc1 %[tmp0], %[ftmp8] \n\t" "dli %[tmp0], 0x06 \n\t" diff --git a/libavcodec/mips/hevcdsp_mmi.c b/libavcodec/mips/hevcdsp_mmi.c index aa83e1f9ad..29e8c885bd 100644 --- a/libavcodec/mips/hevcdsp_mmi.c +++ b/libavcodec/mips/hevcdsp_mmi.c @@ -35,6 +35,7 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, uint8_t 
*_src, \
   uint64_t ftmp[15]; \
   uint64_t rtmp[1]; \
   const int8_t *filter = ff_hevc_qpel_filters[mx - 1]; \
+  DECLARE_VAR_ALL64; \
 \
   x = x_step; \
   y = height; \
@@ -50,14 +51,10 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, uint8_t *_src, \
 \
   "1: \n\t" \
   "2: \n\t" \
-  "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \
-  "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \
-  "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \
-  "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \
-  "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \
-  "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \
-  "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \
-  "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \
+  MMI_ULDC1(%[ftmp3], %[src], 0x00) \
+  MMI_ULDC1(%[ftmp4], %[src], 0x01) \
+  MMI_ULDC1(%[ftmp5], %[src], 0x02) \
+  MMI_ULDC1(%[ftmp6], %[src], 0x03) \
   "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \
   "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \
   "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" \
@@ -83,8 +80,7 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, uint8_t *_src, \
   "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \
   "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
   "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
-  "gssdlc1 %[ftmp3], 0x07(%[dst]) \n\t" \
-  "gssdrc1 %[ftmp3], 0x00(%[dst]) \n\t" \
+  MMI_USDC1(%[ftmp3], %[dst], 0x00) \
 \
   "daddi %[x], %[x], -0x01 \n\t" \
   PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
@@ -98,7 +94,8 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, uint8_t *_src, \
   PTR_ADDU "%[src], %[src], %[stride] \n\t" \
   PTR_ADDIU "%[dst], %[dst], 0x80 \n\t" \
   "bnez %[y], 1b \n\t" \
-  : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \
+  : RESTRICT_ASM_ALL64 \
+    [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \
    [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \
    [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \
    [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \
@@ -134,6 +131,7 @@ void ff_hevc_put_hevc_qpel_hv##w##_8_mmi(int16_t *dst, uint8_t *_src, \
   int16_t *tmp = tmp_array; \
   uint64_t ftmp[15]; \
   uint64_t rtmp[1]; \
+  DECLARE_VAR_ALL64; \
 \
   src -=
(QPEL_EXTRA_BEFORE * srcstride + 3); \
   filter = ff_hevc_qpel_filters[mx - 1]; \
@@ -151,14 +149,10 @@
 \
   "1: \n\t" \
   "2: \n\t" \
-  "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \
-  "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \
-  "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \
-  "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \
-  "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \
-  "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \
-  "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \
-  "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \
+  MMI_ULDC1(%[ftmp3], %[src], 0x00) \
+  MMI_ULDC1(%[ftmp4], %[src], 0x01) \
+  MMI_ULDC1(%[ftmp5], %[src], 0x02) \
+  MMI_ULDC1(%[ftmp6], %[src], 0x03) \
   "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \
   "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \
   "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" \
@@ -184,8 +178,7 @@
   "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \
   "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
   "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
-  "gssdlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \
-  "gssdrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \
+  MMI_USDC1(%[ftmp3], %[tmp], 0x00) \
 \
   "daddi %[x], %[x], -0x01 \n\t" \
   PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
@@ -199,7 +192,8 @@
   PTR_ADDU "%[src], %[src], %[stride] \n\t" \
   PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \
   "bnez %[y], 1b \n\t" \
-  : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \
+  : RESTRICT_ASM_ALL64 \
+    [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \
    [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \
    [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \
    [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \
@@ -228,29 +222,21 @@
 \
   "1: \n\t" \
   "2: \n\t" \
-  "gsldlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \
-  "gsldrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \
+  MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \
   PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t"
\ - "gsldlc1 %[ftmp4], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp6], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp6], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp6], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp7], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp7], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp7], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp8], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp8], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp8], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp9], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp9], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp9], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp10], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp10], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp10], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], -0x380 \n\t" \ TRANSPOSE_4H(%[ftmp3], %[ftmp4], %[ftmp5], %[ftmp6], \ %[ftmp11], %[ftmp12], %[ftmp13], %[ftmp14]) \ @@ -275,8 +261,7 @@ void ff_hevc_put_hevc_qpel_hv##w##_8_mmi(int16_t *dst, uint8_t *_src, \ "paddw %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "psraw %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \ "packsswh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gssdlc1 %[ftmp3], 0x07(%[dst]) \n\t" \ - "gssdrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USDC1(%[ftmp3], %[dst], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[dst], %[dst], 0x08 \n\t" \ @@ -290,7 +275,8 @@ void ff_hevc_put_hevc_qpel_hv##w##_8_mmi(int16_t *dst, uint8_t *_src, \ PTR_ADDIU "%[dst], %[dst], 0x80 \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ 
[ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -333,6 +319,8 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ uint64_t rtmp[1]; \ int shift = 7; \ int offset = 64; \ + DECLARE_VAR_ALL64; \ + DECLARE_VAR_LOW32; \ \ x = width >> 2; \ y = height; \ @@ -351,14 +339,10 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ "1: \n\t" \ "li %[x], " #x_step " \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \ - "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \ - "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \ - "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \ - "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \ - "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \ - "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[src], 0x00) \ + MMI_ULDC1(%[ftmp4], %[src], 0x01) \ + MMI_ULDC1(%[ftmp5], %[src], 0x02) \ + MMI_ULDC1(%[ftmp6], %[src], 0x03) \ "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \ "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \ "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" \ @@ -385,8 +369,7 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ "paddh %[ftmp3], %[ftmp3], %[offset] \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[src2]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[src2]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[src2], 0x00) \ "li %[rtmp0], 0x10 \n\t" \ "dmtc1 %[rtmp0], %[ftmp8] \n\t" \ "punpcklhw %[ftmp5], %[ftmp0], %[ftmp3] \n\t" \ @@ -405,8 +388,7 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ "pcmpgth %[ftmp7], %[ftmp5], %[ftmp0] \n\t" \ "and %[ftmp3], %[ftmp5], %[ftmp7] \n\t" \ "packushb %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ - "gsswlc1 %[ftmp3], 0x03(%[dst]) \n\t" \ - "gsswrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USWC1(%[ftmp3], %[dst], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x04 \n\t" \ @@ -422,7 
+404,8 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" \ PTR_ADDIU "%[src2], %[src2], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 RESTRICT_ASM_LOW32 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -467,6 +450,8 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ uint64_t rtmp[1]; \ int shift = 7; \ int offset = 64; \ + DECLARE_VAR_ALL64; \ + DECLARE_VAR_LOW32; \ \ src -= (QPEL_EXTRA_BEFORE * srcstride + 3); \ filter = ff_hevc_qpel_filters[mx - 1]; \ @@ -484,14 +469,10 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ \ "1: \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \ - "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \ - "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \ - "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \ - "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \ - "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \ - "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[src], 0x00) \ + MMI_ULDC1(%[ftmp4], %[src], 0x01) \ + MMI_ULDC1(%[ftmp5], %[src], 0x02) \ + MMI_ULDC1(%[ftmp6], %[src], 0x03) \ "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \ "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \ "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" \ @@ -517,8 +498,7 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \ "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gssdlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gssdrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_USDC1(%[ftmp3], %[tmp], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x04 \n\t" \ @@ -532,7 +512,8 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[src], %[src], 
%[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -563,29 +544,21 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "1: \n\t" \ "li %[x], " #x_step " \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp6], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp6], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp6], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp7], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp7], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp7], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp8], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp8], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp8], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp9], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp9], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp9], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp10], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp10], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp10], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], -0x380 \n\t" \ TRANSPOSE_4H(%[ftmp3], %[ftmp4], %[ftmp5], %[ftmp6], \ %[ftmp11], %[ftmp12], %[ftmp13], %[ftmp14]) \ @@ -610,8 +583,7 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "paddw %[ftmp5], %[ftmp5], 
%[ftmp6] \n\t" \ "psraw %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \ "packsswh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[src2]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[src2]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[src2], 0x00) \ "xor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" \ "li %[rtmp0], 0x10 \n\t" \ "dmtc1 %[rtmp0], %[ftmp8] \n\t" \ @@ -633,8 +605,7 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "pcmpgth %[ftmp7], %[ftmp5], %[ftmp7] \n\t" \ "and %[ftmp3], %[ftmp5], %[ftmp7] \n\t" \ "packushb %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ - "gsswlc1 %[ftmp3], 0x03(%[dst]) \n\t" \ - "gsswrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USWC1(%[ftmp3], %[dst], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src2], %[src2], 0x08 \n\t" \ @@ -650,7 +621,8 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 RESTRICT_ASM_LOW32 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -696,6 +668,8 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ uint64_t rtmp[1]; \ int shift = 7; \ int offset = 64; \ + DECLARE_VAR_ALL64; \ + DECLARE_VAR_LOW32; \ \ src -= (EPEL_EXTRA_BEFORE * srcstride + 1); \ x = width >> 2; \ @@ -710,14 +684,10 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ \ "1: \n\t" \ "2: \n\t" \ - "gslwlc1 %[ftmp2], 0x03(%[src]) \n\t" \ - "gslwrc1 %[ftmp2], 0x00(%[src]) \n\t" \ - "gslwlc1 %[ftmp3], 0x04(%[src]) \n\t" \ - "gslwrc1 %[ftmp3], 0x01(%[src]) \n\t" \ - "gslwlc1 %[ftmp4], 0x05(%[src]) \n\t" \ - "gslwrc1 %[ftmp4], 0x02(%[src]) \n\t" \ - "gslwlc1 %[ftmp5], 0x06(%[src]) \n\t" \ - "gslwrc1 %[ftmp5], 0x03(%[src]) \n\t" \ + MMI_ULWC1(%[ftmp2], %[src], 0x00) \ + MMI_ULWC1(%[ftmp3], %[src], 0x01) \ + MMI_ULWC1(%[ftmp4], %[src], 0x02) \ + MMI_ULWC1(%[ftmp5], %[src], 0x03) \ "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" \ "pmullh %[ftmp2], %[ftmp2], %[ftmp1] \n\t" \ "punpcklbh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" \ @@ -731,8 +701,7 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "paddh %[ftmp2], %[ftmp2], %[ftmp3] \n\t" \ "paddh %[ftmp4], %[ftmp4], %[ftmp5] \n\t" \ "paddh %[ftmp2], %[ftmp2], %[ftmp4] \n\t" \ - "gssdlc1 %[ftmp2], 0x07(%[tmp]) \n\t" \ - "gssdrc1 %[ftmp2], 0x00(%[tmp]) \n\t" \ + MMI_USDC1(%[ftmp2], %[tmp], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x04 \n\t" \ @@ -746,7 +715,8 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[src], %[src], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -776,17 +746,13 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "1: \n\t" \ "li %[x], " #x_step " \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp6], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp6], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp6], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], -0x180 \n\t" \ TRANSPOSE_4H(%[ftmp3], %[ftmp4], %[ftmp5], %[ftmp6], \ %[ftmp7], %[ftmp8], %[ftmp9], %[ftmp10]) \ @@ -801,8 +767,7 @@ void
ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "paddw %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "psraw %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \ "packsswh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[src2]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[src2]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[src2], 0x00) \ "li %[rtmp0], 0x10 \n\t" \ "dmtc1 %[rtmp0], %[ftmp8] \n\t" \ "punpcklhw %[ftmp5], %[ftmp2], %[ftmp3] \n\t" \ @@ -823,8 +788,7 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ "pcmpgth %[ftmp7], %[ftmp5], %[ftmp2] \n\t" \ "and %[ftmp3], %[ftmp5], %[ftmp7] \n\t" \ "packushb %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ - "gsswlc1 %[ftmp3], 0x03(%[dst]) \n\t" \ - "gsswrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USWC1(%[ftmp3], %[dst], 0x0) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src2], %[src2], 0x08 \n\t" \ @@ -840,7 +804,8 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_LOW32 RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -878,6 +843,7 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ uint64_t ftmp[12]; \ uint64_t rtmp[1]; \ int shift = 7; \ + DECLARE_VAR_ALL64; \ \ y = height; \ x = width >> 3; \ @@ -894,12 +860,9 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ \ "1: \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[src]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[src]) \n\t" \ - "gsldlc1 %[ftmp2], 0x07(%[src2]) \n\t" \ - "gsldrc1 %[ftmp2], 0x00(%[src2]) \n\t" \ - "gsldlc1 %[ftmp3], 0x0f(%[src2]) \n\t" \ - "gsldrc1 %[ftmp3], 0x08(%[src2]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[src], 0x00) \ + MMI_ULDC1(%[ftmp2], %[src2], 0x00) \ + MMI_ULDC1(%[ftmp3], %[src2], 0x08) \
"punpcklbh %[ftmp4], %[ftmp5], %[ftmp0] \n\t" \ "punpckhbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \ "psllh %[ftmp4], %[ftmp4], %[ftmp1] \n\t" \ @@ -933,8 +896,7 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ "and %[ftmp2], %[ftmp2], %[ftmp3] \n\t" \ "and %[ftmp4], %[ftmp4], %[ftmp5] \n\t" \ "packushb %[ftmp2], %[ftmp2], %[ftmp4] \n\t" \ - "gssdlc1 %[ftmp2], 0x07(%[dst]) \n\t" \ - "gssdrc1 %[ftmp2], 0x00(%[dst]) \n\t" \ + MMI_USDC1(%[ftmp2], %[dst], 0x0) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x08 \n\t" \ @@ -951,7 +913,8 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[dststride] \n\t" \ PTR_ADDIU "%[src2], %[src2], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -993,6 +956,8 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ uint64_t rtmp[1]; \ int shift = 6; \ int offset = 32; \ + DECLARE_VAR_ALL64; \ + DECLARE_VAR_LOW32; \ \ src -= (QPEL_EXTRA_BEFORE * srcstride + 3); \ filter = ff_hevc_qpel_filters[mx - 1]; \ @@ -1010,14 +975,10 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ \ "1: \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[src]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[src]) \n\t" \ - "gsldlc1 %[ftmp4], 0x08(%[src]) \n\t" \ - "gsldrc1 %[ftmp4], 0x01(%[src]) \n\t" \ - "gsldlc1 %[ftmp5], 0x09(%[src]) \n\t" \ - "gsldrc1 %[ftmp5], 0x02(%[src]) \n\t" \ - "gsldlc1 %[ftmp6], 0x0a(%[src]) \n\t" \ - "gsldrc1 %[ftmp6], 0x03(%[src]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[src], 0x00) \ + MMI_ULDC1(%[ftmp4], %[src], 0x01) \ + MMI_ULDC1(%[ftmp5], %[src], 0x02) \ + MMI_ULDC1(%[ftmp6], %[src], 0x03) \ "punpcklbh %[ftmp7], %[ftmp3], %[ftmp0] \n\t" \ "punpckhbh %[ftmp8], %[ftmp3], %[ftmp0] \n\t" \ "pmullh 
%[ftmp7], %[ftmp7], %[ftmp1] \n\t" \ @@ -1043,8 +1004,7 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \ "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \ "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ - "gssdlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gssdrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_USDC1(%[ftmp3], %[tmp], 0x0) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x04 \n\t" \ @@ -1058,7 +1018,8 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[src], %[src], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ @@ -1090,29 +1051,21 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ "1: \n\t" \ "li %[x], " #x_step " \n\t" \ "2: \n\t" \ - "gsldlc1 %[ftmp3], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp3], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp4], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp4], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp4], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp5], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp5], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp5], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp6], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp6], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp6], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp7], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp7], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp7], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp8], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp8], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp8], %[tmp], 0x00) \ PTR_ADDIU 
"%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp9], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp9], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp9], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ - "gsldlc1 %[ftmp10], 0x07(%[tmp]) \n\t" \ - "gsldrc1 %[ftmp10], 0x00(%[tmp]) \n\t" \ + MMI_ULDC1(%[ftmp10], %[tmp], 0x00) \ PTR_ADDIU "%[tmp], %[tmp], -0x380 \n\t" \ TRANSPOSE_4H(%[ftmp3], %[ftmp4], %[ftmp5], %[ftmp6], \ %[ftmp11], %[ftmp12], %[ftmp13], %[ftmp14]) \ @@ -1143,8 +1096,7 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ "pcmpgth %[ftmp7], %[ftmp3], %[ftmp7] \n\t" \ "and %[ftmp3], %[ftmp3], %[ftmp7] \n\t" \ "packushb %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ - "gsswlc1 %[ftmp3], 0x03(%[dst]) \n\t" \ - "gsswrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ + MMI_USWC1(%[ftmp3], %[dst], 0x00) \ \ "daddi %[x], %[x], -0x01 \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x08 \n\t" \ @@ -1157,7 +1109,8 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ PTR_ADDU "%[dst], %[dst], %[stride] \n\t" \ PTR_ADDIU "%[tmp], %[tmp], 0x80 \n\t" \ "bnez %[y], 1b \n\t" \ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ + : RESTRICT_ASM_ALL64 RESTRICT_ASM_LOW32 \ + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), \ [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), \ [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), \ [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), \ diff --git a/libavcodec/mips/hpeldsp_mmi.c b/libavcodec/mips/hpeldsp_mmi.c index e69b2bd980..ce51815ff4 100644 --- a/libavcodec/mips/hpeldsp_mmi.c +++ b/libavcodec/mips/hpeldsp_mmi.c @@ -307,6 +307,7 @@ inline void ff_put_pixels4_l2_8_mmi(uint8_t *dst, const uint8_t *src1, double ftmp[4]; mips_reg addr[5]; DECLARE_VAR_LOW32; + DECLARE_VAR_ADDRT; __asm__ volatile ( "1: \n\t" diff --git a/libavcodec/mips/simple_idct_mmi.c b/libavcodec/mips/simple_idct_mmi.c index 73d797ffbc..ca29e2ea4b 100644 --- a/libavcodec/mips/simple_idct_mmi.c +++ b/libavcodec/mips/simple_idct_mmi.c @@ -55,6 +55,8 @@ DECLARE_ALIGNED(16, const int16_t, W_arr)[46] = 
{ void ff_simple_idct_8_mmi(int16_t *block) { + DECLARE_VAR_ALL64; + BACKUP_REG __asm__ volatile ( @@ -141,20 +143,20 @@ void ff_simple_idct_8_mmi(int16_t *block) /* idctRowCondDC row0~8 */ /* load W */ - "gslqc1 $f19, $f18, 0x00(%[w_arr]) \n\t" - "gslqc1 $f21, $f20, 0x10(%[w_arr]) \n\t" - "gslqc1 $f23, $f22, 0x20(%[w_arr]) \n\t" - "gslqc1 $f25, $f24, 0x30(%[w_arr]) \n\t" - "gslqc1 $f17, $f16, 0x40(%[w_arr]) \n\t" + MMI_LQC1($f19, $f18, %[w_arr], 0x00) + MMI_LQC1($f21, $f20, %[w_arr], 0x10) + MMI_LQC1($f23, $f22, %[w_arr], 0x20) + MMI_LQC1($f25, $f24, %[w_arr], 0x30) + MMI_LQC1($f17, $f16, %[w_arr], 0x40) /* load source in block */ - "gslqc1 $f1, $f0, 0x00(%[block]) \n\t" - "gslqc1 $f3, $f2, 0x10(%[block]) \n\t" - "gslqc1 $f5, $f4, 0x20(%[block]) \n\t" - "gslqc1 $f7, $f6, 0x30(%[block]) \n\t" - "gslqc1 $f9, $f8, 0x40(%[block]) \n\t" - "gslqc1 $f11, $f10, 0x50(%[block]) \n\t" - "gslqc1 $f13, $f12, 0x60(%[block]) \n\t" - "gslqc1 $f15, $f14, 0x70(%[block]) \n\t" + MMI_LQC1($f1, $f0, %[block], 0x00) + MMI_LQC1($f3, $f2, %[block], 0x10) + MMI_LQC1($f5, $f4, %[block], 0x20) + MMI_LQC1($f7, $f6, %[block], 0x30) + MMI_LQC1($f9, $f8, %[block], 0x40) + MMI_LQC1($f11, $f10, %[block], 0x50) + MMI_LQC1($f13, $f12, %[block], 0x60) + MMI_LQC1($f15, $f14, %[block], 0x70) /* $9: mask ; $f17: ROW_SHIFT */ "dmfc1 $9, $f17 \n\t" @@ -252,8 +254,7 @@ void ff_simple_idct_8_mmi(int16_t *block) /* idctSparseCol col0~3 */ /* $f17: ff_p16_32; $f16: COL_SHIFT-16 */ - "gsldlc1 $f17, 0x57(%[w_arr]) \n\t" - "gsldrc1 $f17, 0x50(%[w_arr]) \n\t" + MMI_ULDC1($f17, %[w_arr], 0x50) "li $10, 4 \n\t" "dmtc1 $10, $f16 \n\t" "paddh $f0, $f0, $f17 \n\t" @@ -394,16 +395,16 @@ void ff_simple_idct_8_mmi(int16_t *block) "punpcklwd $f11, $f27, $f29 \n\t" "punpckhwd $f15, $f27, $f29 \n\t" /* Store */ - "gssqc1 $f1, $f0, 0x00(%[block]) \n\t" - "gssqc1 $f5, $f4, 0x10(%[block]) \n\t" - "gssqc1 $f9, $f8, 0x20(%[block]) \n\t" - "gssqc1 $f13, $f12, 0x30(%[block]) \n\t" - "gssqc1 $f3, $f2, 0x40(%[block]) \n\t" - 
"gssqc1 $f7, $f6, 0x50(%[block]) \n\t" - "gssqc1 $f11, $f10, 0x60(%[block]) \n\t" - "gssqc1 $f15, $f14, 0x70(%[block]) \n\t" + MMI_SQC1($f1, $f0, %[block], 0x00) + MMI_SQC1($f5, $f4, %[block], 0x10) + MMI_SQC1($f9, $f8, %[block], 0x20) + MMI_SQC1($f13, $f12, %[block], 0x30) + MMI_SQC1($f3, $f2, %[block], 0x40) + MMI_SQC1($f7, $f6, %[block], 0x50) + MMI_SQC1($f11, $f10, %[block], 0x60) + MMI_SQC1($f15, $f14, %[block], 0x70) - : [block]"+&r"(block) + : RESTRICT_ASM_ALL64 [block]"+&r"(block) : [w_arr]"r"(W_arr) : "memory" ); diff --git a/libavcodec/mips/vp3dsp_idct_mmi.c b/libavcodec/mips/vp3dsp_idct_mmi.c index c5c4cf3127..cc1e5bf595 100644 --- a/libavcodec/mips/vp3dsp_idct_mmi.c +++ b/libavcodec/mips/vp3dsp_idct_mmi.c @@ -722,6 +722,8 @@ void ff_put_no_rnd_pixels_l2_mmi(uint8_t *dst, const uint8_t *src1, if (h == 8) { double ftmp[6]; uint64_t tmp[2]; + DECLARE_VAR_ALL64; + __asm__ volatile ( "li %[tmp0], 0x08 \n\t" "li %[tmp1], 0xfefefefe \n\t" @@ -730,10 +732,8 @@ void ff_put_no_rnd_pixels_l2_mmi(uint8_t *dst, const uint8_t *src1, "li %[tmp1], 0x01 \n\t" "dmtc1 %[tmp1], %[ftmp5] \n\t" "1: \n\t" - "gsldlc1 %[ftmp1], 0x07(%[src1]) \n\t" - "gsldrc1 %[ftmp1], 0x00(%[src1]) \n\t" - "gsldlc1 %[ftmp2], 0x07(%[src2]) \n\t" - "gsldrc1 %[ftmp2], 0x00(%[src2]) \n\t" + MMI_ULDC1(%[ftmp1], %[src1], 0x0) + MMI_ULDC1(%[ftmp2], %[src2], 0x0) "xor %[ftmp3], %[ftmp1], %[ftmp2] \n\t" "and %[ftmp3], %[ftmp3], %[ftmp4] \n\t" "psrlw %[ftmp3], %[ftmp3], %[ftmp5] \n\t" @@ -745,7 +745,8 @@ void ff_put_no_rnd_pixels_l2_mmi(uint8_t *dst, const uint8_t *src1, PTR_ADDU "%[dst], %[dst], %[stride] \n\t" PTR_ADDIU "%[tmp0], %[tmp0], -0x01 \n\t" "bnez %[tmp0], 1b \n\t" - : [dst]"+&r"(dst), [src1]"+&r"(src1), [src2]"+&r"(src2), + : RESTRICT_ASM_ALL64 + [dst]"+&r"(dst), [src1]"+&r"(src1), [src2]"+&r"(src2), [ftmp1]"=&f"(ftmp[0]), [ftmp2]"=&f"(ftmp[1]), [ftmp3]"=&f"(ftmp[2]), [ftmp4]"=&f"(ftmp[3]), [ftmp5]"=&f"(ftmp[4]), [ftmp6]"=&f"(ftmp[5]), [tmp0]"=&r"(tmp[0]), [tmp1]"=&r"(tmp[1]) diff --git 
a/libavcodec/mips/vp8dsp_mmi.c b/libavcodec/mips/vp8dsp_mmi.c index bd80aa1445..f76c8625f0 100644 --- a/libavcodec/mips/vp8dsp_mmi.c +++ b/libavcodec/mips/vp8dsp_mmi.c @@ -789,51 +789,40 @@ static av_always_inline void vp8_v_loop_filter8_mmi(uint8_t *dst, DECLARE_DOUBLE_1; DECLARE_DOUBLE_2; DECLARE_UINT32_T; + DECLARE_VAR_ALL64; + __asm__ volatile( /* Get data from dst */ - "gsldlc1 %[q0], 0x07(%[dst]) \n\t" - "gsldrc1 %[q0], 0x00(%[dst]) \n\t" + MMI_ULDC1(%[q0], %[dst], 0x0) PTR_SUBU "%[tmp0], %[dst], %[stride] \n\t" - "gsldlc1 %[p0], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[p0], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[p0], %[tmp0], 0x0) PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p1], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[p1], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[p1], %[tmp0], 0x0) PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p2], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[p2], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[p2], %[tmp0], 0x0) PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p3], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[p3], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[p3], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[dst], %[stride] \n\t" - "gsldlc1 %[q1], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[q1], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[q1], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q2], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[q2], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[q2], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q3], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[q3], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[q3], %[tmp0], 0x0) MMI_VP8_LOOP_FILTER /* Move to dst */ - "gssdlc1 %[q0], 0x07(%[dst]) \n\t" - "gssdrc1 %[q0], 0x00(%[dst]) \n\t" + MMI_USDC1(%[q0], %[dst], 0x0) PTR_SUBU "%[tmp0], %[dst], %[stride] \n\t" - "gssdlc1 %[p0], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[p0], 0x00(%[tmp0]) \n\t" + MMI_USDC1(%[p0], %[tmp0], 0x0) PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gssdlc1 %[p1], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[p1], 0x00(%[tmp0]) \n\t" + MMI_USDC1(%[p1], %[tmp0], 0x0) 
PTR_SUBU "%[tmp0], %[tmp0], %[stride] \n\t" - "gssdlc1 %[p2], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[p2], 0x00(%[tmp0]) \n\t" + MMI_USDC1(%[p2], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[dst], %[stride] \n\t" - "gssdlc1 %[q1], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[q1], 0x00(%[tmp0]) \n\t" + MMI_USDC1(%[q1], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gssdlc1 %[q2], 0x07(%[tmp0]) \n\t" - "gssdrc1 %[q2], 0x00(%[tmp0]) \n\t" - : [p3]"=&f"(ftmp[0]), [p2]"=&f"(ftmp[1]), + MMI_USDC1(%[q2], %[tmp0], 0x0) + : RESTRICT_ASM_ALL64 + [p3]"=&f"(ftmp[0]), [p2]"=&f"(ftmp[1]), [p1]"=&f"(ftmp[2]), [p0]"=&f"(ftmp[3]), [q0]"=&f"(ftmp[4]), [q1]"=&f"(ftmp[5]), [q2]"=&f"(ftmp[6]), [q3]"=&f"(ftmp[7]), @@ -874,31 +863,25 @@ static av_always_inline void vp8_h_loop_filter8_mmi(uint8_t *dst, DECLARE_DOUBLE_1; DECLARE_DOUBLE_2; DECLARE_UINT32_T; + DECLARE_VAR_ALL64; + __asm__ volatile( /* Get data from dst */ - "gsldlc1 %[p3], 0x03(%[dst]) \n\t" - "gsldrc1 %[p3], -0x04(%[dst]) \n\t" + MMI_ULDC1(%[p3], %[dst], -0x04) PTR_ADDU "%[tmp0], %[dst], %[stride] \n\t" - "gsldlc1 %[p2], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[p2], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[p2], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p1], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[p1], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[p1], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[p0], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[p0], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[p0], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q0], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[q0], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[q0], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q1], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[q1], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[q1], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q2], 0x03(%[tmp0]) \n\t" - "gsldrc1 %[q2], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[q2], %[tmp0], -0x04) PTR_ADDU "%[tmp0], %[tmp0], %[stride] \n\t" - "gsldlc1 %[q3], 
0x03(%[tmp0]) \n\t" - "gsldrc1 %[q3], -0x04(%[tmp0]) \n\t" + MMI_ULDC1(%[q3], %[tmp0], -0x04) /* Matrix transpose */ TRANSPOSE_8B(%[p3], %[p2], %[p1], %[p0], %[q0], %[q1], %[q2], %[q3], @@ -909,30 +892,23 @@ static av_always_inline void vp8_h_loop_filter8_mmi(uint8_t *dst, %[q0], %[q1], %[q2], %[q3], %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4]) /* Move to dst */ - "gssdlc1 %[p3], 0x03(%[dst]) \n\t" - "gssdrc1 %[p3], -0x04(%[dst]) \n\t" + MMI_USDC1(%[p3], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[p2], 0x03(%[dst]) \n\t" - "gssdrc1 %[p2], -0x04(%[dst]) \n\t" + MMI_USDC1(%[p2], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[p1], 0x03(%[dst]) \n\t" - "gssdrc1 %[p1], -0x04(%[dst]) \n\t" + MMI_USDC1(%[p1], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[p0], 0x03(%[dst]) \n\t" - "gssdrc1 %[p0], -0x04(%[dst]) \n\t" + MMI_USDC1(%[p0], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[q0], 0x03(%[dst]) \n\t" - "gssdrc1 %[q0], -0x04(%[dst]) \n\t" + MMI_USDC1(%[q0], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[q1], 0x03(%[dst]) \n\t" - "gssdrc1 %[q1], -0x04(%[dst]) \n\t" + MMI_USDC1(%[q1], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[q2], 0x03(%[dst]) \n\t" - "gssdrc1 %[q2], -0x04(%[dst]) \n\t" + MMI_USDC1(%[q2], %[dst], -0x04) PTR_ADDU "%[dst], %[dst], %[stride] \n\t" - "gssdlc1 %[q3], 0x03(%[dst]) \n\t" - "gssdrc1 %[q3], -0x04(%[dst]) \n\t" - : [p3]"=&f"(ftmp[0]), [p2]"=&f"(ftmp[1]), + MMI_USDC1(%[q3], %[dst], -0x04) + : RESTRICT_ASM_ALL64 + [p3]"=&f"(ftmp[0]), [p2]"=&f"(ftmp[1]), [p1]"=&f"(ftmp[2]), [p0]"=&f"(ftmp[3]), [q0]"=&f"(ftmp[4]), [q1]"=&f"(ftmp[5]), [q2]"=&f"(ftmp[6]), [q3]"=&f"(ftmp[7]), diff --git a/libavcodec/mips/vp9_mc_mmi.c b/libavcodec/mips/vp9_mc_mmi.c index e7a83875b9..57825fb967 100644 --- a/libavcodec/mips/vp9_mc_mmi.c +++ b/libavcodec/mips/vp9_mc_mmi.c @@ -77,29 +77,24 @@ static void convolve_horiz_mmi(const uint8_t 
*src, int32_t src_stride, { double ftmp[15]; uint32_t tmp[2]; + DECLARE_VAR_ALL64; src -= 3; src_stride -= w; dst_stride -= w; __asm__ volatile ( "move %[tmp1], %[width] \n\t" "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "gsldlc1 %[filter1], 0x03(%[filter]) \n\t" - "gsldrc1 %[filter1], 0x00(%[filter]) \n\t" - "gsldlc1 %[filter2], 0x0b(%[filter]) \n\t" - "gsldrc1 %[filter2], 0x08(%[filter]) \n\t" + MMI_LDLRC1(%[filter1], %[filter], 0x00, 0x03) + MMI_LDLRC1(%[filter2], %[filter], 0x08, 0x03) "li %[tmp0], 0x07 \n\t" "dmtc1 %[tmp0], %[ftmp13] \n\t" "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" "1: \n\t" /* Get 8 data per row */ - "gsldlc1 %[ftmp5], 0x07(%[src]) \n\t" - "gsldrc1 %[ftmp5], 0x00(%[src]) \n\t" - "gsldlc1 %[ftmp7], 0x08(%[src]) \n\t" - "gsldrc1 %[ftmp7], 0x01(%[src]) \n\t" - "gsldlc1 %[ftmp9], 0x09(%[src]) \n\t" - "gsldrc1 %[ftmp9], 0x02(%[src]) \n\t" - "gsldlc1 %[ftmp11], 0x0A(%[src]) \n\t" - "gsldrc1 %[ftmp11], 0x03(%[src]) \n\t" + MMI_ULDC1(%[ftmp5], %[src], 0x00) + MMI_ULDC1(%[ftmp7], %[src], 0x01) + MMI_ULDC1(%[ftmp9], %[src], 0x02) + MMI_ULDC1(%[ftmp11], %[src], 0x03) "punpcklbh %[ftmp4], %[ftmp5], %[ftmp0] \n\t" "punpckhbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" "punpcklbh %[ftmp6], %[ftmp7], %[ftmp0] \n\t" @@ -127,7 +122,8 @@ static void convolve_horiz_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), [filter1]"=&f"(ftmp[2]), [filter2]"=&f"(ftmp[3]), [ftmp0]"=&f"(ftmp[4]), [ftmp4]"=&f"(ftmp[5]), [ftmp5]"=&f"(ftmp[6]), [ftmp6]"=&f"(ftmp[7]), @@ -153,15 +149,14 @@ static void convolve_vert_mmi(const uint8_t *src, int32_t src_stride, double ftmp[17]; uint32_t tmp[1]; ptrdiff_t addr = src_stride; + DECLARE_VAR_ALL64; src_stride -= w; dst_stride -= w; __asm__ volatile ( "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "gsldlc1 %[ftmp4], 
0x03(%[filter]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[filter]) \n\t" - "gsldlc1 %[ftmp5], 0x0b(%[filter]) \n\t" - "gsldrc1 %[ftmp5], 0x08(%[filter]) \n\t" + MMI_LDLRC1(%[ftmp4], %[filter], 0x00, 0x03) + MMI_LDLRC1(%[ftmp5], %[filter], 0x08, 0x03) "punpcklwd %[filter10], %[ftmp4], %[ftmp4] \n\t" "punpckhwd %[filter32], %[ftmp4], %[ftmp4] \n\t" "punpcklwd %[filter54], %[ftmp5], %[ftmp5] \n\t" @@ -171,29 +166,21 @@ static void convolve_vert_mmi(const uint8_t *src, int32_t src_stride, "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" "1: \n\t" /* Get 8 data per column */ - "gsldlc1 %[ftmp4], 0x07(%[src]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[src]) \n\t" + MMI_ULDC1(%[ftmp4], %[src], 0x0) PTR_ADDU "%[tmp0], %[src], %[addr] \n\t" - "gsldlc1 %[ftmp5], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp5], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp5], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp6], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp6], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp6], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp7], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp7], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp7], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp8], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp8], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp8], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp9], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp9], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp9], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp10], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp10], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp10], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp11], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp11], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp11], %[tmp0], 0x0) "punpcklbh %[ftmp4], %[ftmp4], %[ftmp0] \n\t" "punpcklbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" "punpcklbh %[ftmp6], %[ftmp6], %[ftmp0] \n\t" @@ -221,7 +208,8 @@ static void 
convolve_vert_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), [filter10]"=&f"(ftmp[2]), [filter32]"=&f"(ftmp[3]), [filter54]"=&f"(ftmp[4]), [filter76]"=&f"(ftmp[5]), [ftmp0]"=&f"(ftmp[6]), [ftmp4]"=&f"(ftmp[7]), @@ -247,6 +235,7 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride, { double ftmp[15]; uint32_t tmp[2]; + DECLARE_VAR_ALL64; src -= 3; src_stride -= w; dst_stride -= w; @@ -254,23 +243,17 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride, __asm__ volatile ( "move %[tmp1], %[width] \n\t" "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "gsldlc1 %[filter1], 0x03(%[filter]) \n\t" - "gsldrc1 %[filter1], 0x00(%[filter]) \n\t" - "gsldlc1 %[filter2], 0x0b(%[filter]) \n\t" - "gsldrc1 %[filter2], 0x08(%[filter]) \n\t" + MMI_LDLRC1(%[filter1], %[filter], 0x00, 0x03) + MMI_LDLRC1(%[filter2], %[filter], 0x08, 0x03) "li %[tmp0], 0x07 \n\t" "dmtc1 %[tmp0], %[ftmp13] \n\t" "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" "1: \n\t" /* Get 8 data per row */ - "gsldlc1 %[ftmp5], 0x07(%[src]) \n\t" - "gsldrc1 %[ftmp5], 0x00(%[src]) \n\t" - "gsldlc1 %[ftmp7], 0x08(%[src]) \n\t" - "gsldrc1 %[ftmp7], 0x01(%[src]) \n\t" - "gsldlc1 %[ftmp9], 0x09(%[src]) \n\t" - "gsldrc1 %[ftmp9], 0x02(%[src]) \n\t" - "gsldlc1 %[ftmp11], 0x0A(%[src]) \n\t" - "gsldrc1 %[ftmp11], 0x03(%[src]) \n\t" + MMI_ULDC1(%[ftmp5], %[src], 0x00) + MMI_ULDC1(%[ftmp7], %[src], 0x01) + MMI_ULDC1(%[ftmp9], %[src], 0x02) + MMI_ULDC1(%[ftmp11], %[src], 0x03) "punpcklbh %[ftmp4], %[ftmp5], %[ftmp0] \n\t" "punpckhbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" "punpcklbh %[ftmp6], %[ftmp7], %[ftmp0] \n\t" @@ -289,8 +272,7 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride, "packsswh %[srcl], %[srcl], %[srch] \n\t" "packushb 
%[ftmp12], %[srcl], %[ftmp0] \n\t" "punpcklbh %[ftmp12], %[ftmp12], %[ftmp0] \n\t" - "gsldlc1 %[ftmp4], 0x07(%[dst]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[dst]) \n\t" + MMI_ULDC1(%[ftmp4], %[dst], 0x0) "punpcklbh %[ftmp4], %[ftmp4], %[ftmp0] \n\t" "paddh %[ftmp12], %[ftmp12], %[ftmp4] \n\t" "li %[tmp0], 0x10001 \n\t" @@ -309,7 +291,8 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), [filter1]"=&f"(ftmp[2]), [filter2]"=&f"(ftmp[3]), [ftmp0]"=&f"(ftmp[4]), [ftmp4]"=&f"(ftmp[5]), [ftmp5]"=&f"(ftmp[6]), [ftmp6]"=&f"(ftmp[7]), @@ -335,15 +318,14 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride, double ftmp[17]; uint32_t tmp[1]; ptrdiff_t addr = src_stride; + DECLARE_VAR_ALL64; src_stride -= w; dst_stride -= w; __asm__ volatile ( "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "gsldlc1 %[ftmp4], 0x03(%[filter]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[filter]) \n\t" - "gsldlc1 %[ftmp5], 0x0b(%[filter]) \n\t" - "gsldrc1 %[ftmp5], 0x08(%[filter]) \n\t" + MMI_LDLRC1(%[ftmp4], %[filter], 0x00, 0x03) + MMI_LDLRC1(%[ftmp5], %[filter], 0x08, 0x03) "punpcklwd %[filter10], %[ftmp4], %[ftmp4] \n\t" "punpckhwd %[filter32], %[ftmp4], %[ftmp4] \n\t" "punpcklwd %[filter54], %[ftmp5], %[ftmp5] \n\t" @@ -353,29 +335,21 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride, "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" "1: \n\t" /* Get 8 data per column */ - "gsldlc1 %[ftmp4], 0x07(%[src]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[src]) \n\t" + MMI_ULDC1(%[ftmp4], %[src], 0x0) PTR_ADDU "%[tmp0], %[src], %[addr] \n\t" - "gsldlc1 %[ftmp5], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp5], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp5], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp6], 
0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp6], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp6], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp7], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp7], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp7], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp8], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp8], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp8], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp9], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp9], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp9], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp10], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp10], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp10], %[tmp0], 0x0) PTR_ADDU "%[tmp0], %[tmp0], %[addr] \n\t" - "gsldlc1 %[ftmp11], 0x07(%[tmp0]) \n\t" - "gsldrc1 %[ftmp11], 0x00(%[tmp0]) \n\t" + MMI_ULDC1(%[ftmp11], %[tmp0], 0x0) "punpcklbh %[ftmp4], %[ftmp4], %[ftmp0] \n\t" "punpcklbh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" "punpcklbh %[ftmp6], %[ftmp6], %[ftmp0] \n\t" @@ -394,8 +368,7 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride, "packsswh %[srcl], %[srcl], %[srch] \n\t" "packushb %[ftmp12], %[srcl], %[ftmp0] \n\t" "punpcklbh %[ftmp12], %[ftmp12], %[ftmp0] \n\t" - "gsldlc1 %[ftmp4], 0x07(%[dst]) \n\t" - "gsldrc1 %[ftmp4], 0x00(%[dst]) \n\t" + MMI_ULDC1(%[ftmp4], %[dst], 0x00) "punpcklbh %[ftmp4], %[ftmp4], %[ftmp0] \n\t" "paddh %[ftmp12], %[ftmp12], %[ftmp4] \n\t" "li %[tmp0], 0x10001 \n\t" @@ -414,7 +387,8 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[dst], %[dst], %[dst_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [srcl]"=&f"(ftmp[0]), [srch]"=&f"(ftmp[1]), [filter10]"=&f"(ftmp[2]), [filter32]"=&f"(ftmp[3]), [filter54]"=&f"(ftmp[4]), [filter76]"=&f"(ftmp[5]), [ftmp0]"=&f"(ftmp[6]), [ftmp4]"=&f"(ftmp[7]), @@ -439,6 +413,7 
@@ static void convolve_avg_mmi(const uint8_t *src, int32_t src_stride, { double ftmp[4]; uint32_t tmp[2]; + DECLARE_VAR_ALL64; src_stride -= w; dst_stride -= w; @@ -449,10 +424,8 @@ static void convolve_avg_mmi(const uint8_t *src, int32_t src_stride, "dmtc1 %[tmp0], %[ftmp3] \n\t" "punpcklhw %[ftmp3], %[ftmp3], %[ftmp3] \n\t" "1: \n\t" - "gslwlc1 %[ftmp1], 0x07(%[src]) \n\t" - "gslwrc1 %[ftmp1], 0x00(%[src]) \n\t" - "gslwlc1 %[ftmp2], 0x07(%[dst]) \n\t" - "gslwrc1 %[ftmp2], 0x00(%[dst]) \n\t" + MMI_ULDC1(%[ftmp1], %[src], 0x00) + MMI_ULDC1(%[ftmp2], %[dst], 0x00) "punpcklbh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "paddh %[ftmp1], %[ftmp1], %[ftmp2] \n\t" @@ -469,7 +442,8 @@ static void convolve_avg_mmi(const uint8_t *src, int32_t src_stride, PTR_ADDU "%[src], %[src], %[src_stride] \n\t" PTR_ADDIU "%[height], %[height], -0x01 \n\t" "bnez %[height], 1b \n\t" - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), + : RESTRICT_ASM_ALL64 + [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [tmp0]"=&r"(tmp[0]), [tmp1]"=&r"(tmp[1]), [src]"+&r"(src), [dst]"+&r"(dst),
From patchwork Fri Feb 19 05:28:34 2021
X-Patchwork-Submitter: Jiaxun Yang
X-Patchwork-Id: 25780
From: Jiaxun Yang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 19 Feb 2021 13:28:34 +0800
Message-Id: <20210219052834.533558-5-jiaxun.yang@flygoat.com>
In-Reply-To: <20210219052834.533558-1-jiaxun.yang@flygoat.com>
Subject: [FFmpeg-devel] [PATCH 4/4] avutil/mips: Use $at as MMI macro temporary register
Cc: yinshiyou-hf@loongson.cn, guxiwei-hf@loongson.cn, Jiaxun Yang

Some functions exceeded the 30-operand limit on inline assembly when using the LOONGSON2 version of the MMI macros. We can avoid that by taking $at, the register reserved for the assembler, as the temporary register. As none of the instructions used in these macros are pseudo-instructions, it is safe to utilize $at here.

Signed-off-by: Jiaxun Yang
---
libavutil/mips/mmiutils.h | 115 +++++++++++++++++++++++--------------- 1 file changed, 69 insertions(+), 46 deletions(-) diff --git a/libavutil/mips/mmiutils.h b/libavutil/mips/mmiutils.h index 3994085057..7b7b405ddf 100644 --- a/libavutil/mips/mmiutils.h +++ b/libavutil/mips/mmiutils.h @@ -27,78 +27,107 @@ #include "config.h" #include "libavutil/mips/asmdefs.h" -#if HAVE_LOONGSON2 +/* + * These were used to define temporary registers for MMI macros + * however now we're using $at.
They're theoretically unnecessary + * but just leave them here to avoid mess. + */ +#define DECLARE_VAR_LOW32 +#define RESTRICT_ASM_LOW32 +#define DECLARE_VAR_ALL64 +#define RESTRICT_ASM_ALL64 +#define DECLARE_VAR_ADDRT +#define RESTRICT_ASM_ADDRT -#define DECLARE_VAR_LOW32 int32_t low32 -#define RESTRICT_ASM_LOW32 [low32]"=&r"(low32), -#define DECLARE_VAR_ALL64 int64_t all64 -#define RESTRICT_ASM_ALL64 [all64]"=&r"(all64), -#define DECLARE_VAR_ADDRT mips_reg addrt -#define RESTRICT_ASM_ADDRT [addrt]"=&r"(addrt), +#if HAVE_LOONGSON2 #define MMI_LWX(reg, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - "lw "#reg", "#bias"(%[addrt]) \n\t" + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + "lw "#reg", "#bias"($at) \n\t" \ + ".set at \n\t" #define MMI_SWX(reg, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - "sw "#reg", "#bias"(%[addrt]) \n\t" + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + "sw "#reg", "#bias"($at) \n\t" \ + ".set at \n\t" #define MMI_LDX(reg, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - "ld "#reg", "#bias"(%[addrt]) \n\t" + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + "ld "#reg", "#bias"($at) \n\t" \ + ".set at \n\t" #define MMI_SDX(reg, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - "sd "#reg", "#bias"(%[addrt]) \n\t" + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + "sd "#reg", "#bias"($at) \n\t" \ + ".set at \n\t" #define MMI_LWC1(fp, addr, bias) \ "lwc1 "#fp", "#bias"("#addr") \n\t" #define MMI_LWLRC1(fp, addr, bias, off) \ - "lwl %[low32], "#bias"+"#off"("#addr") \n\t" \ - "lwr %[low32], "#bias"("#addr") \n\t" \ - "mtc1 %[low32], "#fp" \n\t" + ".set noat \n\t" \ + "lwl $at, "#bias"+"#off"("#addr") \n\t" \ + "lwr $at, "#bias"("#addr") \n\t" \ + "mtc1 $at, "#fp" \n\t" \ + ".set at \n\t" #define MMI_LWXC1(fp, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ 
- MMI_LWC1(fp, %[addrt], bias) + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + MMI_LWC1(fp, $at, bias) \ + ".set at \n\t" #define MMI_SWC1(fp, addr, bias) \ "swc1 "#fp", "#bias"("#addr") \n\t" #define MMI_SWLRC1(fp, addr, bias, off) \ - "mfc1 %[low32], "#fp" \n\t" \ - "swl %[low32], "#bias"+"#off"("#addr") \n\t" \ - "swr %[low32], "#bias"("#addr") \n\t" + ".set noat \n\t" \ + "mfc1 $at, "#fp" \n\t" \ + "swl $at, "#bias"+"#off"("#addr") \n\t" \ + "swr $at, "#bias"("#addr") \n\t" \ + ".set at \n\t" #define MMI_SWXC1(fp, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - MMI_SWC1(fp, %[addrt], bias) + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + MMI_SWC1(fp, $at, bias) \ + ".set at \n\t" #define MMI_LDC1(fp, addr, bias) \ "ldc1 "#fp", "#bias"("#addr") \n\t" #define MMI_LDLRC1(fp, addr, bias, off) \ - "ldl %[all64], "#bias"+"#off"("#addr") \n\t" \ - "ldr %[all64], "#bias"("#addr") \n\t" \ - "dmtc1 %[all64], "#fp" \n\t" + ".set noat \n\t" \ + "ldl $at, "#bias"+"#off"("#addr") \n\t" \ + "ldr $at, "#bias"("#addr") \n\t" \ + "dmtc1 $at, "#fp" \n\t" \ + ".set at \n\t" #define MMI_LDXC1(fp, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - MMI_LDC1(fp, %[addrt], bias) + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + MMI_LDC1(fp, $at, bias) \ + ".set at \n\t" #define MMI_SDC1(fp, addr, bias) \ "sdc1 "#fp", "#bias"("#addr") \n\t" #define MMI_SDLRC1(fp, addr, bias, off) \ - "dmfc1 %[all64], "#fp" \n\t" \ - "sdl %[all64], "#bias"+"#off"("#addr") \n\t" \ - "sdr %[all64], "#bias"("#addr") \n\t" + ".set noat \n\t" \ + "dmfc1 $at, "#fp" \n\t" \ + "sdl $at, "#bias"+"#off"("#addr") \n\t" \ + "sdr $at, "#bias"("#addr") \n\t" \ + ".set at \n\t" #define MMI_SDXC1(fp, addr, stride, bias) \ - PTR_ADDU "%[addrt], "#addr", "#stride" \n\t" \ - MMI_SDC1(fp, %[addrt], bias) + ".set noat \n\t" \ + PTR_ADDU "$at, "#addr", "#stride" \n\t" \ + MMI_SDC1(fp, $at, bias) \ + ".set at \n\t" #define 
MMI_LQ(reg1, reg2, addr, bias) \ "ld "#reg1", "#bias"("#addr") \n\t" \ @@ -118,11 +147,6 @@ #elif HAVE_LOONGSON3 /* !HAVE_LOONGSON2 */ -#define DECLARE_VAR_ALL64 -#define RESTRICT_ASM_ALL64 -#define DECLARE_VAR_ADDRT -#define RESTRICT_ASM_ADDRT - #define MMI_LWX(reg, addr, stride, bias) \ "gslwx "#reg", "#bias"("#addr", "#stride") \n\t" @@ -140,13 +164,12 @@ #if _MIPS_SIM == _ABIO32 /* workaround for 3A2000 gslwlc1 bug */ -#define DECLARE_VAR_LOW32 int32_t low32 -#define RESTRICT_ASM_LOW32 [low32]"=&r"(low32), - #define MMI_LWLRC1(fp, addr, bias, off) \ - "lwl %[low32], "#bias"+"#off"("#addr") \n\t" \ - "lwr %[low32], "#bias"("#addr") \n\t" \ - "mtc1 %[low32], "#fp" \n\t" + ".set noat \n\t" \ + "lwl $at, "#bias"+"#off"("#addr") \n\t" \ + "lwr $at, "#bias"("#addr") \n\t" \ + "mtc1 $at, "#fp" \n\t" \ + ".set at \n\t" #else /* _MIPS_SIM != _ABIO32 */