From patchwork Mon Sep 17 09:20:28 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shiyou Yin
X-Patchwork-Id: 10340
From: Shiyou Yin
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 17 Sep 2018 17:20:28 +0800
Message-Id: <1537176028-24352-1-git-send-email-yinshiyou-hf@loongson.cn>
Subject: [FFmpeg-devel] [PATCH] avcodec/mips: [loongson] refine ff_vc1_inv_trans_8x8_mmi.

Combine the 1st and 2nd loops of ff_vc1_inv_trans_8x8_mmi into a single
inline asm block to reduce memory operations, and make some small
optimizations in ff_vc1_inv_trans_4x8_mmi.

The VC1_INV_TRANCS_8_TYPE1/TYPE2 macros now broadcast their constants
with punpcklwd instead of pshufh, so the ftmp23 shuffle-control register
is no longer needed and ftmp[] shrinks from 24 to 23 entries.
--- libavcodec/mips/vc1dsp_mmi.c | 227 ++++++++++++++++++------------------------- 1 file changed, 94 insertions(+), 133 deletions(-) diff --git a/libavcodec/mips/vc1dsp_mmi.c b/libavcodec/mips/vc1dsp_mmi.c index 80778a5..db314de 100644 --- a/libavcodec/mips/vc1dsp_mmi.c +++ b/libavcodec/mips/vc1dsp_mmi.c @@ -30,10 +30,10 @@ #define VC1_INV_TRANCS_8_TYPE1(o1, o2, r1, r2, r3, r4, c0) \ "li %[tmp0], "#r1" \n\t" \ "mtc1 %[tmp0], %[ftmp13] \n\t" \ - "pshufh %[ftmp13], %[ftmp13], %[ftmp23] \n\t" \ + "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" \ "li %[tmp0], "#r2" \n\t" \ "mtc1 %[tmp0], %[ftmp14] \n\t" \ - "pshufh %[ftmp14], %[ftmp14], %[ftmp23] \n\t" \ + "punpcklwd %[ftmp14], %[ftmp14], %[ftmp14] \n\t" \ "pmaddhw %[ftmp1], %[ftmp5], %[ftmp13] \n\t" \ "pmaddhw %[ftmp2], %[ftmp7], %[ftmp14] \n\t" \ "paddw %[ftmp1], %[ftmp1], %[ftmp2] \n\t" \ @@ -43,10 +43,10 @@ \ "li %[tmp0], "#r3" \n\t" \ "mtc1 %[tmp0], %[ftmp13] \n\t" \ - "pshufh %[ftmp13], %[ftmp13], %[ftmp23] \n\t" \ + "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" \ "li %[tmp0], "#r4" \n\t" \ "mtc1 %[tmp0], %[ftmp14] \n\t" \ - "pshufh %[ftmp14], %[ftmp14], %[ftmp23] \n\t" \ + "punpcklwd %[ftmp14], %[ftmp14], %[ftmp14] \n\t" \ "pmaddhw %[ftmp3], %[ftmp9], %[ftmp13] \n\t" \ "pmaddhw %[ftmp4], %[ftmp11], %[ftmp14] \n\t" \ "paddw %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \ @@ -54,14 +54,12 @@ "pmaddhw %[ftmp13], %[ftmp12], %[ftmp14] \n\t" \ "paddw %[ftmp4], %[ftmp4], %[ftmp13] \n\t" \ \ + "paddw %[ftmp1], %[ftmp1], "#c0" \n\t" \ + "paddw %[ftmp2], %[ftmp2], "#c0" \n\t" \ "paddw %[ftmp13], %[ftmp1], %[ftmp3] \n\t" \ "psubw %[ftmp14], %[ftmp1], %[ftmp3] \n\t" \ "paddw %[ftmp1], %[ftmp2], %[ftmp4] \n\t" \ "psubw %[ftmp3], %[ftmp2], %[ftmp4] \n\t" \ - "paddw %[ftmp13], %[ftmp13], "#c0" \n\t" \ - "paddw %[ftmp14], %[ftmp14], "#c0" \n\t" \ - "paddw %[ftmp1], %[ftmp1], "#c0" \n\t" \ - "paddw %[ftmp3], %[ftmp3], "#c0" \n\t" \ "psraw %[ftmp13], %[ftmp13], %[ftmp0] \n\t" \ "psraw %[ftmp1], %[ftmp1], %[ftmp0] \n\t" \ "psraw 
%[ftmp14], %[ftmp14], %[ftmp0] \n\t" \ @@ -76,10 +74,10 @@ #define VC1_INV_TRANCS_8_TYPE2(o1, o2, r1, r2, r3, r4, c0, c1) \ "li %[tmp0], "#r1" \n\t" \ "mtc1 %[tmp0], %[ftmp13] \n\t" \ - "pshufh %[ftmp13], %[ftmp13], %[ftmp23] \n\t" \ + "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" \ "li %[tmp0], "#r2" \n\t" \ "mtc1 %[tmp0], %[ftmp14] \n\t" \ - "pshufh %[ftmp14], %[ftmp14], %[ftmp23] \n\t" \ + "punpcklwd %[ftmp14], %[ftmp14], %[ftmp14] \n\t" \ "pmaddhw %[ftmp1], %[ftmp5], %[ftmp13] \n\t" \ "pmaddhw %[ftmp2], %[ftmp7], %[ftmp14] \n\t" \ "paddw %[ftmp1], %[ftmp1], %[ftmp2] \n\t" \ @@ -89,10 +87,10 @@ \ "li %[tmp0], "#r3" \n\t" \ "mtc1 %[tmp0], %[ftmp13] \n\t" \ - "pshufh %[ftmp13], %[ftmp13], %[ftmp23] \n\t" \ + "punpcklwd %[ftmp13], %[ftmp13], %[ftmp13] \n\t" \ "li %[tmp0], "#r4" \n\t" \ "mtc1 %[tmp0], %[ftmp14] \n\t" \ - "pshufh %[ftmp14], %[ftmp14], %[ftmp23] \n\t" \ + "punpcklwd %[ftmp14], %[ftmp14], %[ftmp14] \n\t" \ "pmaddhw %[ftmp3], %[ftmp9], %[ftmp13] \n\t" \ "pmaddhw %[ftmp4], %[ftmp11], %[ftmp14] \n\t" \ "paddw %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \ @@ -200,36 +198,32 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64]) DECLARE_ALIGNED(8, const uint64_t, ff_pw_1_local) = {0x0000000100000001ULL}; DECLARE_ALIGNED(8, const uint64_t, ff_pw_4_local) = {0x0000000400000004ULL}; DECLARE_ALIGNED(8, const uint64_t, ff_pw_64_local)= {0x0000004000000040ULL}; - int16_t *src = block; - int16_t *dst = temp; - double ftmp[24]; + double ftmp[23]; uint64_t tmp[1]; - // 1st loop __asm__ volatile ( + /* 1st loop: start */ "li %[tmp0], 0x03 \n\t" "mtc1 %[tmp0], %[ftmp0] \n\t" - "li %[tmp0], 0x44 \n\t" - "mtc1 %[tmp0], %[ftmp23] \n\t" // 1st part - MMI_LDC1(%[ftmp1], %[src], 0x00) - MMI_LDC1(%[ftmp2], %[src], 0x20) - MMI_LDC1(%[ftmp3], %[src], 0x40) - MMI_LDC1(%[ftmp4], %[src], 0x60) + MMI_LDC1(%[ftmp1], %[block], 0x00) + MMI_LDC1(%[ftmp11], %[block], 0x10) + MMI_LDC1(%[ftmp2], %[block], 0x20) + MMI_LDC1(%[ftmp12], %[block], 0x30) + MMI_LDC1(%[ftmp3], %[block], 0x40) + 
MMI_LDC1(%[ftmp13], %[block], 0x50) + MMI_LDC1(%[ftmp4], %[block], 0x60) + MMI_LDC1(%[ftmp14], %[block], 0x70) "punpcklhw %[ftmp5], %[ftmp1], %[ftmp2] \n\t" "punpckhhw %[ftmp6], %[ftmp1], %[ftmp2] \n\t" "punpcklhw %[ftmp7], %[ftmp3], %[ftmp4] \n\t" "punpckhhw %[ftmp8], %[ftmp3], %[ftmp4] \n\t" - MMI_LDC1(%[ftmp1], %[src], 0x10) - MMI_LDC1(%[ftmp2], %[src], 0x30) - MMI_LDC1(%[ftmp3], %[src], 0x50) - MMI_LDC1(%[ftmp4], %[src], 0x70) - "punpcklhw %[ftmp9], %[ftmp1], %[ftmp2] \n\t" - "punpckhhw %[ftmp10], %[ftmp1], %[ftmp2] \n\t" - "punpcklhw %[ftmp11], %[ftmp3], %[ftmp4] \n\t" - "punpckhhw %[ftmp12], %[ftmp3], %[ftmp4] \n\t" + "punpcklhw %[ftmp9], %[ftmp11], %[ftmp12] \n\t" + "punpckhhw %[ftmp10], %[ftmp11], %[ftmp12] \n\t" + "punpcklhw %[ftmp11], %[ftmp13], %[ftmp14] \n\t" + "punpckhhw %[ftmp12], %[ftmp13], %[ftmp14] \n\t" /* ftmp15:dst03,dst02,dst01,dst00 ftmp22:dst73,dst72,dst71,dst70 */ VC1_INV_TRANCS_8_TYPE1(%[ftmp15], %[ftmp22], 0x0010000c, 0x0006000c, @@ -250,37 +244,36 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64]) TRANSPOSE_4H(%[ftmp15], %[ftmp16], %[ftmp17], %[ftmp18], %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4]) - MMI_SDC1(%[ftmp15], %[dst], 0x00) - MMI_SDC1(%[ftmp16], %[dst], 0x10) - MMI_SDC1(%[ftmp17], %[dst], 0x20) - MMI_SDC1(%[ftmp18], %[dst], 0x30) - TRANSPOSE_4H(%[ftmp19], %[ftmp20], %[ftmp21], %[ftmp22], %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4]) - MMI_SDC1(%[ftmp19], %[dst], 0x08) - MMI_SDC1(%[ftmp20], %[dst], 0x18) - MMI_SDC1(%[ftmp21], %[dst], 0x28) - MMI_SDC1(%[ftmp22], %[dst], 0x38) + MMI_SDC1(%[ftmp15], %[temp], 0x00) + MMI_SDC1(%[ftmp19], %[temp], 0x08) + MMI_SDC1(%[ftmp16], %[temp], 0x10) + MMI_SDC1(%[ftmp20], %[temp], 0x18) + MMI_SDC1(%[ftmp17], %[temp], 0x20) + MMI_SDC1(%[ftmp21], %[temp], 0x28) + MMI_SDC1(%[ftmp18], %[temp], 0x30) + MMI_SDC1(%[ftmp22], %[temp], 0x38) // 2nd part - MMI_LDC1(%[ftmp1], %[src], 0x08) - MMI_LDC1(%[ftmp2], %[src], 0x28) - MMI_LDC1(%[ftmp3], %[src], 0x48) - MMI_LDC1(%[ftmp4], %[src], 0x68) + MMI_LDC1(%[ftmp1], 
%[block], 0x08) + MMI_LDC1(%[ftmp11], %[block], 0x18) + MMI_LDC1(%[ftmp2], %[block], 0x28) + MMI_LDC1(%[ftmp12], %[block], 0x38) + MMI_LDC1(%[ftmp3], %[block], 0x48) + MMI_LDC1(%[ftmp13], %[block], 0x58) + MMI_LDC1(%[ftmp4], %[block], 0x68) + MMI_LDC1(%[ftmp14], %[block], 0x78) "punpcklhw %[ftmp5], %[ftmp1], %[ftmp2] \n\t" "punpckhhw %[ftmp6], %[ftmp1], %[ftmp2] \n\t" "punpcklhw %[ftmp7], %[ftmp3], %[ftmp4] \n\t" "punpckhhw %[ftmp8], %[ftmp3], %[ftmp4] \n\t" - MMI_LDC1(%[ftmp1], %[src], 0x18) - MMI_LDC1(%[ftmp2], %[src], 0x38) - MMI_LDC1(%[ftmp3], %[src], 0x58) - MMI_LDC1(%[ftmp4], %[src], 0x78) - "punpcklhw %[ftmp9], %[ftmp1], %[ftmp2] \n\t" - "punpckhhw %[ftmp10], %[ftmp1], %[ftmp2] \n\t" - "punpcklhw %[ftmp11], %[ftmp3], %[ftmp4] \n\t" - "punpckhhw %[ftmp12], %[ftmp3], %[ftmp4] \n\t" + "punpcklhw %[ftmp9], %[ftmp11], %[ftmp12] \n\t" + "punpckhhw %[ftmp10], %[ftmp11], %[ftmp12] \n\t" + "punpcklhw %[ftmp11], %[ftmp13], %[ftmp14] \n\t" + "punpckhhw %[ftmp12], %[ftmp13], %[ftmp14] \n\t" /* ftmp15:dst03,dst02,dst01,dst00 ftmp22:dst73,dst72,dst71,dst70 */ VC1_INV_TRANCS_8_TYPE1(%[ftmp15], %[ftmp22], 0x0010000c, 0x0006000c, @@ -301,64 +294,33 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64]) TRANSPOSE_4H(%[ftmp15], %[ftmp16], %[ftmp17], %[ftmp18], %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4]) - MMI_SDC1(%[ftmp15], %[dst], 0x40) - MMI_SDC1(%[ftmp16], %[dst], 0x50) - MMI_SDC1(%[ftmp17], %[dst], 0x60) - MMI_SDC1(%[ftmp18], %[dst], 0x70) - TRANSPOSE_4H(%[ftmp19], %[ftmp20], %[ftmp21], %[ftmp22], %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4]) - MMI_SDC1(%[ftmp19], %[dst], 0x48) - MMI_SDC1(%[ftmp20], %[dst], 0x58) - MMI_SDC1(%[ftmp21], %[dst], 0x68) - MMI_SDC1(%[ftmp22], %[dst], 0x78) + MMI_SDC1(%[ftmp19], %[temp], 0x48) + MMI_SDC1(%[ftmp20], %[temp], 0x58) + MMI_SDC1(%[ftmp21], %[temp], 0x68) + MMI_SDC1(%[ftmp22], %[temp], 0x78) + /* 1st loop: end */ - : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), - [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), - [ftmp4]"=&f"(ftmp[4]), 
[ftmp5]"=&f"(ftmp[5]), - [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), - [ftmp8]"=&f"(ftmp[8]), [ftmp9]"=&f"(ftmp[9]), - [ftmp10]"=&f"(ftmp[10]), [ftmp11]"=&f"(ftmp[11]), - [ftmp12]"=&f"(ftmp[12]), [ftmp13]"=&f"(ftmp[13]), - [ftmp14]"=&f"(ftmp[14]), [ftmp15]"=&f"(ftmp[15]), - [ftmp16]"=&f"(ftmp[16]), [ftmp17]"=&f"(ftmp[17]), - [ftmp18]"=&f"(ftmp[18]), [ftmp19]"=&f"(ftmp[19]), - [ftmp20]"=&f"(ftmp[20]), [ftmp21]"=&f"(ftmp[21]), - [ftmp22]"=&f"(ftmp[22]), [ftmp23]"=&f"(ftmp[23]), - [tmp0]"=&r"(tmp[0]) - : [ff_pw_4]"f"(ff_pw_4_local), [src]"r"(src), [dst]"r"(dst) - : "memory" - ); - - src = temp; - dst = block; - - // 2nd loop - __asm__ volatile ( + /* 2nd loop: start */ "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp0] \n\t" - "li %[tmp0], 0x44 \n\t" - "mtc1 %[tmp0], %[ftmp23] \n\t" // 1st part - MMI_LDC1(%[ftmp1], %[src], 0x00) - MMI_LDC1(%[ftmp2], %[src], 0x20) - MMI_LDC1(%[ftmp3], %[src], 0x40) - MMI_LDC1(%[ftmp4], %[src], 0x60) + MMI_LDC1(%[ftmp1], %[temp], 0x00) + MMI_LDC1(%[ftmp11], %[temp], 0x10) + MMI_LDC1(%[ftmp2], %[temp], 0x20) + MMI_LDC1(%[ftmp12], %[temp], 0x30) "punpcklhw %[ftmp5], %[ftmp1], %[ftmp2] \n\t" "punpckhhw %[ftmp6], %[ftmp1], %[ftmp2] \n\t" - "punpcklhw %[ftmp7], %[ftmp3], %[ftmp4] \n\t" - "punpckhhw %[ftmp8], %[ftmp3], %[ftmp4] \n\t" + "punpcklhw %[ftmp7], %[ftmp15], %[ftmp17] \n\t" + "punpckhhw %[ftmp8], %[ftmp15], %[ftmp17] \n\t" - MMI_LDC1(%[ftmp1], %[src], 0x10) - MMI_LDC1(%[ftmp2], %[src], 0x30) - MMI_LDC1(%[ftmp3], %[src], 0x50) - MMI_LDC1(%[ftmp4], %[src], 0x70) - "punpcklhw %[ftmp9], %[ftmp1], %[ftmp2] \n\t" - "punpckhhw %[ftmp10], %[ftmp1], %[ftmp2] \n\t" - "punpcklhw %[ftmp11], %[ftmp3], %[ftmp4] \n\t" - "punpckhhw %[ftmp12], %[ftmp3], %[ftmp4] \n\t" + "punpcklhw %[ftmp9], %[ftmp11], %[ftmp12] \n\t" + "punpckhhw %[ftmp10], %[ftmp11], %[ftmp12] \n\t" + "punpcklhw %[ftmp11], %[ftmp16], %[ftmp18] \n\t" + "punpckhhw %[ftmp12], %[ftmp16], %[ftmp18] \n\t" /* ftmp15:dst03,dst02,dst01,dst00 ftmp22:dst73,dst72,dst71,dst70 */ 
VC1_INV_TRANCS_8_TYPE2(%[ftmp15], %[ftmp22], 0x0010000c, 0x0006000c, @@ -376,33 +338,33 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64]) VC1_INV_TRANCS_8_TYPE2(%[ftmp18], %[ftmp19], 0xfff0000c, 0xfffa000c, 0xfff70004, 0xfff0000f, %[ff_pw_64], %[ff_pw_1]) - MMI_SDC1(%[ftmp15], %[dst], 0x00) - MMI_SDC1(%[ftmp16], %[dst], 0x10) - MMI_SDC1(%[ftmp17], %[dst], 0x20) - MMI_SDC1(%[ftmp18], %[dst], 0x30) - MMI_SDC1(%[ftmp19], %[dst], 0x40) - MMI_SDC1(%[ftmp20], %[dst], 0x50) - MMI_SDC1(%[ftmp21], %[dst], 0x60) - MMI_SDC1(%[ftmp22], %[dst], 0x70) + MMI_SDC1(%[ftmp15], %[block], 0x00) + MMI_SDC1(%[ftmp16], %[block], 0x10) + MMI_SDC1(%[ftmp17], %[block], 0x20) + MMI_SDC1(%[ftmp18], %[block], 0x30) + MMI_SDC1(%[ftmp19], %[block], 0x40) + MMI_SDC1(%[ftmp20], %[block], 0x50) + MMI_SDC1(%[ftmp21], %[block], 0x60) + MMI_SDC1(%[ftmp22], %[block], 0x70) // 2nd part - MMI_LDC1(%[ftmp1], %[src], 0x08) - MMI_LDC1(%[ftmp2], %[src], 0x28) - MMI_LDC1(%[ftmp3], %[src], 0x48) - MMI_LDC1(%[ftmp4], %[src], 0x68) + MMI_LDC1(%[ftmp1], %[temp], 0x08) + MMI_LDC1(%[ftmp11], %[temp], 0x18) + MMI_LDC1(%[ftmp2], %[temp], 0x28) + MMI_LDC1(%[ftmp12], %[temp], 0x38) + MMI_LDC1(%[ftmp3], %[temp], 0x48) + MMI_LDC1(%[ftmp13], %[temp], 0x58) + MMI_LDC1(%[ftmp4], %[temp], 0x68) + MMI_LDC1(%[ftmp14], %[temp], 0x78) "punpcklhw %[ftmp5], %[ftmp1], %[ftmp2] \n\t" "punpckhhw %[ftmp6], %[ftmp1], %[ftmp2] \n\t" "punpcklhw %[ftmp7], %[ftmp3], %[ftmp4] \n\t" "punpckhhw %[ftmp8], %[ftmp3], %[ftmp4] \n\t" - MMI_LDC1(%[ftmp1], %[src], 0x18) - MMI_LDC1(%[ftmp2], %[src], 0x38) - MMI_LDC1(%[ftmp3], %[src], 0x58) - MMI_LDC1(%[ftmp4], %[src], 0x78) - "punpcklhw %[ftmp9], %[ftmp1], %[ftmp2] \n\t" - "punpckhhw %[ftmp10], %[ftmp1], %[ftmp2] \n\t" - "punpcklhw %[ftmp11], %[ftmp3], %[ftmp4] \n\t" - "punpckhhw %[ftmp12], %[ftmp3], %[ftmp4] \n\t" + "punpcklhw %[ftmp9], %[ftmp11], %[ftmp12] \n\t" + "punpckhhw %[ftmp10], %[ftmp11], %[ftmp12] \n\t" + "punpcklhw %[ftmp11], %[ftmp13], %[ftmp14] \n\t" + "punpckhhw %[ftmp12], %[ftmp13], 
%[ftmp14] \n\t" /* ftmp15:dst03,dst02,dst01,dst00 ftmp22:dst73,dst72,dst71,dst70 */ VC1_INV_TRANCS_8_TYPE2(%[ftmp15], %[ftmp22], 0x0010000c, 0x0006000c, @@ -420,15 +382,15 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64]) VC1_INV_TRANCS_8_TYPE2(%[ftmp18], %[ftmp19], 0xfff0000c, 0xfffa000c, 0xfff70004, 0xfff0000f, %[ff_pw_64], %[ff_pw_1]) - MMI_SDC1(%[ftmp15], %[dst], 0x08) - MMI_SDC1(%[ftmp16], %[dst], 0x18) - MMI_SDC1(%[ftmp17], %[dst], 0x28) - MMI_SDC1(%[ftmp18], %[dst], 0x38) - MMI_SDC1(%[ftmp19], %[dst], 0x48) - MMI_SDC1(%[ftmp20], %[dst], 0x58) - MMI_SDC1(%[ftmp21], %[dst], 0x68) - MMI_SDC1(%[ftmp22], %[dst], 0x78) - + MMI_SDC1(%[ftmp15], %[block], 0x08) + MMI_SDC1(%[ftmp16], %[block], 0x18) + MMI_SDC1(%[ftmp17], %[block], 0x28) + MMI_SDC1(%[ftmp18], %[block], 0x38) + MMI_SDC1(%[ftmp19], %[block], 0x48) + MMI_SDC1(%[ftmp20], %[block], 0x58) + MMI_SDC1(%[ftmp21], %[block], 0x68) + MMI_SDC1(%[ftmp22], %[block], 0x78) + /* 2nd loop: end */ : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), @@ -440,10 +402,11 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64]) [ftmp16]"=&f"(ftmp[16]), [ftmp17]"=&f"(ftmp[17]), [ftmp18]"=&f"(ftmp[18]), [ftmp19]"=&f"(ftmp[19]), [ftmp20]"=&f"(ftmp[20]), [ftmp21]"=&f"(ftmp[21]), - [ftmp22]"=&f"(ftmp[22]), [ftmp23]"=&f"(ftmp[23]), + [ftmp22]"=&f"(ftmp[22]), [tmp0]"=&r"(tmp[0]) : [ff_pw_1]"f"(ff_pw_1_local), [ff_pw_64]"f"(ff_pw_64_local), - [src]"r"(src), [dst]"r"(dst) + [ff_pw_4]"f"(ff_pw_4_local), [block]"r"(block), + [temp]"r"(temp) : "memory" ); } @@ -978,7 +941,7 @@ void ff_vc1_inv_trans_4x8_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) { int16_t *src = block; int16_t *dst = block; - double ftmp[24]; + double ftmp[23]; uint32_t count = 8, tmp[1]; int16_t coeff[16] = {17, 22, 17, 10, 17, 10,-17,-22, @@ -1042,8 +1005,6 @@ void ff_vc1_inv_trans_4x8_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) __asm__ volatile ( "li %[tmp0], 
0x07 \n\t" "mtc1 %[tmp0], %[ftmp0] \n\t" - "li %[tmp0], 0x44 \n\t" - "mtc1 %[tmp0], %[ftmp23] \n\t" MMI_LDC1(%[ftmp1], %[src], 0x00) MMI_LDC1(%[ftmp2], %[src], 0x20) @@ -1149,7 +1110,7 @@ void ff_vc1_inv_trans_4x8_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) [ftmp16]"=&f"(ftmp[16]), [ftmp17]"=&f"(ftmp[17]), [ftmp18]"=&f"(ftmp[18]), [ftmp19]"=&f"(ftmp[19]), [ftmp20]"=&f"(ftmp[20]), [ftmp21]"=&f"(ftmp[21]), - [ftmp22]"=&f"(ftmp[22]), [ftmp23]"=&f"(ftmp[23]), + [ftmp22]"=&f"(ftmp[22]), [tmp0]"=&r"(tmp[0]) : [ff_pw_1]"f"(ff_pw_1_local), [ff_pw_64]"f"(ff_pw_64_local), [src]"r"(src), [dest]"r"(dest), [linesize]"r"(linesize)
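
Two of the macro-level changes above can be sanity-checked with a scalar model.
The sketch below is not FFmpeg code. All function names are hypothetical, and
the instruction semantics are assumptions: pshufh is modelled as picking one
source halfword per 2-bit control field, punpcklwd as interleaving the low
32-bit words of its operands, and paddw/psubw as wrapping 32-bit lane
arithmetic. Under those assumptions it checks that (1) punpcklwd x, x, x gives
the same broadcast as pshufh with control 0x44, and (2) adding the rounding
constant c0 to ftmp1/ftmp2 before the butterfly matches adding it to all four
results afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of pshufh: 2-bit field i of ctl selects the source
 * halfword that is copied into destination halfword i. */
uint64_t model_pshufh(uint64_t src, unsigned ctl)
{
    uint64_t dst = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (ctl >> (2 * i)) & 3;
        dst |= ((src >> (16 * sel)) & 0xffffULL) << (16 * i);
    }
    return dst;
}

/* Assumed model of punpcklwd: low word of a goes to the low half,
 * low word of b goes to the high half. */
uint64_t model_punpcklwd(uint64_t a, uint64_t b)
{
    return (a & 0xffffffffULL) | ((b & 0xffffffffULL) << 32);
}

/* Old ordering: butterfly first, then four rounding adds.
 * uint32_t is used so that overflow wraps like a 32-bit lane. */
void butterfly_round_after(uint32_t t1, uint32_t t2, uint32_t t3,
                           uint32_t t4, uint32_t c, uint32_t out[4])
{
    out[0] = (t1 + t3) + c;
    out[1] = (t1 - t3) + c;
    out[2] = (t2 + t4) + c;
    out[3] = (t2 - t4) + c;
}

/* New ordering: two rounding adds, then the butterfly. */
void butterfly_round_before(uint32_t t1, uint32_t t2, uint32_t t3,
                            uint32_t t4, uint32_t c, uint32_t out[4])
{
    t1 += c;
    t2 += c;
    out[0] = t1 + t3;
    out[1] = t1 - t3;
    out[2] = t2 + t4;
    out[3] = t2 - t4;
}

/* Asserts both equivalences on sample data and returns 1 on success. */
int check_models(void)
{
    /* Coefficient pairs taken from the macro invocations in the patch. */
    static const uint32_t coeffs[] = {
        0x0010000c, 0x0006000c, 0xfff0000c,
        0xfffa000c, 0xfff70004, 0xfff0000f
    };
    /* Per-lane values of ff_pw_1, ff_pw_4 and ff_pw_64. */
    static const uint32_t rounds[] = { 1, 4, 64 };
    uint32_t x = 1;

    for (unsigned i = 0; i < sizeof(coeffs) / sizeof(coeffs[0]); i++) {
        /* Junk in the upper word: neither model should depend on it. */
        uint64_t src = 0xdeadbeef00000000ULL | coeffs[i];
        assert(model_pshufh(src, 0x44) == model_punpcklwd(src, src));
    }

    for (unsigned r = 0; r < sizeof(rounds) / sizeof(rounds[0]); r++) {
        for (int n = 0; n < 1000; n++) {
            uint32_t t[4], a[4], b[4];
            for (int k = 0; k < 4; k++) {
                x = x * 1664525u + 1013904223u; /* simple LCG test data */
                t[k] = x;
            }
            butterfly_round_after(t[0], t[1], t[2], t[3], rounds[r], a);
            butterfly_round_before(t[0], t[1], t[2], t[3], rounds[r], b);
            for (int k = 0; k < 4; k++)
                assert(a[k] == b[k]);
        }
    }
    return 1;
}
```

Calling check_models() from a small test harness is enough to exercise both
checks. The model says nothing about instruction latency or register
pressure, so it only supports the correctness of the two rewrites, not the
speedup.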