From: Shiyou Yin
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 6 Sep 2018 16:10:53 +0800
Message-Id: <1536221453-15372-2-git-send-email-yinshiyou-hf@loongson.cn>
In-Reply-To: <1536221453-15372-1-git-send-email-yinshiyou-hf@loongson.cn>
References: <1536221453-15372-1-git-send-email-yinshiyou-hf@loongson.cn>
Subject: [FFmpeg-devel] [PATCH 2/2] avutil/mips: [loongson] simplify macro TRANSPOSE_4H and TRANSPOSE_8B

Simplify macro TRANSPOSE_4H in mmiutils.h and add TRANSPOSE_8B as a common macro.
---
 libavcodec/mips/vc1dsp_mmi.c | 12 +++----
 libavcodec/mips/vp8dsp_mmi.c | 72 +++++--------------------------------
 libavutil/mips/mmiutils.h | 84 ++++++++++++++++++++++++++++----------------
 3 files changed, 65 insertions(+), 103 deletions(-)

diff --git a/libavcodec/mips/vc1dsp_mmi.c b/libavcodec/mips/vc1dsp_mmi.c
index a439b40..80778a5 100644
--- a/libavcodec/mips/vc1dsp_mmi.c
+++ b/libavcodec/mips/vc1dsp_mmi.c
@@ -248,8 +248,7 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64])
                        0xfff70004, 0xfff0000f, %[ff_pw_4])
 
         TRANSPOSE_4H(%[ftmp15], %[ftmp16], %[ftmp17], %[ftmp18],
-                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4],
-                     %[ftmp5], %[tmp0], %[ftmp6], %[ftmp7])
+                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4])
 
         MMI_SDC1(%[ftmp15], %[dst], 0x00)
         MMI_SDC1(%[ftmp16], %[dst], 0x10)
@@ -257,8 +256,7 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64])
         MMI_SDC1(%[ftmp18], %[dst], 0x30)
 
         TRANSPOSE_4H(%[ftmp19], %[ftmp20], %[ftmp21], %[ftmp22],
-                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4],
-                     %[ftmp5], %[tmp0], %[ftmp6], %[ftmp7])
+                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4])
 
         MMI_SDC1(%[ftmp19], %[dst], 0x08)
         MMI_SDC1(%[ftmp20], %[dst], 0x18)
@@ -301,8 +299,7 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64])
                        0xfff70004, 0xfff0000f, %[ff_pw_4])
 
         TRANSPOSE_4H(%[ftmp15], %[ftmp16], %[ftmp17], %[ftmp18],
-                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4],
-                     %[ftmp5], %[tmp0], %[ftmp6], %[ftmp7])
+                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4])
 
         MMI_SDC1(%[ftmp15], %[dst], 0x40)
         MMI_SDC1(%[ftmp16], %[dst], 0x50)
@@ -310,8 +307,7 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64])
         MMI_SDC1(%[ftmp18], %[dst], 0x70)
 
         TRANSPOSE_4H(%[ftmp19], %[ftmp20], %[ftmp21], %[ftmp22],
-                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4],
-                     %[ftmp5], %[tmp0], %[ftmp6], %[ftmp7])
+                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4])
 
         MMI_SDC1(%[ftmp19], %[dst], 0x48)
         MMI_SDC1(%[ftmp20], %[dst], 0x58)
diff --git a/libavcodec/mips/vp8dsp_mmi.c b/libavcodec/mips/vp8dsp_mmi.c
index b24a87a..bd80aa1 100644
--- a/libavcodec/mips/vp8dsp_mmi.c
+++ b/libavcodec/mips/vp8dsp_mmi.c
@@ -44,58 +44,6 @@
         "punpcklbh "#dst_r", "#src", %[db_2] \n\t" \
         "punpckhbh "#dst_l", "#src", %[db_2] \n\t"
 
-#define MMI_TRANSPOSE8x8_UB_UB(src_0, src_1, src_2, src_3, \
-                               src_4, src_5, src_6, src_7, \
-                               dst_0, dst_1, dst_2, dst_3, \
-                               dst_4, dst_5, dst_6, dst_7) \
-        "li %[it_1], 0xe4 \n\t" \
-        "dmtc1 %[it_1], %[db_1] \n\t" \
-        "pshufh %[db_2], "#src_0", %[db_1] \n\t" \
-        "punpcklbh "#dst_0", "#src_0", "#src_1" \n\t" \
-        "punpckhbh "#dst_1", %[db_2], "#src_1" \n\t" \
-        "pshufh %[db_2], "#src_2", %[db_1] \n\t" \
-        "punpcklbh "#dst_2", "#src_2", "#src_3" \n\t" \
-        "punpckhbh "#dst_3", %[db_2], "#src_3" \n\t" \
-        "pshufh %[db_2], "#src_4", %[db_1] \n\t" \
-        "punpcklbh "#dst_4", "#src_4", "#src_5" \n\t" \
-        "punpckhbh "#dst_5", %[db_2], "#src_5" \n\t" \
-        "pshufh %[db_2], "#src_6", %[db_1] \n\t" \
-        "punpcklbh "#dst_6", "#src_6", "#src_7" \n\t" \
-        "punpckhbh "#dst_7", %[db_2], "#src_7" \n\t" \
-        \
-        "pshufh %[db_2], "#dst_0", %[db_1] \n\t" \
-        "punpcklhw "#dst_0", "#dst_0", "#dst_2" \n\t" \
-        "punpckhhw "#dst_2", %[db_2], "#dst_2" \n\t" \
-        "pshufh %[db_2], "#dst_1", %[db_1] \n\t" \
-        "punpcklhw "#dst_1", "#dst_1", "#dst_3" \n\t" \
-        "punpckhhw "#dst_3", %[db_2], "#dst_3" \n\t" \
-        "pshufh %[db_2], "#dst_4", %[db_1] \n\t" \
-        "punpcklhw "#dst_4", "#dst_4", "#dst_6" \n\t" \
-        "punpckhhw "#dst_6", %[db_2], "#dst_6" \n\t" \
-        "pshufh %[db_2], "#dst_5", %[db_1] \n\t" \
-        "punpcklhw "#dst_5", "#dst_5", "#dst_7" \n\t" \
-        "punpckhhw "#dst_7", %[db_2], "#dst_7" \n\t" \
-        \
-        "pshufh %[db_2], "#dst_0", %[db_1] \n\t" \
-        "punpcklwd "#dst_0", "#dst_0", "#dst_4" \n\t" \
-        "punpckhwd "#dst_4", %[db_2], "#dst_4" \n\t" \
-        "pshufh %[db_2], "#dst_1", %[db_1] \n\t" \
-        "punpcklwd "#dst_1", "#dst_1", "#dst_5" \n\t" \
-        "punpckhwd "#dst_5", %[db_2], "#dst_5" \n\t" \
-        "pshufh %[db_2], "#dst_2", %[db_1] \n\t" \
-        "punpcklwd "#dst_2", "#dst_2", "#dst_6" \n\t" \
-        "punpckhwd "#dst_6", %[db_2], "#dst_6" \n\t" \
-        "pshufh %[db_2], "#dst_3", %[db_1] \n\t" \
-        "punpcklwd "#dst_3", "#dst_3", "#dst_7" \n\t" \
-        "punpckhwd "#dst_7", %[db_2], "#dst_7" \n\t" \
-        \
-        "pshufh %[db_2], "#dst_1", %[db_1] \n\t" \
-        "pshufh "#dst_1", "#dst_4", %[db_1] \n\t" \
-        "pshufh "#dst_4", %[db_2], %[db_1] \n\t" \
-        "pshufh %[db_2], "#dst_3", %[db_1] \n\t" \
-        "pshufh "#dst_3", "#dst_6", %[db_1] \n\t" \
-        "pshufh "#dst_6", %[db_2], %[db_1] \n\t"
-
 #define MMI_VP8_LOOP_FILTER \
         /* Calculation of hev */ \
         "dmtc1 %[thresh], %[ftmp3] \n\t" \
@@ -952,16 +900,14 @@ static av_always_inline void vp8_h_loop_filter8_mmi(uint8_t *dst,
         "gsldlc1 %[q3], 0x03(%[tmp0]) \n\t"
         "gsldrc1 %[q3], -0x04(%[tmp0]) \n\t"
         /* Matrix transpose */
-        MMI_TRANSPOSE8x8_UB_UB(%[p3], %[p2], %[p1], %[p0],
-                               %[q0], %[q1], %[q2], %[q3],
-                               %[p3], %[p2], %[p1], %[p0],
-                               %[q0], %[q1], %[q2], %[q3])
+        TRANSPOSE_8B(%[p3], %[p2], %[p1], %[p0],
+                     %[q0], %[q1], %[q2], %[q3],
+                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4])
         MMI_VP8_LOOP_FILTER
         /* Matrix transpose */
-        MMI_TRANSPOSE8x8_UB_UB(%[p3], %[p2], %[p1], %[p0],
-                               %[q0], %[q1], %[q2], %[q3],
-                               %[p3], %[p2], %[p1], %[p0],
-                               %[q0], %[q1], %[q2], %[q3])
+        TRANSPOSE_8B(%[p3], %[p2], %[p1], %[p0],
+                     %[q0], %[q1], %[q2], %[q3],
+                     %[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4])
         /* Move to dst */
         "gssdlc1 %[p3], 0x03(%[dst]) \n\t"
         "gssdrc1 %[p3], -0x04(%[dst]) \n\t"
@@ -1233,8 +1179,7 @@ void ff_vp8_idct_add_mmi(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
         MMI_SDC1(%[ftmp0], %[block], 0x18)
 
         TRANSPOSE_4H(%[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4],
-                     %[ftmp5], %[ftmp6], %[ftmp7], %[ftmp8],
-                     %[ftmp9], %[tmp0], %[ftmp0], %[ftmp10])
+                     %[ftmp5], %[ftmp6], %[ftmp7], %[ftmp8])
 
         // t[0 4 8 12]
         "paddh %[ftmp5], %[ftmp1], %[ftmp3] \n\t"
@@ -1269,8 +1214,7 @@ void ff_vp8_idct_add_mmi(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
         "psrah %[ftmp4], %[ftmp4], %[ftmp11] \n\t"
 
         TRANSPOSE_4H(%[ftmp1], %[ftmp2], %[ftmp3], %[ftmp4],
-                     %[ftmp5], %[ftmp6], %[ftmp7], %[ftmp8],
-                     %[ftmp9], %[tmp0], %[ftmp0], %[ftmp10])
+                     %[ftmp5], %[ftmp6], %[ftmp7], %[ftmp8])
 
         MMI_LWC1(%[ftmp5], %[dst0], 0x00)
         MMI_LWC1(%[ftmp6], %[dst1], 0x00)
diff --git a/libavutil/mips/mmiutils.h b/libavutil/mips/mmiutils.h
index b16edc4..76b1199 100644
--- a/libavutil/mips/mmiutils.h
+++ b/libavutil/mips/mmiutils.h
@@ -250,30 +250,53 @@
         : "memory" \
     );
 
-#define TRANSPOSE_4H(m1, m2, m3, m4, t1, t2, t3, t4, t5, r1, zero, shift) \
-        "li "#r1", 0x93 \n\t" \
-        "xor "#zero","#zero","#zero" \n\t" \
-        "mtc1 "#r1", "#shift" \n\t" \
-        "punpcklhw "#t1", "#m1", "#zero" \n\t" \
-        "punpcklhw "#t5", "#m2", "#zero" \n\t" \
-        "pshufh "#t5", "#t5", "#shift" \n\t" \
-        "or "#t1", "#t1", "#t5" \n\t" \
-        "punpckhhw "#t2", "#m1", "#zero" \n\t" \
-        "punpckhhw "#t5", "#m2", "#zero" \n\t" \
-        "pshufh "#t5", "#t5", "#shift" \n\t" \
-        "or "#t2", "#t2", "#t5" \n\t" \
-        "punpcklhw "#t3", "#m3", "#zero" \n\t" \
-        "punpcklhw "#t5", "#m4", "#zero" \n\t" \
-        "pshufh "#t5", "#t5", "#shift" \n\t" \
-        "or "#t3", "#t3", "#t5" \n\t" \
-        "punpckhhw "#t4", "#m3", "#zero" \n\t" \
-        "punpckhhw "#t5", "#m4", "#zero" \n\t" \
-        "pshufh "#t5", "#t5", "#shift" \n\t" \
-        "or "#t4", "#t4", "#t5" \n\t" \
-        "punpcklwd "#m1", "#t1", "#t3" \n\t" \
-        "punpckhwd "#m2", "#t1", "#t3" \n\t" \
-        "punpcklwd "#m3", "#t2", "#t4" \n\t" \
-        "punpckhwd "#m4", "#t2", "#t4" \n\t"
+/**
+ * brief: Transpose 4X4 half word packaged data.
+ * fr_i0, fr_i1, fr_i2, fr_i3: src & dst
+ * fr_t0, fr_t1, fr_t2, fr_t3: temporary register
+ */
+#define TRANSPOSE_4H(fr_i0, fr_i1, fr_i2, fr_i3, \
+                     fr_t0, fr_t1, fr_t2, fr_t3) \
+        "punpcklhw "#fr_t0", "#fr_i0", "#fr_i1" \n\t" \
+        "punpckhhw "#fr_t1", "#fr_i0", "#fr_i1" \n\t" \
+        "punpcklhw "#fr_t2", "#fr_i2", "#fr_i3" \n\t" \
+        "punpckhhw "#fr_t3", "#fr_i2", "#fr_i3" \n\t" \
+        "punpcklwd "#fr_i0", "#fr_t0", "#fr_t2" \n\t" \
+        "punpckhwd "#fr_i1", "#fr_t0", "#fr_t2" \n\t" \
+        "punpcklwd "#fr_i2", "#fr_t1", "#fr_t3" \n\t" \
+        "punpckhwd "#fr_i3", "#fr_t1", "#fr_t3" \n\t"
+
+/**
+ * brief: Transpose 8x8 byte packaged data.
+ * fr_i0~i7: src & dst
+ * fr_t0~t3: temporary register
+ */
+#define TRANSPOSE_8B(fr_i0, fr_i1, fr_i2, fr_i3, fr_i4, fr_i5, \
+                     fr_i6, fr_i7, fr_t0, fr_t1, fr_t2, fr_t3) \
+        "punpcklbh "#fr_t0", "#fr_i0", "#fr_i1" \n\t" \
+        "punpckhbh "#fr_t1", "#fr_i0", "#fr_i1" \n\t" \
+        "punpcklbh "#fr_t2", "#fr_i2", "#fr_i3" \n\t" \
+        "punpckhbh "#fr_t3", "#fr_i2", "#fr_i3" \n\t" \
+        "punpcklbh "#fr_i0", "#fr_i4", "#fr_i5" \n\t" \
+        "punpckhbh "#fr_i1", "#fr_i4", "#fr_i5" \n\t" \
+        "punpcklbh "#fr_i2", "#fr_i6", "#fr_i7" \n\t" \
+        "punpckhbh "#fr_i3", "#fr_i6", "#fr_i7" \n\t" \
+        "punpcklhw "#fr_i4", "#fr_t0", "#fr_t2" \n\t" \
+        "punpckhhw "#fr_i5", "#fr_t0", "#fr_t2" \n\t" \
+        "punpcklhw "#fr_i6", "#fr_t1", "#fr_t3" \n\t" \
+        "punpckhhw "#fr_i7", "#fr_t1", "#fr_t3" \n\t" \
+        "punpcklhw "#fr_t0", "#fr_i0", "#fr_i2" \n\t" \
+        "punpckhhw "#fr_t1", "#fr_i0", "#fr_i2" \n\t" \
+        "punpcklhw "#fr_t2", "#fr_i1", "#fr_i3" \n\t" \
+        "punpckhhw "#fr_t3", "#fr_i1", "#fr_i3" \n\t" \
+        "punpcklwd "#fr_i0", "#fr_i4", "#fr_t0" \n\t" \
+        "punpckhwd "#fr_i1", "#fr_i4", "#fr_t0" \n\t" \
+        "punpcklwd "#fr_i2", "#fr_i5", "#fr_t1" \n\t" \
+        "punpckhwd "#fr_i3", "#fr_i5", "#fr_t1" \n\t" \
+        "punpcklwd "#fr_i4", "#fr_i6", "#fr_t2" \n\t" \
+        "punpckhwd "#fr_i5", "#fr_i6", "#fr_t2" \n\t" \
+        "punpcklwd "#fr_i6", "#fr_i7", "#fr_t3" \n\t" \
+        "punpckhwd "#fr_i7", "#fr_i7", "#fr_t3" \n\t"
 
 /**
  * brief: Parallel SRA for 8 byte packaged data.
@@ -303,15 +326,14 @@
         "psrlh "#fr_t1", "#fr_t1", "#fr_i1" \n\t" \
         "packsshb "#fr_d0", "#fr_t0", "#fr_t1" \n\t"
 
-
-#define PSRAH_4_MMI(fp1, fp2, fp3, fp4, shift) \
-        "psrah "#fp1", "#fp1", "#shift" \n\t" \
-        "psrah "#fp2", "#fp2", "#shift" \n\t" \
-        "psrah "#fp3", "#fp3", "#shift" \n\t" \
+#define PSRAH_4_MMI(fp1, fp2, fp3, fp4, shift) \
+        "psrah "#fp1", "#fp1", "#shift" \n\t" \
+        "psrah "#fp2", "#fp2", "#shift" \n\t" \
+        "psrah "#fp3", "#fp3", "#shift" \n\t" \
         "psrah "#fp4", "#fp4", "#shift" \n\t"
 
-#define PSRAH_8_MMI(fp1, fp2, fp3, fp4, fp5, fp6, fp7, fp8, shift) \
-        PSRAH_4_MMI(fp1, fp2, fp3, fp4, shift) \
+#define PSRAH_8_MMI(fp1, fp2, fp3, fp4, fp5, fp6, fp7, fp8, shift) \
+        PSRAH_4_MMI(fp1, fp2, fp3, fp4, shift) \
         PSRAH_4_MMI(fp5, fp6, fp7, fp8, shift)
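
Reviewer note (not part of the patch): the data flow of the new TRANSPOSE_4H and TRANSPOSE_8B macros can be checked without Loongson hardware using a portable C model. The sketch below mirrors the macros instruction by instruction. The names unpack, transpose_4h, transpose_8b and transposes_ok are hypothetical helpers, not FFmpeg functions, and the model assumes that the punpck{l,h}{bh,hw,wd} instructions interleave the low or high halves of their two sources the way the corresponding MMX unpack instructions do, with element 0 in the least significant bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of punpck{l,h}{bh,hw,wd}: interleave the low (high == 0) or
 * high (high == 1) half of a and b, with elements of `bits` width. */
static uint64_t unpack(uint64_t a, uint64_t b, int bits, int high)
{
    assert(bits == 8 || bits == 16 || bits == 32);
    const uint64_t mask = (1ULL << bits) - 1;
    const int half = 64 / bits / 2;          /* elements taken per source */
    uint64_t r = 0;
    for (int i = 0; i < half; i++) {
        int src = i + (high ? half : 0);
        r |= ((a >> (src * bits)) & mask) << (2 * i * bits);
        r |= ((b >> (src * bits)) & mask) << ((2 * i + 1) * bits);
    }
    return r;
}

/* Same eight steps as TRANSPOSE_4H: m[] is src & dst, t0..t3 are temporaries. */
static void transpose_4h(uint64_t m[4])
{
    uint64_t t0 = unpack(m[0], m[1], 16, 0);
    uint64_t t1 = unpack(m[0], m[1], 16, 1);
    uint64_t t2 = unpack(m[2], m[3], 16, 0);
    uint64_t t3 = unpack(m[2], m[3], 16, 1);
    m[0] = unpack(t0, t2, 32, 0);
    m[1] = unpack(t0, t2, 32, 1);
    m[2] = unpack(t1, t3, 32, 0);
    m[3] = unpack(t1, t3, 32, 1);
}

/* Same 24 steps as TRANSPOSE_8B, in the same order and with the same reuse of
 * the input registers as scratch space. */
static void transpose_8b(uint64_t m[8])
{
    uint64_t t0 = unpack(m[0], m[1], 8, 0);
    uint64_t t1 = unpack(m[0], m[1], 8, 1);
    uint64_t t2 = unpack(m[2], m[3], 8, 0);
    uint64_t t3 = unpack(m[2], m[3], 8, 1);
    m[0] = unpack(m[4], m[5], 8, 0);
    m[1] = unpack(m[4], m[5], 8, 1);
    m[2] = unpack(m[6], m[7], 8, 0);
    m[3] = unpack(m[6], m[7], 8, 1);
    m[4] = unpack(t0, t2, 16, 0);
    m[5] = unpack(t0, t2, 16, 1);
    m[6] = unpack(t1, t3, 16, 0);
    m[7] = unpack(t1, t3, 16, 1);
    t0 = unpack(m[0], m[2], 16, 0);
    t1 = unpack(m[0], m[2], 16, 1);
    t2 = unpack(m[1], m[3], 16, 0);
    t3 = unpack(m[1], m[3], 16, 1);
    m[0] = unpack(m[4], t0, 32, 0);
    m[1] = unpack(m[4], t0, 32, 1);
    m[2] = unpack(m[5], t1, 32, 0);
    m[3] = unpack(m[5], t1, 32, 1);
    m[4] = unpack(m[6], t2, 32, 0);
    m[5] = unpack(m[6], t2, 32, 1);
    m[6] = unpack(m[7], t3, 32, 0);   /* m[7] is still intact here */
    m[7] = unpack(m[7], t3, 32, 1);
}

/* Compare both models with an element-by-element transpose; 1 on success. */
static int transposes_ok(void)
{
    uint64_t h[4] = {0}, b[8] = {0};
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            h[r] |= (uint64_t)(0x10 * r + c) << (16 * c);
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            b[r] |= (uint64_t)(0x10 * r + c) << (8 * c);
    transpose_4h(h);
    transpose_8b(b);
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            if (((h[r] >> (16 * c)) & 0xffff) != (uint64_t)(0x10 * c + r))
                return 0;
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            if (((b[r] >> (8 * c)) & 0xff) != (uint64_t)(0x10 * c + r))
                return 0;
    return 1;
}
```

Calling transposes_ok() from a small main() is enough to exercise both models. The check covers only the shuffle pattern under the stated assumption about the unpack instructions; it says nothing about register constraints or clobbers in the real inline assembly.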