From patchwork Fri May 28 02:04:39 2021
X-Patchwork-Submitter: Jin Bo
X-Patchwork-Id: 27959
From: Jin Bo <jinbo@loongson.cn>
To: ffmpeg-devel@ffmpeg.org
Cc: Jin Bo <jinbo@loongson.cn>
Date: Fri, 28 May 2021 10:04:39 +0800
Message-Id: <1622167481-10973-1-git-send-email-jinbo@loongson.cn>
X-Mailer: git-send-email 2.1.0
Subject: [FFmpeg-devel] [PATCH 1/3] libavcodec/mips: Fix specification of instruction names

1. 'xor', 'or' and 'and' become 'pxor', 'por' and 'pand'. When these
   mnemonics operate on floating-point registers (used here as 64-bit
   MMI vector registers), gcc accepts both spellings, but clang accepts
   only the p-prefixed SIMD form.
2. 'dsrl' and 'srl' become 'ssrld' and 'ssrlw'. Likewise, gcc accepts
   both spellings when the operands are floating-point registers, but
   clang accepts only the latter.
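For illustration, a minimal stand-alone sketch of the difference (a
hypothetical helper, not part of this patch): it shifts a doubleword
right by 16 inside an FPR, with a pxor/por round-trip thrown in. The
commented-out spellings assemble with gcc/gas but are rejected by
clang's integrated assembler; both spellings encode the same Loongson
MMI instructions.

    #include <stdint.h>

    static inline uint64_t mmi_srl16(uint64_t x)
    {
        double ftmp0, ftmp1, ftmp2;
        uint64_t ret;
        const uint64_t shift = 16;

        __asm__ volatile (
            "dmtc1 %[x],     %[ftmp0]           \n\t"
            "dmtc1 %[shift], %[ftmp1]           \n\t"
            /* "xor  %[ftmp2], %[ftmp2], %[ftmp2]" -- gcc only */
            "pxor  %[ftmp2], %[ftmp2], %[ftmp2] \n\t"
            /* "dsrl %[ftmp0], %[ftmp0], %[ftmp1]" -- gcc only */
            "ssrld %[ftmp0], %[ftmp0], %[ftmp1] \n\t"
            /* "or   %[ftmp0], %[ftmp0], %[ftmp2]" -- gcc only */
            "por   %[ftmp0], %[ftmp0], %[ftmp2] \n\t"
            "dmfc1 %[ret],   %[ftmp0]           \n\t"
            : [ftmp0]"=&f"(ftmp0), [ftmp1]"=&f"(ftmp1),
              [ftmp2]"=&f"(ftmp2), [ret]"=r"(ret)
            : [x]"r"(x), [shift]"r"(shift)
        );
        return ret; /* == x >> 16; the pxor/por pair is a no-op */
    }

Only the mnemonic accepted by the two assemblers differs; the generated
machine code is unchanged, which is why this patch is a pure rename.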
Signed-off-by: Jin Bo --- libavcodec/mips/blockdsp_mmi.c | 8 +- libavcodec/mips/h264chroma_mmi.c | 20 +-- libavcodec/mips/h264dsp_mmi.c | 288 +++++++++++++++++++------------------- libavcodec/mips/h264pred_mmi.c | 18 +-- libavcodec/mips/h264qpel_mmi.c | 26 ++-- libavcodec/mips/hevcdsp_mmi.c | 32 ++--- libavcodec/mips/hpeldsp_mmi.c | 26 ++-- libavcodec/mips/idctdsp_mmi.c | 2 +- libavcodec/mips/mpegvideo_mmi.c | 94 ++++++------- libavcodec/mips/pixblockdsp_mmi.c | 8 +- libavcodec/mips/simple_idct_mmi.c | 14 +- libavcodec/mips/vc1dsp_mmi.c | 34 ++--- libavcodec/mips/vp3dsp_idct_mmi.c | 132 ++++++++--------- libavcodec/mips/vp8dsp_mmi.c | 80 +++++------ libavcodec/mips/vp9_mc_mmi.c | 10 +- libavcodec/mips/wmv2dsp_mmi.c | 2 +- 16 files changed, 397 insertions(+), 397 deletions(-) diff --git a/libavcodec/mips/blockdsp_mmi.c b/libavcodec/mips/blockdsp_mmi.c index 68641e2..8b5c7e9 100644 --- a/libavcodec/mips/blockdsp_mmi.c +++ b/libavcodec/mips/blockdsp_mmi.c @@ -76,8 +76,8 @@ void ff_clear_block_mmi(int16_t *block) double ftmp[2]; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp1] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp1] \n\t" MMI_SQC1(%[ftmp0], %[ftmp1], %[block], 0x00) MMI_SQC1(%[ftmp0], %[ftmp1], %[block], 0x10) MMI_SQC1(%[ftmp0], %[ftmp1], %[block], 0x20) @@ -97,8 +97,8 @@ void ff_clear_blocks_mmi(int16_t *block) double ftmp[2]; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp1] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp1] \n\t" MMI_SQC1(%[ftmp0], %[ftmp1], %[block], 0x00) MMI_SQC1(%[ftmp0], %[ftmp1], %[block], 0x10) MMI_SQC1(%[ftmp0], %[ftmp1], %[block], 0x20) diff --git a/libavcodec/mips/h264chroma_mmi.c b/libavcodec/mips/h264chroma_mmi.c index 739dd7d..dbcba10 100644 --- a/libavcodec/mips/h264chroma_mmi.c +++ b/libavcodec/mips/h264chroma_mmi.c @@ -72,7 +72,7 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, A = 64 - D - B - C; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" @@ -172,7 +172,7 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, A = 64 - E; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" @@ -221,7 +221,7 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, A = 64 - E; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" @@ -328,7 +328,7 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, C = (y << 3) - D; A = 64 - D - B - C; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" @@ -396,7 +396,7 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, E = x << 3; A = 64 - E; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" @@ -446,7 
+446,7 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, E = y << 3; A = 64 - E; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" @@ -509,7 +509,7 @@ void ff_put_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, if (D) { __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" @@ -559,7 +559,7 @@ void ff_put_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, } else if (E) { const int step = C ? stride : 1; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" @@ -633,7 +633,7 @@ void ff_avg_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, if (D) { __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" @@ -685,7 +685,7 @@ void ff_avg_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, } else if (E) { const int step = C ? stride : 1; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" diff --git a/libavcodec/mips/h264dsp_mmi.c b/libavcodec/mips/h264dsp_mmi.c index d4fcef0..fe12b28 100644 --- a/libavcodec/mips/h264dsp_mmi.c +++ b/libavcodec/mips/h264dsp_mmi.c @@ -34,7 +34,7 @@ void ff_h264_add_pixels4_8_mmi(uint8_t *dst, int16_t *src, int stride) DECLARE_VAR_LOW32; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" MMI_LDC1(%[ftmp1], %[src], 0x00) MMI_LDC1(%[ftmp2], %[src], 0x08) MMI_LDC1(%[ftmp3], %[src], 0x10) @@ -89,7 +89,7 @@ void ff_h264_idct_add_8_mmi(uint8_t *dst, int16_t *block, int stride) MMI_LDC1(%[ftmp2], %[block], 0x10) MMI_LDC1(%[ftmp3], %[block], 0x18) /* memset(block, 0, 32) */ - "xor %[ftmp4], %[ftmp4], %[ftmp4] \n\t" + "pxor %[ftmp4], %[ftmp4], %[ftmp4] \n\t" "gssqc1 %[ftmp4], %[ftmp4], 0x00(%[block]) \n\t" "gssqc1 %[ftmp4], %[ftmp4], 0x10(%[block]) \n\t" "dli %[tmp0], 0x01 \n\t" @@ -127,7 +127,7 @@ void ff_h264_idct_add_8_mmi(uint8_t *dst, int16_t *block, int stride) "psubh %[ftmp5], %[ftmp5], %[ftmp4] \n\t" MMI_ULWC1(%[ftmp2], %[dst], 0x00) MMI_LWXC1(%[ftmp0], %[dst], %[stride], 0x00) - "xor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" + "pxor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" "psrah %[ftmp3], %[ftmp10], %[ftmp9] \n\t" "psrah %[ftmp4], %[ftmp11], %[ftmp9] \n\t" "punpcklbh %[ftmp2], %[ftmp2], %[ftmp7] \n\t" @@ -419,7 +419,7 @@ void ff_h264_idct8_add_8_mmi(uint8_t *dst, int16_t *block, int stride) "psubh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" MMI_SDC1(%[ftmp0], $sp, 0x10) "dmfc1 %[tmp1], %[ftmp2] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp2] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp2] \n\t" MMI_SDC1(%[ftmp2], %[block], 0x00) MMI_SDC1(%[ftmp2], %[block], 0x08) MMI_SDC1(%[ftmp2], %[block], 0x10) @@ -555,7 +555,7 @@ void ff_h264_idct8_add_8_mmi(uint8_t *dst, int16_t *block, int stride) "psubh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" MMI_SDC1(%[ftmp7], $sp, 0x18) "dmfc1 %[tmp2], %[ftmp0] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor 
%[ftmp0], %[ftmp0], %[ftmp0] \n\t" MMI_ULWC1(%[ftmp6], %[addr0], 0x00) MMI_LWXC1(%[ftmp7], %[addr0], %[stride], 0x00) "psrah %[ftmp2], %[ftmp2], %[ftmp10] \n\t" @@ -646,7 +646,7 @@ void ff_h264_idct_dc_add_8_mmi(uint8_t *dst, int16_t *block, int stride) __asm__ volatile ( "mtc1 %[dc], %[ftmp5] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "pshufh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" MMI_ULWC1(%[ftmp1], %[dst0], 0x00) MMI_ULWC1(%[ftmp2], %[dst1], 0x00) @@ -690,7 +690,7 @@ void ff_h264_idct8_dc_add_8_mmi(uint8_t *dst, int16_t *block, int stride) __asm__ volatile ( "mtc1 %[dc], %[ftmp5] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "pshufh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" MMI_LDC1(%[ftmp1], %[dst0], 0x00) MMI_LDC1(%[ftmp2], %[dst1], 0x00) @@ -929,7 +929,7 @@ void ff_h264_luma_dc_dequant_idct_8_mmi(int16_t *output, int16_t *input, "packsswh %[ftmp0], %[ftmp0], %[ftmp1] \n\t" "packsswh %[ftmp2], %[ftmp2], %[ftmp5] \n\t" "dmfc1 %[tmp1], %[ftmp0] \n\t" - "dsrl %[ftmp0], %[ftmp0], %[ftmp9] \n\t" + "ssrld %[ftmp0], %[ftmp0], %[ftmp9] \n\t" "mfc1 %[input], %[ftmp0] \n\t" "sh %[tmp1], 0x00(%[output]) \n\t" "sh %[input], 0x80(%[output]) \n\t" @@ -938,7 +938,7 @@ void ff_h264_luma_dc_dequant_idct_8_mmi(int16_t *output, int16_t *input, "sh %[tmp1], 0x20(%[output]) \n\t" "sh %[input], 0xa0(%[output]) \n\t" "dmfc1 %[tmp1], %[ftmp2] \n\t" - "dsrl %[ftmp2], %[ftmp2], %[ftmp9] \n\t" + "ssrld %[ftmp2], %[ftmp2], %[ftmp9] \n\t" "mfc1 %[input], %[ftmp2] \n\t" "sh %[tmp1], 0x40(%[output]) \n\t" "sh %[input], 0xc0(%[output]) \n\t" @@ -963,7 +963,7 @@ void ff_h264_luma_dc_dequant_idct_8_mmi(int16_t *output, int16_t *input, "packsswh %[ftmp3], %[ftmp3], %[ftmp1] \n\t" "packsswh %[ftmp4], %[ftmp4], %[ftmp5] \n\t" "dmfc1 %[tmp1], %[ftmp3] \n\t" - "dsrl %[ftmp3], %[ftmp3], %[ftmp9] \n\t" + "ssrld %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "mfc1 %[input], %[ftmp3] \n\t" "sh %[tmp1], 0x100(%[output]) \n\t" "sh %[input], 0x180(%[output]) \n\t" @@ -972,7 +972,7 @@ void ff_h264_luma_dc_dequant_idct_8_mmi(int16_t *output, int16_t *input, "sh %[tmp1], 0x120(%[output]) \n\t" "sh %[input], 0x1a0(%[output]) \n\t" "dmfc1 %[tmp1], %[ftmp4] \n\t" - "dsrl %[ftmp4], %[ftmp4], %[ftmp9] \n\t" + "ssrld %[ftmp4], %[ftmp4], %[ftmp9] \n\t" "mfc1 %[input], %[ftmp4] \n\t" "sh %[tmp1], 0x140(%[output]) \n\t" "sh %[input], 0x1c0(%[output]) \n\t" @@ -1016,7 +1016,7 @@ void ff_h264_luma_dc_dequant_idct_8_mmi(int16_t *output, int16_t *input, "packsswh %[ftmp0], %[ftmp0], %[ftmp1] \n\t" "packsswh %[ftmp2], %[ftmp2], %[ftmp5] \n\t" "dmfc1 %[tmp1], %[ftmp0] \n\t" - "dsrl %[ftmp0], %[ftmp0], %[ftmp9] \n\t" + "ssrld %[ftmp0], %[ftmp0], %[ftmp9] \n\t" "sh %[tmp1], 0x00(%[output]) \n\t" "mfc1 %[input], %[ftmp0] \n\t" "dsrl %[tmp1], %[tmp1], 0x10 \n\t" @@ -1025,7 +1025,7 @@ void ff_h264_luma_dc_dequant_idct_8_mmi(int16_t *output, int16_t *input, PTR_SRL "%[input], %[input], 0x10 \n\t" "dmfc1 %[tmp1], %[ftmp2] \n\t" "sh %[input], 0xa0(%[output]) \n\t" - "dsrl %[ftmp2], %[ftmp2], %[ftmp9] \n\t" + "ssrld %[ftmp2], %[ftmp2], %[ftmp9] \n\t" "sh %[tmp1], 0x40(%[output]) \n\t" "mfc1 %[input], %[ftmp2] \n\t" "dsrl %[tmp1], %[tmp1], 0x10 \n\t" @@ -1050,7 +1050,7 @@ void ff_h264_luma_dc_dequant_idct_8_mmi(int16_t *output, int16_t *input, "packsswh %[ftmp3], %[ftmp3], %[ftmp1] \n\t" "packsswh %[ftmp4], %[ftmp4], %[ftmp5] \n\t" "dmfc1 %[tmp1], %[ftmp3] \n\t" - "dsrl %[ftmp3], %[ftmp3], %[ftmp9] \n\t" + "ssrld %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "mfc1 %[input], %[ftmp3] \n\t" "sh 
%[tmp1], 0x100(%[output]) \n\t" "sh %[input], 0x180(%[output]) \n\t" @@ -1059,7 +1059,7 @@ void ff_h264_luma_dc_dequant_idct_8_mmi(int16_t *output, int16_t *input, "sh %[tmp1], 0x120(%[output]) \n\t" "sh %[input], 0x1a0(%[output]) \n\t" "dmfc1 %[tmp1], %[ftmp4] \n\t" - "dsrl %[ftmp4], %[ftmp4], %[ftmp9] \n\t" + "ssrld %[ftmp4], %[ftmp4], %[ftmp9] \n\t" "mfc1 %[input], %[ftmp4] \n\t" "sh %[tmp1], 0x140(%[output]) \n\t" "sh %[input], 0x1c0(%[output]) \n\t" @@ -1144,7 +1144,7 @@ void ff_h264_weight_pixels16_8_mmi(uint8_t *block, ptrdiff_t stride, int height, for (y=0; y> 3; \ __asm__ volatile( \ - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" \ + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" \ "li %[rtmp0], 0x06 \n\t" \ "dmtc1 %[rtmp0], %[ftmp1] \n\t" \ "li %[rtmp0], 0x10 \n\t" \ @@ -930,8 +930,8 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ "packsswh %[ftmp4], %[ftmp4], %[ftmp5] \n\t" \ "pcmpgth %[ftmp3], %[ftmp2], %[ftmp0] \n\t" \ "pcmpgth %[ftmp5], %[ftmp4], %[ftmp0] \n\t" \ - "and %[ftmp2], %[ftmp2], %[ftmp3] \n\t" \ - "and %[ftmp4], %[ftmp4], %[ftmp5] \n\t" \ + "pand %[ftmp2], %[ftmp2], %[ftmp3] \n\t" \ + "pand %[ftmp4], %[ftmp4], %[ftmp5] \n\t" \ "packushb %[ftmp2], %[ftmp2], %[ftmp4] \n\t" \ "gssdlc1 %[ftmp2], 0x07(%[dst]) \n\t" \ "gssdrc1 %[ftmp2], 0x00(%[dst]) \n\t" \ @@ -1006,7 +1006,7 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ "punpcklbh %[ftmp1], %[ftmp0], %[ftmp1] \n\t" \ "psrah %[ftmp1], %[ftmp1], %[ftmp0] \n\t" \ "psrah %[ftmp2], %[ftmp2], %[ftmp0] \n\t" \ - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" \ + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" \ \ "1: \n\t" \ "2: \n\t" \ @@ -1139,9 +1139,9 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ "packsswh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \ "paddh %[ftmp3], %[ftmp3], %[offset] \n\t" \ "psrah %[ftmp3], %[ftmp3], %[shift] \n\t" \ - "xor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" \ + "pxor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" \ "pcmpgth %[ftmp7], %[ftmp3], %[ftmp7] \n\t" \ - "and %[ftmp3], %[ftmp3], %[ftmp7] \n\t" \ + "pand %[ftmp3], %[ftmp3], %[ftmp7] \n\t" \ "packushb %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ "gsswlc1 %[ftmp3], 0x03(%[dst]) \n\t" \ "gsswrc1 %[ftmp3], 0x00(%[dst]) \n\t" \ diff --git a/libavcodec/mips/hpeldsp_mmi.c b/libavcodec/mips/hpeldsp_mmi.c index e69b2bd..bf3e463 100644 --- a/libavcodec/mips/hpeldsp_mmi.c +++ b/libavcodec/mips/hpeldsp_mmi.c @@ -676,14 +676,14 @@ inline void ff_put_no_rnd_pixels8_l2_8_mmi(uint8_t *dst, const uint8_t *src1, PTR_ADDU "%[addr1], %[src2], %[src_stride2] \n\t" MMI_ULDC1(%[ftmp3], %[addr1], 0x00) PTR_ADDU "%[src1], %[src1], %[addr2] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp4] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp4] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp4] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp4] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp4] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp4] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp4] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp4] \n\t" "pavgb %[ftmp0], %[ftmp0], %[ftmp2] \n\t" "pavgb %[ftmp1], %[ftmp1], %[ftmp3] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp4] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp4] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp4] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp4] \n\t" MMI_SDC1(%[ftmp0], %[dst], 0x00) MMI_SDXC1(%[ftmp1], %[dst], %[dst_stride], 0x00) PTR_ADDU "%[src2], %[src2], %[addr3] \n\t" @@ -696,14 +696,14 @@ inline void ff_put_no_rnd_pixels8_l2_8_mmi(uint8_t *dst, const uint8_t *src1, PTR_ADDU "%[addr1], %[src2], %[src_stride2] \n\t" MMI_ULDC1(%[ftmp3], %[addr1], 0x00) PTR_ADDU "%[src1], %[src1], 
%[addr2] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp4] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp4] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp4] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp4] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp4] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp4] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp4] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp4] \n\t" "pavgb %[ftmp0], %[ftmp0], %[ftmp2] \n\t" "pavgb %[ftmp1], %[ftmp1], %[ftmp3] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp4] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp4] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp4] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp4] \n\t" MMI_SDC1(%[ftmp0], %[dst], 0x00) MMI_SDXC1(%[ftmp1], %[dst], %[dst_stride], 0x00) PTR_ADDU "%[src2], %[src2], %[addr3] \n\t" @@ -846,7 +846,7 @@ void ff_put_pixels8_xy2_8_mmi(uint8_t *block, const uint8_t *pixels, DECLARE_VAR_ADDRT; __asm__ volatile ( - "xor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" + "pxor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" "dli %[addr0], 0x0f \n\t" "pcmpeqw %[ftmp6], %[ftmp6], %[ftmp6] \n\t" "dmtc1 %[addr0], %[ftmp8] \n\t" diff --git a/libavcodec/mips/idctdsp_mmi.c b/libavcodec/mips/idctdsp_mmi.c index a96dac4..0047aef 100644 --- a/libavcodec/mips/idctdsp_mmi.c +++ b/libavcodec/mips/idctdsp_mmi.c @@ -154,7 +154,7 @@ void ff_add_pixels_clamped_mmi(const int16_t *block, uint64_t tmp[1]; __asm__ volatile ( "li %[tmp0], 0x04 \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "1: \n\t" MMI_LDC1(%[ftmp5], %[pixels], 0x00) PTR_ADDU "%[pixels], %[pixels], %[line_size] \n\t" diff --git a/libavcodec/mips/mpegvideo_mmi.c b/libavcodec/mips/mpegvideo_mmi.c index e4aba08..edaa839 100644 --- a/libavcodec/mips/mpegvideo_mmi.c +++ b/libavcodec/mips/mpegvideo_mmi.c @@ -53,13 +53,13 @@ void ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block, nCoeffs = s->inter_scantable.raster_end[s->block_last_index[n]]; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "packsswh %[qmul], %[qmul], %[qmul] \n\t" "packsswh %[qmul], %[qmul], %[qmul] \n\t" "packsswh %[qadd], %[qadd], %[qadd] \n\t" "packsswh %[qadd], %[qadd], %[qadd] \n\t" "psubh %[ftmp0], %[ftmp0], %[qadd] \n\t" - "xor %[ftmp5], %[ftmp5], %[ftmp5] \n\t" + "pxor %[ftmp5], %[ftmp5], %[ftmp5] \n\t" ".p2align 4 \n\t" "1: \n\t" @@ -72,12 +72,12 @@ void ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block, "pmullh %[ftmp2], %[ftmp2], %[qmul] \n\t" "pcmpgth %[ftmp3], %[ftmp3], %[ftmp5] \n\t" "pcmpgth %[ftmp4], %[ftmp4], %[ftmp5] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp3] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp4] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp3] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp4] \n\t" "paddh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "paddh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp1] \n\t" - "xor %[ftmp4], %[ftmp4], %[ftmp2] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp1] \n\t" + "pxor %[ftmp4], %[ftmp4], %[ftmp2] \n\t" "pcmpeqh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "pcmpeqh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "pandn %[ftmp1], %[ftmp1], %[ftmp3] \n\t" @@ -116,11 +116,11 @@ void ff_dct_unquantize_h263_inter_mmi(MpegEncContext *s, int16_t *block, __asm__ volatile ( "packsswh %[qmul], %[qmul], %[qmul] \n\t" "packsswh %[qmul], %[qmul], %[qmul] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "packsswh %[qadd], %[qadd], %[qadd] \n\t" "packsswh %[qadd], %[qadd], %[qadd] \n\t" "psubh %[ftmp0], %[ftmp0], %[qadd] \n\t" - "xor %[ftmp5], %[ftmp5], %[ftmp5] \n\t" + "pxor %[ftmp5], 
%[ftmp5], %[ftmp5] \n\t" ".p2align 4 \n\t" "1: \n\t" PTR_ADDU "%[addr0], %[block], %[nCoeffs] \n\t" @@ -132,12 +132,12 @@ void ff_dct_unquantize_h263_inter_mmi(MpegEncContext *s, int16_t *block, "pmullh %[ftmp2], %[ftmp2], %[qmul] \n\t" "pcmpgth %[ftmp3], %[ftmp3], %[ftmp5] \n\t" "pcmpgth %[ftmp4], %[ftmp4], %[ftmp5] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp3] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp4] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp3] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp4] \n\t" "paddh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "paddh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp1] \n\t" - "xor %[ftmp4], %[ftmp4], %[ftmp2] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp1] \n\t" + "pxor %[ftmp4], %[ftmp4], %[ftmp2] \n\t" "pcmpeqh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "pcmpeqh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "pandn %[ftmp1], %[ftmp1], %[ftmp3] \n\t" @@ -201,18 +201,18 @@ void ff_dct_unquantize_mpeg1_intra_mmi(MpegEncContext *s, int16_t *block, MMI_LDXC1(%[ftmp7], %[addr0], %[quant], 0x08) "pmullh %[ftmp6], %[ftmp6], %[ftmp1] \n\t" "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" - "xor %[ftmp8], %[ftmp8], %[ftmp8] \n\t" - "xor %[ftmp9], %[ftmp9], %[ftmp9] \n\t" + "pxor %[ftmp8], %[ftmp8], %[ftmp8] \n\t" + "pxor %[ftmp9], %[ftmp9], %[ftmp9] \n\t" "pcmpgth %[ftmp8], %[ftmp8], %[ftmp2] \n\t" "pcmpgth %[ftmp9], %[ftmp9], %[ftmp3] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp9] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "psubh %[ftmp2], %[ftmp2], %[ftmp8] \n\t" "psubh %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "pmullh %[ftmp2], %[ftmp2], %[ftmp6] \n\t" "pmullh %[ftmp3], %[ftmp3], %[ftmp7] \n\t" - "xor %[ftmp6], %[ftmp6], %[ftmp6] \n\t" - "xor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" + "pxor %[ftmp6], %[ftmp6], %[ftmp6] \n\t" + "pxor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" "pcmpeqh %[ftmp6], %[ftmp6], %[ftmp4] \n\t" "dli %[tmp0], 0x03 \n\t" "pcmpeqh %[ftmp7], %[ftmp7], %[ftmp5] \n\t" @@ -221,10 +221,10 @@ void ff_dct_unquantize_mpeg1_intra_mmi(MpegEncContext *s, int16_t *block, "psrah %[ftmp3], %[ftmp3], %[ftmp4] \n\t" "psubh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "psubh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" - "or %[ftmp2], %[ftmp2], %[ftmp0] \n\t" - "or %[ftmp3], %[ftmp3], %[ftmp0] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp9] \n\t" + "por %[ftmp2], %[ftmp2], %[ftmp0] \n\t" + "por %[ftmp3], %[ftmp3], %[ftmp0] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "psubh %[ftmp2], %[ftmp2], %[ftmp8] \n\t" "psubh %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "pandn %[ftmp6], %[ftmp6], %[ftmp2] \n\t" @@ -287,12 +287,12 @@ void ff_dct_unquantize_mpeg1_inter_mmi(MpegEncContext *s, int16_t *block, MMI_LDXC1(%[ftmp7], %[addr0], %[quant], 0x08) "pmullh %[ftmp6], %[ftmp6], %[ftmp1] \n\t" "pmullh %[ftmp7], %[ftmp7], %[ftmp1] \n\t" - "xor %[ftmp8], %[ftmp8], %[ftmp8] \n\t" - "xor %[ftmp9], %[ftmp9], %[ftmp9] \n\t" + "pxor %[ftmp8], %[ftmp8], %[ftmp8] \n\t" + "pxor %[ftmp9], %[ftmp9], %[ftmp9] \n\t" "pcmpgth %[ftmp8], %[ftmp8], %[ftmp2] \n\t" "pcmpgth %[ftmp9], %[ftmp9], %[ftmp3] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp9] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "psubh %[ftmp2], %[ftmp2], %[ftmp8] \n\t" "psubh %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "paddh %[ftmp2], %[ftmp2], %[ftmp2] \n\t" @@ -301,8 +301,8 @@ void ff_dct_unquantize_mpeg1_inter_mmi(MpegEncContext *s, int16_t *block, "paddh 
%[ftmp3], %[ftmp3], %[ftmp0] \n\t" "pmullh %[ftmp2], %[ftmp2], %[ftmp6] \n\t" "pmullh %[ftmp3], %[ftmp3], %[ftmp7] \n\t" - "xor %[ftmp6], %[ftmp6], %[ftmp6] \n\t" - "xor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" + "pxor %[ftmp6], %[ftmp6], %[ftmp6] \n\t" + "pxor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" "pcmpeqh %[ftmp6], %[ftmp6], %[ftmp4] \n\t" "dli %[tmp0], 0x04 \n\t" "pcmpeqh %[ftmp7], %[ftmp7], %[ftmp5] \n\t" @@ -311,10 +311,10 @@ void ff_dct_unquantize_mpeg1_inter_mmi(MpegEncContext *s, int16_t *block, "psrah %[ftmp3], %[ftmp3], %[ftmp4] \n\t" "psubh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "psubh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" - "or %[ftmp2], %[ftmp2], %[ftmp0] \n\t" - "or %[ftmp3], %[ftmp3], %[ftmp0] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp9] \n\t" + "por %[ftmp2], %[ftmp2], %[ftmp0] \n\t" + "por %[ftmp3], %[ftmp3], %[ftmp0] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "psubh %[ftmp2], %[ftmp2], %[ftmp8] \n\t" "psubh %[ftmp3], %[ftmp3], %[ftmp9] \n\t" "pandn %[ftmp6], %[ftmp6], %[ftmp2] \n\t" @@ -386,26 +386,26 @@ void ff_dct_unquantize_mpeg2_intra_mmi(MpegEncContext *s, int16_t *block, MMI_LDXC1(%[ftmp6], %[addr0], %[quant], 0x08) "pmullh %[ftmp5], %[ftmp5], %[ftmp9] \n\t" "pmullh %[ftmp6], %[ftmp6], %[ftmp9] \n\t" - "xor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" - "xor %[ftmp8], %[ftmp8], %[ftmp8] \n\t" + "pxor %[ftmp7], %[ftmp7], %[ftmp7] \n\t" + "pxor %[ftmp8], %[ftmp8], %[ftmp8] \n\t" "pcmpgth %[ftmp7], %[ftmp7], %[ftmp1] \n\t" "pcmpgth %[ftmp8], %[ftmp8], %[ftmp2] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp7] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp7] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" "psubh %[ftmp1], %[ftmp1], %[ftmp7] \n\t" "psubh %[ftmp2], %[ftmp2], %[ftmp8] \n\t" "pmullh %[ftmp1], %[ftmp1], %[ftmp5] \n\t" "pmullh %[ftmp2], %[ftmp2], %[ftmp6] \n\t" - "xor %[ftmp5], %[ftmp5], %[ftmp5] \n\t" - "xor %[ftmp6], %[ftmp6], %[ftmp6] \n\t" + "pxor %[ftmp5], %[ftmp5], %[ftmp5] \n\t" + "pxor %[ftmp6], %[ftmp6], %[ftmp6] \n\t" "pcmpeqh %[ftmp5], %[ftmp5], %[ftmp3] \n\t" "dli %[tmp0], 0x03 \n\t" "pcmpeqh %[ftmp6] , %[ftmp6], %[ftmp4] \n\t" "mtc1 %[tmp0], %[ftmp3] \n\t" "psrah %[ftmp1], %[ftmp1], %[ftmp3] \n\t" "psrah %[ftmp2], %[ftmp2], %[ftmp3] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp7] \n\t" - "xor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp7] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp8] \n\t" "psubh %[ftmp1], %[ftmp1], %[ftmp7] \n\t" "psubh %[ftmp2], %[ftmp2], %[ftmp8] \n\t" "pandn %[ftmp5], %[ftmp5], %[ftmp1] \n\t" @@ -445,16 +445,16 @@ void ff_denoise_dct_mmi(MpegEncContext *s, int16_t *block) s->dct_count[intra]++; __asm__ volatile( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "1: \n\t" MMI_LDC1(%[ftmp1], %[block], 0x00) - "xor %[ftmp2], %[ftmp2], %[ftmp2] \n\t" + "pxor %[ftmp2], %[ftmp2], %[ftmp2] \n\t" MMI_LDC1(%[ftmp3], %[block], 0x08) - "xor %[ftmp4], %[ftmp4], %[ftmp4] \n\t" + "pxor %[ftmp4], %[ftmp4], %[ftmp4] \n\t" "pcmpgth %[ftmp2], %[ftmp2], %[ftmp1] \n\t" "pcmpgth %[ftmp4], %[ftmp4], %[ftmp3] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp2] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp4] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp2] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp4] \n\t" "psubh %[ftmp1], %[ftmp1], %[ftmp2] \n\t" "psubh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" MMI_LDC1(%[ftmp6], %[offset], 0x00) @@ -463,8 +463,8 @@ void ff_denoise_dct_mmi(MpegEncContext *s, int16_t *block) MMI_LDC1(%[ftmp6], %[offset], 
0x08) "mov.d %[ftmp7], %[ftmp3] \n\t" "psubush %[ftmp3], %[ftmp3], %[ftmp6] \n\t" - "xor %[ftmp1], %[ftmp1], %[ftmp2] \n\t" - "xor %[ftmp3], %[ftmp3], %[ftmp4] \n\t" + "pxor %[ftmp1], %[ftmp1], %[ftmp2] \n\t" + "pxor %[ftmp3], %[ftmp3], %[ftmp4] \n\t" "psubh %[ftmp1], %[ftmp1], %[ftmp2] \n\t" "psubh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" MMI_SDC1(%[ftmp1], %[block], 0x00) diff --git a/libavcodec/mips/pixblockdsp_mmi.c b/libavcodec/mips/pixblockdsp_mmi.c index a915a3c..1230f5d 100644 --- a/libavcodec/mips/pixblockdsp_mmi.c +++ b/libavcodec/mips/pixblockdsp_mmi.c @@ -33,7 +33,7 @@ void ff_get_pixels_8_mmi(int16_t *av_restrict block, const uint8_t *pixels, DECLARE_VAR_ADDRT; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" MMI_LDC1(%[ftmp1], %[pixels], 0x00) MMI_LDXC1(%[ftmp2], %[pixels], %[stride], 0x00) @@ -103,12 +103,12 @@ void ff_diff_pixels_mmi(int16_t *av_restrict block, const uint8_t *src1, __asm__ volatile ( "li %[tmp0], 0x08 \n\t" - "xor %[ftmp4], %[ftmp4], %[ftmp4] \n\t" + "pxor %[ftmp4], %[ftmp4], %[ftmp4] \n\t" "1: \n\t" MMI_LDC1(%[ftmp0], %[src1], 0x00) - "or %[ftmp1], %[ftmp0], %[ftmp0] \n\t" + "por %[ftmp1], %[ftmp0], %[ftmp0] \n\t" MMI_LDC1(%[ftmp2], %[src2], 0x00) - "or %[ftmp3], %[ftmp2], %[ftmp2] \n\t" + "por %[ftmp3], %[ftmp2], %[ftmp2] \n\t" "punpcklbh %[ftmp0], %[ftmp0], %[ftmp4] \n\t" "punpckhbh %[ftmp1], %[ftmp1], %[ftmp4] \n\t" "punpcklbh %[ftmp2], %[ftmp2], %[ftmp4] \n\t" diff --git a/libavcodec/mips/simple_idct_mmi.c b/libavcodec/mips/simple_idct_mmi.c index e4b58dc..ad068a8 100644 --- a/libavcodec/mips/simple_idct_mmi.c +++ b/libavcodec/mips/simple_idct_mmi.c @@ -133,7 +133,7 @@ void ff_simple_idct_8_mmi(int16_t *block) "psllh $f28, "#src1", $f30 \n\t" \ "dmtc1 $9, $f31 \n\t" \ "punpcklhw $f29, $f28, $f28 \n\t" \ - "and $f29, $f29, $f31 \n\t" \ + "pand $f29, $f29, $f31 \n\t" \ "paddw $f28, $f28, $f29 \n\t" \ "punpcklwd "#src1", $f28, $f28 \n\t" \ "punpcklwd "#src2", $f28, $f28 \n\t" \ @@ -268,9 +268,9 @@ void ff_simple_idct_8_mmi(int16_t *block) "punpcklwd $f8, $f27, $f29 \n\t" "punpckhwd $f12, $f27, $f29 \n\t" - "or $f26, $f2, $f6 \n\t" - "or $f26, $f26, $f10 \n\t" - "or $f26, $f26, $f14 \n\t" + "por $f26, $f2, $f6 \n\t" + "por $f26, $f26, $f10 \n\t" + "por $f26, $f26, $f14 \n\t" "dmfc1 $10, $f26 \n\t" "bnez $10, 1f \n\t" /* case1: In this case, row[1,3,5,7] are all zero */ @@ -338,9 +338,9 @@ void ff_simple_idct_8_mmi(int16_t *block) "punpcklwd $f9, $f27, $f29 \n\t" "punpckhwd $f13, $f27, $f29 \n\t" - "or $f26, $f3, $f7 \n\t" - "or $f26, $f26, $f11 \n\t" - "or $f26, $f26, $f15 \n\t" + "por $f26, $f3, $f7 \n\t" + "por $f26, $f26, $f11 \n\t" + "por $f26, $f26, $f15 \n\t" "dmfc1 $10, $f26 \n\t" "bnez $10, 1f \n\t" /* case1: In this case, row[1,3,5,7] are all zero */ diff --git a/libavcodec/mips/vc1dsp_mmi.c b/libavcodec/mips/vc1dsp_mmi.c index 348ecd2..a8ab3f6 100644 --- a/libavcodec/mips/vc1dsp_mmi.c +++ b/libavcodec/mips/vc1dsp_mmi.c @@ -134,7 +134,7 @@ void ff_vc1_inv_trans_8x8_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo dc = (3 * dc + 16) >> 5; __asm__ volatile( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "pshufh %[dc], %[dc], %[ftmp0] \n\t" "li %[count], 0x02 \n\t" @@ -425,7 +425,7 @@ void ff_vc1_inv_trans_8x4_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo dc = (17 * dc + 64) >> 7; __asm__ volatile( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "pshufh %[dc], %[dc], %[ftmp0] \n\t" MMI_LDC1(%[ftmp1], %[dest0], 
0x00) @@ -705,7 +705,7 @@ void ff_vc1_inv_trans_8x4_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) MMI_LWC1(%[ftmp3], %[tmp0], 0x00) PTR_ADDU "%[tmp0], %[tmp0], %[linesize] \n\t" MMI_LWC1(%[ftmp4], %[tmp0], 0x00) - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "punpcklbh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "punpcklbh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" @@ -829,7 +829,7 @@ void ff_vc1_inv_trans_8x4_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) MMI_LWC1(%[ftmp3], %[tmp0], 0x04) PTR_ADDU "%[tmp0], %[tmp0], %[linesize] \n\t" MMI_LWC1(%[ftmp4], %[tmp0], 0x04) - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "punpcklbh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "punpcklbh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" @@ -877,7 +877,7 @@ void ff_vc1_inv_trans_4x8_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo dc = (12 * dc + 64) >> 7; __asm__ volatile( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "pshufh %[dc], %[dc], %[ftmp0] \n\t" MMI_LWC1(%[ftmp1], %[dest0], 0x00) @@ -1058,7 +1058,7 @@ void ff_vc1_inv_trans_4x8_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) MMI_LWC1(%[ftmp7], %[tmp0], 0x00) PTR_ADDU "%[tmp0], %[tmp0], %[linesize] \n\t" MMI_LWC1(%[ftmp8], %[tmp0], 0x00) - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "punpcklbh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "punpcklbh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" @@ -1133,7 +1133,7 @@ void ff_vc1_inv_trans_4x4_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo dc = (17 * dc + 64) >> 7; __asm__ volatile( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "pshufh %[dc], %[dc], %[ftmp0] \n\t" MMI_LWC1(%[ftmp1], %[dest0], 0x00) @@ -1339,7 +1339,7 @@ void ff_vc1_inv_trans_4x4_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) MMI_LWC1(%[ftmp3], %[tmp0], 0x00) PTR_ADDU "%[tmp0], %[tmp0], %[linesize] \n\t" MMI_LWC1(%[ftmp4], %[tmp0], 0x00) - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "punpcklbh %[ftmp1], %[ftmp1], %[ftmp0] \n\t" "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" "punpcklbh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" @@ -1664,7 +1664,7 @@ static void vc1_put_ver_16b_shift2_mmi(int16_t *dst, DECLARE_VAR_ADDRT; __asm__ volatile( - "xor $f0, $f0, $f0 \n\t" + "pxor $f0, $f0, $f0 \n\t" "li $8, 0x03 \n\t" LOAD_ROUNDER_MMI("%[rnd]") "ldc1 $f12, %[ff_pw_9] \n\t" @@ -1771,7 +1771,7 @@ static void OPNAME ## vc1_shift2_mmi(uint8_t *dst, const uint8_t *src, \ rnd = 8 - rnd; \ \ __asm__ volatile( \ - "xor $f0, $f0, $f0 \n\t" \ + "pxor $f0, $f0, $f0 \n\t" \ "li $10, 0x08 \n\t" \ LOAD_ROUNDER_MMI("%[rnd]") \ "ldc1 $f12, %[ff_pw_9] \n\t" \ @@ -1898,7 +1898,7 @@ vc1_put_ver_16b_ ## NAME ## _mmi(int16_t *dst, const uint8_t *src, \ src -= src_stride; \ \ __asm__ volatile( \ - "xor $f0, $f0, $f0 \n\t" \ + "pxor $f0, $f0, $f0 \n\t" \ LOAD_ROUNDER_MMI("%[rnd]") \ "ldc1 $f10, %[ff_pw_53] \n\t" \ "ldc1 $f12, %[ff_pw_18] \n\t" \ @@ -1973,7 +1973,7 @@ OPNAME ## vc1_hor_16b_ ## NAME ## _mmi(uint8_t *dst, mips_reg stride, \ rnd -= (-4+58+13-3)*256; /* Add -256 bias */ \ \ __asm__ volatile( \ - "xor $f0, $f0, $f0 \n\t" \ + "pxor $f0, $f0, $f0 \n\t" \ LOAD_ROUNDER_MMI("%[rnd]") \ "ldc1 $f10, %[ff_pw_53] \n\t" \ "ldc1 $f12, %[ff_pw_18] \n\t" \ @@ -2023,7 +2023,7 @@ OPNAME ## vc1_## NAME ## _mmi(uint8_t *dst, 
const uint8_t *src, \ rnd = 32-rnd; \ \ __asm__ volatile ( \ - "xor $f0, $f0, $f0 \n\t" \ + "pxor $f0, $f0, $f0 \n\t" \ LOAD_ROUNDER_MMI("%[rnd]") \ "ldc1 $f10, %[ff_pw_53] \n\t" \ "ldc1 $f12, %[ff_pw_18] \n\t" \ @@ -2259,7 +2259,7 @@ void ff_put_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */, __asm__ volatile( "li %[tmp0], 0x06 \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp9] \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" @@ -2314,7 +2314,7 @@ void ff_put_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */, __asm__ volatile( "li %[tmp0], 0x06 \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp5] \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" @@ -2367,7 +2367,7 @@ void ff_avg_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */, __asm__ volatile( "li %[tmp0], 0x06 \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp9] \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" @@ -2425,7 +2425,7 @@ void ff_avg_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */, __asm__ volatile( "li %[tmp0], 0x06 \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp5] \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" diff --git a/libavcodec/mips/vp3dsp_idct_mmi.c b/libavcodec/mips/vp3dsp_idct_mmi.c index c5c4cf3..0d4cba1 100644 --- a/libavcodec/mips/vp3dsp_idct_mmi.c +++ b/libavcodec/mips/vp3dsp_idct_mmi.c @@ -34,7 +34,7 @@ static void idct_row_mmi(int16_t *input) double ftmp[23]; uint64_t tmp[2]; __asm__ volatile ( - "xor %[ftmp10], %[ftmp10], %[ftmp10] \n\t" + "pxor %[ftmp10], %[ftmp10], %[ftmp10] \n\t" LOAD_CONST(%[csth_1], 1) "li %[tmp0], 0x02 \n\t" "1: \n\t" @@ -51,14 +51,14 @@ static void idct_row_mmi(int16_t *input) LOAD_CONST(%[ftmp9], 12785) "pmulhh %[A], %[ftmp9], %[ftmp7] \n\t" "pcmpgth %[C], %[ftmp10], %[ftmp1] \n\t" - "or %[mask], %[C], %[csth_1] \n\t" + "por %[mask], %[C], %[csth_1] \n\t" "pmullh %[B], %[ftmp1], %[mask] \n\t" "pmulhuh %[B], %[ftmp8], %[B] \n\t" "pmullh %[B], %[B], %[mask] \n\t" "paddh %[A], %[A], %[B] \n\t" "paddh %[A], %[A], %[C] \n\t" "pcmpgth %[D], %[ftmp10], %[ftmp7] \n\t" - "or %[mask], %[D], %[csth_1] \n\t" + "por %[mask], %[D], %[csth_1] \n\t" "pmullh %[ftmp7], %[ftmp7], %[mask] \n\t" "pmulhuh %[B], %[ftmp8], %[ftmp7] \n\t" "pmullh %[B], %[B], %[mask] \n\t" @@ -69,12 +69,12 @@ static void idct_row_mmi(int16_t *input) LOAD_CONST(%[ftmp8], 54491) LOAD_CONST(%[ftmp9], 36410) "pcmpgth %[Ad], %[ftmp10], %[ftmp5] \n\t" - "or %[mask], %[Ad], %[csth_1] \n\t" + "por %[mask], %[Ad], %[csth_1] \n\t" "pmullh %[ftmp1], %[ftmp5], %[mask] \n\t" "pmulhuh %[C], %[ftmp9], %[ftmp1] \n\t" "pmullh %[C], %[C], %[mask] \n\t" "pcmpgth %[Bd], %[ftmp10], %[ftmp3] \n\t" - "or %[mask], %[Bd], %[csth_1] \n\t" + "por %[mask], %[Bd], %[csth_1] \n\t" "pmullh %[D], %[ftmp3], %[mask] \n\t" "pmulhuh %[D], %[ftmp8], %[D] \n\t" "pmullh %[D], %[D], %[mask] \n\t" @@ -82,12 +82,12 @@ static void idct_row_mmi(int16_t *input) "paddh %[C], %[C], %[Ad] \n\t" "paddh %[C], %[C], %[Bd] \n\t" "pcmpgth %[Bd], %[ftmp10], %[ftmp3] \n\t" - "or %[mask], %[Bd], %[csth_1] \n\t" + "por %[mask], %[Bd], %[csth_1] \n\t" "pmullh %[ftmp1], %[ftmp3], %[mask] \n\t" "pmulhuh %[D], %[ftmp9], %[ftmp1] \n\t" "pmullh %[D], %[D], %[mask] \n\t" "pcmpgth %[Ed], %[ftmp10], %[ftmp5] \n\t" - 
"or %[mask], %[Ed], %[csth_1] \n\t" + "por %[mask], %[Ed], %[csth_1] \n\t" "pmullh %[Ad], %[ftmp5], %[mask] \n\t" "pmulhuh %[Ad], %[ftmp8], %[Ad] \n\t" "pmullh %[Ad], %[Ad], %[mask] \n\t" @@ -98,14 +98,14 @@ static void idct_row_mmi(int16_t *input) LOAD_CONST(%[ftmp8], 46341) "psubh %[Ad], %[A], %[C] \n\t" "pcmpgth %[Bd], %[ftmp10], %[Ad] \n\t" - "or %[mask], %[Bd], %[csth_1] \n\t" + "por %[mask], %[Bd], %[csth_1] \n\t" "pmullh %[Ad], %[Ad], %[mask] \n\t" "pmulhuh %[Ad], %[ftmp8], %[Ad] \n\t" "pmullh %[Ad], %[Ad], %[mask] \n\t" "paddh %[Ad], %[Ad], %[Bd] \n\t" "psubh %[Bd], %[B], %[D] \n\t" "pcmpgth %[Cd], %[ftmp10], %[Bd] \n\t" - "or %[mask], %[Cd], %[csth_1] \n\t" + "por %[mask], %[Cd], %[csth_1] \n\t" "pmullh %[Bd], %[Bd], %[mask] \n\t" "pmulhuh %[Bd], %[ftmp8], %[Bd] \n\t" "pmullh %[Bd], %[Bd], %[mask] \n\t" @@ -114,14 +114,14 @@ static void idct_row_mmi(int16_t *input) "paddh %[Dd], %[B], %[D] \n\t" "paddh %[A], %[ftmp0], %[ftmp4] \n\t" "pcmpgth %[B], %[ftmp10], %[A] \n\t" - "or %[mask], %[B], %[csth_1] \n\t" + "por %[mask], %[B], %[csth_1] \n\t" "pmullh %[A], %[A], %[mask] \n\t" "pmulhuh %[A], %[ftmp8], %[A] \n\t" "pmullh %[A], %[A], %[mask] \n\t" "paddh %[A], %[A], %[B] \n\t" "psubh %[B], %[ftmp0], %[ftmp4] \n\t" "pcmpgth %[C], %[ftmp10], %[B] \n\t" - "or %[mask], %[C], %[csth_1] \n\t" + "por %[mask], %[C], %[csth_1] \n\t" "pmullh %[B], %[B], %[mask] \n\t" "pmulhuh %[B], %[ftmp8], %[B] \n\t" "pmullh %[B], %[B], %[mask] \n\t" @@ -131,14 +131,14 @@ static void idct_row_mmi(int16_t *input) LOAD_CONST(%[ftmp9], 25080) "pmulhh %[C], %[ftmp9], %[ftmp6] \n\t" "pcmpgth %[D], %[ftmp10], %[ftmp2] \n\t" - "or %[mask], %[D], %[csth_1] \n\t" + "por %[mask], %[D], %[csth_1] \n\t" "pmullh %[Ed], %[ftmp2], %[mask] \n\t" "pmulhuh %[Ed], %[ftmp8], %[Ed] \n\t" "pmullh %[Ed], %[Ed], %[mask] \n\t" "paddh %[C], %[C], %[Ed] \n\t" "paddh %[C], %[C], %[D] \n\t" "pcmpgth %[Ed], %[ftmp10], %[ftmp6] \n\t" - "or %[mask], %[Ed], %[csth_1] \n\t" + "por %[mask], %[Ed], %[csth_1] \n\t" "pmullh %[ftmp6], %[ftmp6], %[mask] \n\t" "pmulhuh %[D], %[ftmp8], %[ftmp6] \n\t" "pmullh %[D], %[D], %[mask] \n\t" @@ -193,7 +193,7 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) for (int i = 0; i < 8; ++i) temp_value[i] = av_clip_uint8(128 + ((46341 * input[i << 3] + (8 << 16)) >> 20)); __asm__ volatile ( - "xor %[ftmp10], %[ftmp10], %[ftmp10] \n\t" + "pxor %[ftmp10], %[ftmp10], %[ftmp10] \n\t" "li %[tmp0], 0x02 \n\t" "1: \n\t" "ldc1 %[ftmp0], 0x00(%[input]) \n\t" @@ -213,14 +213,14 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[Gd], 1) "pmulhh %[A], %[ftmp9], %[ftmp7] \n\t" "pcmpgth %[C], %[ftmp10], %[ftmp1] \n\t" - "or %[mask], %[C], %[Gd] \n\t" + "por %[mask], %[C], %[Gd] \n\t" "pmullh %[B], %[ftmp1], %[mask] \n\t" "pmulhuh %[B], %[ftmp8], %[B] \n\t" "pmullh %[B], %[B], %[mask] \n\t" "paddh %[A], %[A], %[B] \n\t" "paddh %[A], %[A], %[C] \n\t" "pcmpgth %[D], %[ftmp10], %[ftmp7] \n\t" - "or %[mask], %[D], %[Gd] \n\t" + "por %[mask], %[D], %[Gd] \n\t" "pmullh %[Ad], %[ftmp7], %[mask] \n\t" "pmulhuh %[B], %[ftmp8], %[Ad] \n\t" "pmullh %[B], %[B], %[mask] \n\t" @@ -231,12 +231,12 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[ftmp8], 54491) LOAD_CONST(%[ftmp9], 36410) "pcmpgth %[Ad], %[ftmp10], %[ftmp5] \n\t" - "or %[mask], %[Ad], %[Gd] \n\t" + "por %[mask], %[Ad], %[Gd] \n\t" "pmullh %[Cd], %[ftmp5], %[mask] \n\t" "pmulhuh %[C], %[ftmp9], %[Cd] \n\t" "pmullh %[C], %[C], %[mask] \n\t" "pcmpgth %[Bd], %[ftmp10], 
%[ftmp3] \n\t" - "or %[mask], %[Bd], %[Gd] \n\t" + "por %[mask], %[Bd], %[Gd] \n\t" "pmullh %[D], %[ftmp3], %[mask] \n\t" "pmulhuh %[D], %[ftmp8], %[D] \n\t" "pmullh %[D], %[D], %[mask] \n\t" @@ -244,12 +244,12 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) "paddh %[C], %[C], %[Ad] \n\t" "paddh %[C], %[C], %[Bd] \n\t" "pcmpgth %[Bd], %[ftmp10], %[ftmp3] \n\t" - "or %[mask], %[Bd], %[Gd] \n\t" + "por %[mask], %[Bd], %[Gd] \n\t" "pmullh %[Cd], %[ftmp3], %[mask] \n\t" "pmulhuh %[D], %[ftmp9], %[Cd] \n\t" "pmullh %[D], %[D], %[mask] \n\t" "pcmpgth %[Ed], %[ftmp10], %[ftmp5] \n\t" - "or %[mask], %[Ed], %[Gd] \n\t" + "por %[mask], %[Ed], %[Gd] \n\t" "pmullh %[Ad], %[ftmp5], %[mask] \n\t" "pmulhuh %[Ad], %[ftmp8], %[Ad] \n\t" "pmullh %[Ad], %[Ad], %[mask] \n\t" @@ -260,14 +260,14 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[ftmp8], 46341) "psubh %[Ad], %[A], %[C] \n\t" "pcmpgth %[Bd], %[ftmp10], %[Ad] \n\t" - "or %[mask], %[Bd], %[Gd] \n\t" + "por %[mask], %[Bd], %[Gd] \n\t" "pmullh %[Ad], %[Ad], %[mask] \n\t" "pmulhuh %[Ad], %[ftmp8], %[Ad] \n\t" "pmullh %[Ad], %[Ad], %[mask] \n\t" "paddh %[Ad], %[Ad], %[Bd] \n\t" "psubh %[Bd], %[B], %[D] \n\t" "pcmpgth %[Cd], %[ftmp10], %[Bd] \n\t" - "or %[mask], %[Cd], %[Gd] \n\t" + "por %[mask], %[Cd], %[Gd] \n\t" "pmullh %[Bd], %[Bd], %[mask] \n\t" "pmulhuh %[Bd], %[ftmp8], %[Bd] \n\t" "pmullh %[Bd], %[Bd], %[mask] \n\t" @@ -278,7 +278,7 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[Ed], 2056) "paddh %[A], %[ftmp0], %[ftmp4] \n\t" "pcmpgth %[B], %[ftmp10], %[A] \n\t" - "or %[mask], %[B], %[Gd] \n\t" + "por %[mask], %[B], %[Gd] \n\t" "pmullh %[A], %[A], %[mask] \n\t" "pmulhuh %[A], %[ftmp8], %[A] \n\t" "pmullh %[A], %[A], %[mask] \n\t" @@ -286,7 +286,7 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) "paddh %[A], %[A], %[Ed] \n\t" "psubh %[B], %[ftmp0], %[ftmp4] \n\t" "pcmpgth %[C], %[ftmp10], %[B] \n\t" - "or %[mask], %[C], %[Gd] \n\t" + "por %[mask], %[C], %[Gd] \n\t" "pmullh %[B], %[B], %[mask] \n\t" "pmulhuh %[B], %[ftmp8], %[B] \n\t" "pmullh %[B], %[B], %[mask] \n\t" @@ -297,14 +297,14 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[ftmp9], 25080) "pmulhh %[C], %[ftmp9], %[ftmp6] \n\t" "pcmpgth %[D], %[ftmp10], %[ftmp2] \n\t" - "or %[mask], %[D], %[Gd] \n\t" + "por %[mask], %[D], %[Gd] \n\t" "pmullh %[Ed], %[ftmp2], %[mask] \n\t" "pmulhuh %[Ed], %[ftmp8], %[Ed] \n\t" "pmullh %[Ed], %[Ed], %[mask] \n\t" "paddh %[C], %[C], %[Ed] \n\t" "paddh %[C], %[C], %[D] \n\t" "pcmpgth %[Ed], %[ftmp10], %[ftmp6] \n\t" - "or %[mask], %[Ed], %[Gd] \n\t" + "por %[mask], %[Ed], %[Gd] \n\t" "pmullh %[D], %[ftmp6], %[mask] \n\t" "pmulhuh %[D], %[ftmp8], %[D] \n\t" "pmullh %[D], %[D], %[mask] \n\t" @@ -317,12 +317,12 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) "psubh %[C], %[B], %[Ad] \n\t" "psubh %[B], %[Bd], %[D] \n\t" "paddh %[D], %[Bd], %[D] \n\t" - "or %[mask], %[ftmp1], %[ftmp2] \n\t" - "or %[mask], %[mask], %[ftmp3] \n\t" - "or %[mask], %[mask], %[ftmp4] \n\t" - "or %[mask], %[mask], %[ftmp5] \n\t" - "or %[mask], %[mask], %[ftmp6] \n\t" - "or %[mask], %[mask], %[ftmp7] \n\t" + "por %[mask], %[ftmp1], %[ftmp2] \n\t" + "por %[mask], %[mask], %[ftmp3] \n\t" + "por %[mask], %[mask], %[ftmp4] \n\t" + "por %[mask], %[mask], %[ftmp5] \n\t" + "por %[mask], %[mask], %[ftmp6] \n\t" + "por %[mask], %[mask], %[ftmp7] \n\t" "pcmpeqh %[mask], %[mask], 
%[ftmp10] \n\t" "packushb %[mask], %[mask], %[ftmp10] \n\t" "li %[tmp1], 0x04 \n\t" @@ -361,7 +361,7 @@ static void idct_column_true_mmi(uint8_t *dst, int stride, int16_t *input) "packushb %[ftmp7], %[ftmp7], %[ftmp10] \n\t" "lwc1 %[Ed], 0x00(%[temp_value]) \n\t" - "and %[Ed], %[Ed], %[mask] \n\t" + "pand %[Ed], %[Ed], %[mask] \n\t" "paddb %[ftmp0], %[ftmp0], %[Ed] \n\t" "paddb %[ftmp1], %[ftmp1], %[Ed] \n\t" "paddb %[ftmp2], %[ftmp2], %[Ed] \n\t" @@ -412,7 +412,7 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) for (int i = 0; i < 8; ++i) temp_value[i] = (46341 * input[i << 3] + (8 << 16)) >> 20; __asm__ volatile ( - "xor %[ftmp10], %[ftmp10], %[ftmp10] \n\t" + "pxor %[ftmp10], %[ftmp10], %[ftmp10] \n\t" "li %[tmp0], 0x02 \n\t" "1: \n\t" "ldc1 %[ftmp0], 0x00(%[input]) \n\t" @@ -432,14 +432,14 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[Gd], 1) "pmulhh %[A], %[ftmp9], %[ftmp7] \n\t" "pcmpgth %[C], %[ftmp10], %[ftmp1] \n\t" - "or %[mask], %[C], %[Gd] \n\t" + "por %[mask], %[C], %[Gd] \n\t" "pmullh %[B], %[ftmp1], %[mask] \n\t" "pmulhuh %[B], %[ftmp8], %[B] \n\t" "pmullh %[B], %[B], %[mask] \n\t" "paddh %[A], %[A], %[B] \n\t" "paddh %[A], %[A], %[C] \n\t" "pcmpgth %[D], %[ftmp10], %[ftmp7] \n\t" - "or %[mask], %[D], %[Gd] \n\t" + "por %[mask], %[D], %[Gd] \n\t" "pmullh %[Ad], %[ftmp7], %[mask] \n\t" "pmulhuh %[B], %[ftmp8], %[Ad] \n\t" "pmullh %[B], %[B], %[mask] \n\t" @@ -450,12 +450,12 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[ftmp8], 54491) LOAD_CONST(%[ftmp9], 36410) "pcmpgth %[Ad], %[ftmp10], %[ftmp5] \n\t" - "or %[mask], %[Ad], %[Gd] \n\t" + "por %[mask], %[Ad], %[Gd] \n\t" "pmullh %[Cd], %[ftmp5], %[mask] \n\t" "pmulhuh %[C], %[ftmp9], %[Cd] \n\t" "pmullh %[C], %[C], %[mask] \n\t" "pcmpgth %[Bd], %[ftmp10], %[ftmp3] \n\t" - "or %[mask], %[Bd], %[Gd] \n\t" + "por %[mask], %[Bd], %[Gd] \n\t" "pmullh %[D], %[ftmp3], %[mask] \n\t" "pmulhuh %[D], %[ftmp8], %[D] \n\t" "pmullh %[D], %[D], %[mask] \n\t" @@ -463,12 +463,12 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) "paddh %[C], %[C], %[Ad] \n\t" "paddh %[C], %[C], %[Bd] \n\t" "pcmpgth %[Bd], %[ftmp10], %[ftmp3] \n\t" - "or %[mask], %[Bd], %[Gd] \n\t" + "por %[mask], %[Bd], %[Gd] \n\t" "pmullh %[Cd], %[ftmp3], %[mask] \n\t" "pmulhuh %[D], %[ftmp9], %[Cd] \n\t" "pmullh %[D], %[D], %[mask] \n\t" "pcmpgth %[Ed], %[ftmp10], %[ftmp5] \n\t" - "or %[mask], %[Ed], %[Gd] \n\t" + "por %[mask], %[Ed], %[Gd] \n\t" "pmullh %[Ad], %[ftmp5], %[mask] \n\t" "pmulhuh %[Ad], %[ftmp8], %[Ad] \n\t" "pmullh %[Ad], %[Ad], %[mask] \n\t" @@ -479,14 +479,14 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[ftmp8], 46341) "psubh %[Ad], %[A], %[C] \n\t" "pcmpgth %[Bd], %[ftmp10], %[Ad] \n\t" - "or %[mask], %[Bd], %[Gd] \n\t" + "por %[mask], %[Bd], %[Gd] \n\t" "pmullh %[Ad], %[Ad], %[mask] \n\t" "pmulhuh %[Ad], %[ftmp8], %[Ad] \n\t" "pmullh %[Ad], %[Ad], %[mask] \n\t" "paddh %[Ad], %[Ad], %[Bd] \n\t" "psubh %[Bd], %[B], %[D] \n\t" "pcmpgth %[Cd], %[ftmp10], %[Bd] \n\t" - "or %[mask], %[Cd], %[Gd] \n\t" + "por %[mask], %[Cd], %[Gd] \n\t" "pmullh %[Bd], %[Bd], %[mask] \n\t" "pmulhuh %[Bd], %[ftmp8], %[Bd] \n\t" "pmullh %[Bd], %[Bd], %[mask] \n\t" @@ -497,7 +497,7 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[Ed], 8) "paddh %[A], %[ftmp0], %[ftmp4] \n\t" "pcmpgth %[B], %[ftmp10], %[A] \n\t" - "or %[mask], %[B], %[Gd] \n\t" + 
"por %[mask], %[B], %[Gd] \n\t" "pmullh %[A], %[A], %[mask] \n\t" "pmulhuh %[A], %[ftmp8], %[A] \n\t" "pmullh %[A], %[A], %[mask] \n\t" @@ -505,7 +505,7 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) "paddh %[A], %[A], %[Ed] \n\t" "psubh %[B], %[ftmp0], %[ftmp4] \n\t" "pcmpgth %[C], %[ftmp10], %[B] \n\t" - "or %[mask], %[C], %[Gd] \n\t" + "por %[mask], %[C], %[Gd] \n\t" "pmullh %[B], %[B], %[mask] \n\t" "pmulhuh %[B], %[ftmp8], %[B] \n\t" "pmullh %[B], %[B], %[mask] \n\t" @@ -516,14 +516,14 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) LOAD_CONST(%[ftmp9], 25080) "pmulhh %[C], %[ftmp9], %[ftmp6] \n\t" "pcmpgth %[D], %[ftmp10], %[ftmp2] \n\t" - "or %[mask], %[D], %[Gd] \n\t" + "por %[mask], %[D], %[Gd] \n\t" "pmullh %[Ed], %[ftmp2], %[mask] \n\t" "pmulhuh %[Ed], %[ftmp8], %[Ed] \n\t" "pmullh %[Ed], %[Ed], %[mask] \n\t" "paddh %[C], %[C], %[Ed] \n\t" "paddh %[C], %[C], %[D] \n\t" "pcmpgth %[Ed], %[ftmp10], %[ftmp6] \n\t" - "or %[mask], %[Ed], %[Gd] \n\t" + "por %[mask], %[Ed], %[Gd] \n\t" "pmullh %[D], %[ftmp6], %[mask] \n\t" "pmulhuh %[D], %[ftmp8], %[D] \n\t" "pmullh %[D], %[D], %[mask] \n\t" @@ -536,12 +536,12 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) "psubh %[C], %[B], %[Ad] \n\t" "psubh %[B], %[Bd], %[D] \n\t" "paddh %[D], %[Bd], %[D] \n\t" - "or %[mask], %[ftmp1], %[ftmp2] \n\t" - "or %[mask], %[mask], %[ftmp3] \n\t" - "or %[mask], %[mask], %[ftmp4] \n\t" - "or %[mask], %[mask], %[ftmp5] \n\t" - "or %[mask], %[mask], %[ftmp6] \n\t" - "or %[mask], %[mask], %[ftmp7] \n\t" + "por %[mask], %[ftmp1], %[ftmp2] \n\t" + "por %[mask], %[mask], %[ftmp3] \n\t" + "por %[mask], %[mask], %[ftmp4] \n\t" + "por %[mask], %[mask], %[ftmp5] \n\t" + "por %[mask], %[mask], %[ftmp6] \n\t" + "por %[mask], %[mask], %[ftmp7] \n\t" "pcmpeqh %[mask], %[mask], %[ftmp10] \n\t" "li %[tmp1], 0x04 \n\t" "dmtc1 %[tmp1], %[ftmp8] \n\t" @@ -587,16 +587,16 @@ static void idct_column_false_mmi(uint8_t *dst, int stride, int16_t *input) "punpcklbh %[Cd], %[Cd], %[ftmp10] \n\t" "punpcklbh %[Dd], %[Dd], %[ftmp10] \n\t" "ldc1 %[Ed], 0x00(%[temp_value]) \n\t" - "and %[Ed], %[Ed], %[mask] \n\t" - "nor %[mask], %[mask], %[mask] \n\t" - "and %[ftmp0], %[ftmp0], %[mask] \n\t" - "and %[ftmp1], %[ftmp1], %[mask] \n\t" - "and %[ftmp2], %[ftmp2], %[mask] \n\t" - "and %[ftmp3], %[ftmp3], %[mask] \n\t" - "and %[ftmp4], %[ftmp4], %[mask] \n\t" - "and %[ftmp5], %[ftmp5], %[mask] \n\t" - "and %[ftmp6], %[ftmp6], %[mask] \n\t" - "and %[ftmp7], %[ftmp7], %[mask] \n\t" + "pand %[Ed], %[Ed], %[mask] \n\t" + "pnor %[mask], %[mask], %[mask] \n\t" + "pand %[ftmp0], %[ftmp0], %[mask] \n\t" + "pand %[ftmp1], %[ftmp1], %[mask] \n\t" + "pand %[ftmp2], %[ftmp2], %[mask] \n\t" + "pand %[ftmp3], %[ftmp3], %[mask] \n\t" + "pand %[ftmp4], %[ftmp4], %[mask] \n\t" + "pand %[ftmp5], %[ftmp5], %[mask] \n\t" + "pand %[ftmp6], %[ftmp6], %[mask] \n\t" + "pand %[ftmp7], %[ftmp7], %[mask] \n\t" "paddh %[ftmp0], %[ftmp0], %[A] \n\t" "paddh %[ftmp1], %[ftmp1], %[B] \n\t" "paddh %[ftmp2], %[ftmp2], %[C] \n\t" @@ -689,7 +689,7 @@ void ff_vp3_idct_dc_add_mmi(uint8_t *dest, ptrdiff_t line_size, int16_t *block) double ftmp[7]; uint64_t tmp; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "mtc1 %[dc], %[ftmp5] \n\t" "pshufh %[ftmp5], %[ftmp5], %[ftmp0] \n\t" "li %[tmp0], 0x08 \n\t" @@ -734,10 +734,10 @@ void ff_put_no_rnd_pixels_l2_mmi(uint8_t *dst, const uint8_t *src1, "gsldrc1 %[ftmp1], 0x00(%[src1]) \n\t" "gsldlc1 
%[ftmp2], 0x07(%[src2]) \n\t" "gsldrc1 %[ftmp2], 0x00(%[src2]) \n\t" - "xor %[ftmp3], %[ftmp1], %[ftmp2] \n\t" - "and %[ftmp3], %[ftmp3], %[ftmp4] \n\t" + "pxor %[ftmp3], %[ftmp1], %[ftmp2] \n\t" + "pand %[ftmp3], %[ftmp3], %[ftmp4] \n\t" "psrlw %[ftmp3], %[ftmp3], %[ftmp5] \n\t" - "and %[ftmp6], %[ftmp1], %[ftmp2] \n\t" + "pand %[ftmp6], %[ftmp1], %[ftmp2] \n\t" "paddw %[ftmp3], %[ftmp3], %[ftmp6] \n\t" "sdc1 %[ftmp3], 0x00(%[dst]) \n\t" PTR_ADDU "%[src1], %[src1], %[stride] \n\t" diff --git a/libavcodec/mips/vp8dsp_mmi.c b/libavcodec/mips/vp8dsp_mmi.c index ae0b555..b352906 100644 --- a/libavcodec/mips/vp8dsp_mmi.c +++ b/libavcodec/mips/vp8dsp_mmi.c @@ -38,10 +38,10 @@ "pcmpeqb %[db_1], "#src1", "#src2" \n\t" \ "pmaxub %[db_2], "#src1", "#src2" \n\t" \ "pcmpeqb %[db_2], %[db_2], "#src1" \n\t" \ - "xor "#dst", %[db_2], %[db_1] \n\t" + "pxor "#dst", %[db_2], %[db_1] \n\t" #define MMI_BTOH(dst_l, dst_r, src) \ - "xor %[db_1], %[db_1], %[db_1] \n\t" \ + "pxor %[db_1], %[db_1], %[db_1] \n\t" \ "pcmpgtb %[db_2], %[db_1], "#src" \n\t" \ "punpcklbh "#dst_r", "#src", %[db_2] \n\t" \ "punpckhbh "#dst_l", "#src", %[db_2] \n\t" @@ -84,17 +84,17 @@ "punpcklwd %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ MMI_PCMPGTUB(%[mask], %[mask], %[ftmp3]) \ "pcmpeqw %[ftmp3], %[ftmp3], %[ftmp3] \n\t" \ - "xor %[mask], %[mask], %[ftmp3] \n\t" \ + "pxor %[mask], %[mask], %[ftmp3] \n\t" \ /* VP8_MBFILTER */ \ "li %[tmp0], 0x80808080 \n\t" \ "dmtc1 %[tmp0], %[ftmp7] \n\t" \ "punpcklwd %[ftmp7], %[ftmp7], %[ftmp7] \n\t" \ - "xor %[p2], %[p2], %[ftmp7] \n\t" \ - "xor %[p1], %[p1], %[ftmp7] \n\t" \ - "xor %[p0], %[p0], %[ftmp7] \n\t" \ - "xor %[q0], %[q0], %[ftmp7] \n\t" \ - "xor %[q1], %[q1], %[ftmp7] \n\t" \ - "xor %[q2], %[q2], %[ftmp7] \n\t" \ + "pxor %[p2], %[p2], %[ftmp7] \n\t" \ + "pxor %[p1], %[p1], %[ftmp7] \n\t" \ + "pxor %[p0], %[p0], %[ftmp7] \n\t" \ + "pxor %[q0], %[q0], %[ftmp7] \n\t" \ + "pxor %[q1], %[q1], %[ftmp7] \n\t" \ + "pxor %[q2], %[q2], %[ftmp7] \n\t" \ "psubsb %[ftmp4], %[p1], %[q1] \n\t" \ "psubb %[ftmp5], %[q0], %[p0] \n\t" \ MMI_BTOH(%[ftmp1], %[ftmp0], %[ftmp5]) \ @@ -109,8 +109,8 @@ "paddh %[ftmp1], %[ftmp3], %[ftmp1] \n\t" \ /* Combine left and right part */ \ "packsshb %[ftmp1], %[ftmp0], %[ftmp1] \n\t" \ - "and %[ftmp1], %[ftmp1], %[mask] \n\t" \ - "and %[ftmp2], %[ftmp1], %[hev] \n\t" \ + "pand %[ftmp1], %[ftmp1], %[mask] \n\t" \ + "pand %[ftmp2], %[ftmp1], %[hev] \n\t" \ "li %[tmp0], 0x04040404 \n\t" \ "dmtc1 %[tmp0], %[ftmp0] \n\t" \ "punpcklwd %[ftmp0], %[ftmp0], %[ftmp0] \n\t" \ @@ -129,8 +129,8 @@ "paddsb %[p0], %[p0], %[ftmp4] \n\t" \ /* filt_val &= ~hev */ \ "pcmpeqw %[ftmp0], %[ftmp0], %[ftmp0] \n\t" \ - "xor %[hev], %[hev], %[ftmp0] \n\t" \ - "and %[ftmp1], %[ftmp1], %[hev] \n\t" \ + "pxor %[hev], %[hev], %[ftmp0] \n\t" \ + "pand %[ftmp1], %[ftmp1], %[hev] \n\t" \ MMI_BTOH(%[ftmp5], %[ftmp6], %[ftmp1]) \ "li %[tmp0], 0x07 \n\t" \ "dmtc1 %[tmp0], %[ftmp2] \n\t" \ @@ -151,9 +151,9 @@ /* Combine left and right part */ \ "packsshb %[ftmp4], %[ftmp3], %[ftmp4] \n\t" \ "psubsb %[q0], %[q0], %[ftmp4] \n\t" \ - "xor %[q0], %[q0], %[ftmp7] \n\t" \ + "pxor %[q0], %[q0], %[ftmp7] \n\t" \ "paddsb %[p0], %[p0], %[ftmp4] \n\t" \ - "xor %[p0], %[p0], %[ftmp7] \n\t" \ + "pxor %[p0], %[p0], %[ftmp7] \n\t" \ "li %[tmp0], 0x00120012 \n\t" \ "dmtc1 %[tmp0], %[ftmp1] \n\t" \ "punpcklwd %[ftmp1], %[ftmp1], %[ftmp1] \n\t" \ @@ -168,9 +168,9 @@ /* Combine left and right part */ \ "packsshb %[ftmp4], %[ftmp3], %[ftmp4] \n\t" \ "psubsb %[q1], %[q1], %[ftmp4] \n\t" \ - "xor %[q1], %[q1], %[ftmp7] \n\t" \ + 
"pxor %[q1], %[q1], %[ftmp7] \n\t" \ "paddsb %[p1], %[p1], %[ftmp4] \n\t" \ - "xor %[p1], %[p1], %[ftmp7] \n\t" \ + "pxor %[p1], %[p1], %[ftmp7] \n\t" \ "li %[tmp0], 0x03 \n\t" \ "dmtc1 %[tmp0], %[ftmp1] \n\t" \ /* Right part */ \ @@ -186,9 +186,9 @@ /* Combine left and right part */ \ "packsshb %[ftmp4], %[ftmp3], %[ftmp4] \n\t" \ "psubsb %[q2], %[q2], %[ftmp4] \n\t" \ - "xor %[q2], %[q2], %[ftmp7] \n\t" \ + "pxor %[q2], %[q2], %[ftmp7] \n\t" \ "paddsb %[p2], %[p2], %[ftmp4] \n\t" \ - "xor %[p2], %[p2], %[ftmp7] \n\t" + "pxor %[p2], %[p2], %[ftmp7] \n\t" #define PUT_VP8_EPEL4_H6_MMI(src, dst) \ MMI_ULWC1(%[ftmp1], src, 0x00) \ @@ -1021,7 +1021,7 @@ void ff_vp8_luma_dc_wht_mmi(int16_t block[4][4][16], int16_t dc[16]) block[3][3][0] = (dc[12] - dc[15] + 3 - dc[13] + dc[14]) >> 3; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" MMI_SDC1(%[ftmp0], %[dc], 0x00) MMI_SDC1(%[ftmp0], %[dc], 0x08) MMI_SDC1(%[ftmp0], %[dc], 0x10) @@ -1136,7 +1136,7 @@ void ff_vp8_idct_add_mmi(uint8_t *dst, int16_t block[16], ptrdiff_t stride) DECLARE_VAR_ALL64; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" MMI_LDC1(%[ftmp1], %[block], 0x00) MMI_LDC1(%[ftmp2], %[block], 0x08) MMI_LDC1(%[ftmp3], %[block], 0x10) @@ -1302,7 +1302,7 @@ void ff_vp8_idct_dc_add_mmi(uint8_t *dst, int16_t block[16], ptrdiff_t stride) block[0] = 0; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "mtc1 %[dc], %[ftmp5] \n\t" MMI_LWC1(%[ftmp1], %[dst0], 0x00) MMI_LWC1(%[ftmp2], %[dst1], 0x00) @@ -1618,7 +1618,7 @@ void ff_put_vp8_epel16_h4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[15] = cm[(filter[2] * src[15] - filter[1] * src[14] + filter[3] * src[16] - filter[4] * src[17] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -1685,7 +1685,7 @@ void ff_put_vp8_epel8_h4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[7] = cm[(filter[2] * src[7] - filter[1] * src[ 6] + filter[3] * src[8] - filter[4] * src[9] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -1742,7 +1742,7 @@ void ff_put_vp8_epel4_h4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[3] = cm[(filter[2] * src[3] - filter[1] * src[ 2] + filter[3] * src[4] - filter[4] * src[5] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -1811,7 +1811,7 @@ void ff_put_vp8_epel16_h6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[15] = cm[(filter[2]*src[15] - filter[1]*src[14] + filter[0]*src[13] + filter[3]*src[16] - filter[4]*src[17] + filter[5]*src[18] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -1879,7 +1879,7 @@ void ff_put_vp8_epel8_h6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[7] = cm[(filter[2]*src[7] - filter[1]*src[ 6] + filter[0]*src[ 5] + filter[3]*src[8] - filter[4]*src[9] + filter[5]*src[10] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" 
@@ -1937,7 +1937,7 @@ void ff_put_vp8_epel4_h6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[3] = cm[(filter[2]*src[3] - filter[1]*src[ 2] + filter[0]*src[ 1] + filter[3]*src[4] - filter[4]*src[5] + filter[5]*src[ 6] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -2007,7 +2007,7 @@ void ff_put_vp8_epel16_v4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[15] = cm[(filter[2] * src[15] - filter[1] * src[15-srcstride] + filter[3] * src[15+srcstride] - filter[4] * src[15+2*srcstride] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -2076,7 +2076,7 @@ void ff_put_vp8_epel8_v4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[7] = cm[(filter[2] * src[7] - filter[1] * src[7-srcstride] + filter[3] * src[7+srcstride] - filter[4] * src[7+2*srcstride] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -2135,7 +2135,7 @@ void ff_put_vp8_epel4_v4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[3] = cm[(filter[2] * src[3] - filter[1] * src[3-srcstride] + filter[3] * src[3+srcstride] - filter[4] * src[3+2*srcstride] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -2205,7 +2205,7 @@ void ff_put_vp8_epel16_v6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[15] = cm[(filter[2]*src[15] - filter[1]*src[15-srcstride] + filter[0]*src[15-2*srcstride] + filter[3]*src[15+srcstride] - filter[4]*src[15+2*srcstride] + filter[5]*src[15+3*srcstride] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -2275,7 +2275,7 @@ void ff_put_vp8_epel8_v6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[7] = cm[(filter[2]*src[7] - filter[1]*src[7-srcstride] + filter[0]*src[7-2*srcstride] + filter[3]*src[7+srcstride] - filter[4]*src[7+2*srcstride] + filter[5]*src[7+3*srcstride] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -2335,7 +2335,7 @@ void ff_put_vp8_epel4_v6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, dst[3] = cm[(filter[2]*src[3] - filter[1]*src[3-srcstride] + filter[0]*src[3-2*srcstride] + filter[3]*src[3+srcstride] - filter[4]*src[3+2*srcstride] + filter[5]*src[3+3*srcstride] + 64) >> 7]; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x07 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" @@ -2873,7 +2873,7 @@ void ff_put_vp8_bilinear16_h_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, dst[15] = (a * src[15] + b * src[16] + 4) >> 3; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x03 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" "pshufh %[a], %[a], %[ftmp0] \n\t" @@ -2940,7 +2940,7 @@ void ff_put_vp8_bilinear16_v_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, dst[7] = (c * src[7] + d * src[7 + sstride] + 4) >> 3; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + 
"pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x03 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" "pshufh %[c], %[c], %[ftmp0] \n\t" @@ -3041,7 +3041,7 @@ void ff_put_vp8_bilinear8_h_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, dst[7] = (a * src[7] + b * src[8] + 4) >> 3; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x03 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" "pshufh %[a], %[a], %[ftmp0] \n\t" @@ -3102,7 +3102,7 @@ void ff_put_vp8_bilinear8_v_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, dst[7] = (c * src[7] + d * src[7 + sstride] + 4) >> 3; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x03 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" "pshufh %[c], %[c], %[ftmp0] \n\t" @@ -3194,7 +3194,7 @@ void ff_put_vp8_bilinear4_h_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, dst[3] = (a * src[3] + b * src[4] + 4) >> 3; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x03 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" "pshufh %[a], %[a], %[ftmp0] \n\t" @@ -3252,7 +3252,7 @@ void ff_put_vp8_bilinear4_v_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, dst[3] = (c * src[3] + d * src[3 + sstride] + 4) >> 3; */ __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x03 \n\t" "mtc1 %[tmp0], %[ftmp4] \n\t" "pshufh %[c], %[c], %[ftmp0] \n\t" diff --git a/libavcodec/mips/vp9_mc_mmi.c b/libavcodec/mips/vp9_mc_mmi.c index e7a8387..fa65ff5 100644 --- a/libavcodec/mips/vp9_mc_mmi.c +++ b/libavcodec/mips/vp9_mc_mmi.c @@ -82,7 +82,7 @@ static void convolve_horiz_mmi(const uint8_t *src, int32_t src_stride, dst_stride -= w; __asm__ volatile ( "move %[tmp1], %[width] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "gsldlc1 %[filter1], 0x03(%[filter]) \n\t" "gsldrc1 %[filter1], 0x00(%[filter]) \n\t" "gsldlc1 %[filter2], 0x0b(%[filter]) \n\t" @@ -157,7 +157,7 @@ static void convolve_vert_mmi(const uint8_t *src, int32_t src_stride, dst_stride -= w; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "gsldlc1 %[ftmp4], 0x03(%[filter]) \n\t" "gsldrc1 %[ftmp4], 0x00(%[filter]) \n\t" "gsldlc1 %[ftmp5], 0x0b(%[filter]) \n\t" @@ -253,7 +253,7 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride, __asm__ volatile ( "move %[tmp1], %[width] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "gsldlc1 %[filter1], 0x03(%[filter]) \n\t" "gsldrc1 %[filter1], 0x00(%[filter]) \n\t" "gsldlc1 %[filter2], 0x0b(%[filter]) \n\t" @@ -339,7 +339,7 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride, dst_stride -= w; __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "gsldlc1 %[ftmp4], 0x03(%[filter]) \n\t" "gsldrc1 %[ftmp4], 0x00(%[filter]) \n\t" "gsldlc1 %[ftmp5], 0x0b(%[filter]) \n\t" @@ -444,7 +444,7 @@ static void convolve_avg_mmi(const uint8_t *src, int32_t src_stride, __asm__ volatile ( "move %[tmp1], %[width] \n\t" - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" "li %[tmp0], 0x10001 \n\t" "dmtc1 %[tmp0], %[ftmp3] \n\t" "punpcklhw %[ftmp3], %[ftmp3], %[ftmp3] \n\t" diff --git a/libavcodec/mips/wmv2dsp_mmi.c b/libavcodec/mips/wmv2dsp_mmi.c index 82e16f9..1a6781a 100644 --- 
a/libavcodec/mips/wmv2dsp_mmi.c +++ b/libavcodec/mips/wmv2dsp_mmi.c @@ -106,7 +106,7 @@ void ff_wmv2_idct_add_mmi(uint8_t *dest, ptrdiff_t line_size, int16_t *block) wmv2_idct_col_mmi(block + i); __asm__ volatile ( - "xor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" + "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" // low 4 loop MMI_LDC1(%[ftmp1], %[block], 0x00)
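Every hunk above makes the same substitution on 'f'-constrained operands: xor, or, and, nor become pxor, por, pand, pnor. A minimal stand-alone sketch of the resulting idiom (the function name is illustrative, and it assumes a Loongson MMI build target):

    /* Zero an MMI vector held in a double-typed variable.  clang rejects
     * the plain integer mnemonic "xor" on an 'f'-constrained operand, so
     * the multimedia form "pxor" is used, which both compilers assemble. */
    static inline double mmi_zero(void)
    {
        double ftmp0;
        __asm__ volatile (
            "pxor   %[ftmp0],   %[ftmp0],   %[ftmp0]    \n\t"
            : [ftmp0]"=&f"(ftmp0)
        );
        return ftmp0;
    }
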
From patchwork Fri May 28 02:04:40 2021
X-Patchwork-Submitter: Jin Bo
X-Patchwork-Id: 27961
From: Jin Bo
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 28 May 2021 10:04:40 +0800
Message-Id: <1622167481-10973-2-git-send-email-jinbo@loongson.cn>
In-Reply-To: <1622167481-10973-1-git-send-email-jinbo@loongson.cn>
References: <1622167481-10973-1-git-send-email-jinbo@loongson.cn>
Subject: [FFmpeg-devel] [PATCH 2/3] libavcodec/mips: Fix build errors reported by clang
Cc: Jin Bo

Clang is stricter about the types of asm operands: a float or double variable should use the 'f' constraint, an integer variable should use the 'r' constraint.
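Concretely, constants such as ff_pw_32 used to be uint64_t values passed through an 'f' constraint, which clang rejects; the patch below re-types them as union av_intfloat64 so the same bits can be handed to the asm either as a double (.f with constraint 'f') or as an integer (.i with constraint 'r'). A minimal sketch of the idiom, with illustrative names and assuming a Loongson MMI target:

    #include "libavutil/intfloat.h"

    /* 64-bit pattern: four int16 lanes, each holding 4. */
    static const union av_intfloat64 pw_4 = { 0x0004000400040004ULL };

    /* Add 4 to each 16-bit lane of an MMI vector kept in a double. */
    static inline void add_pw_4(double *vec)
    {
        __asm__ volatile (
            "paddh  %[v],   %[v],   %[pw4]      \n\t"
            : [v]"+f"(*vec)        /* float/double operand: 'f' constraint */
            : [pw4]"f"(pw_4.f)     /* same bits, typed double so clang accepts 'f' */
        );
    }

The union mmi_intfloat64 added to libavutil/mips/asmdefs.h in this patch serves the same purpose for locally computed values, such as the chroma weights A, B, C, D used below.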
Signed-off-by: Jin Bo --- libavcodec/mips/constants.c | 89 +++++++------ libavcodec/mips/constants.h | 88 +++++++------ libavcodec/mips/h264chroma_mmi.c | 157 +++++++++++------------ libavcodec/mips/h264dsp_mmi.c | 20 +-- libavcodec/mips/h264pred_mmi.c | 23 ++-- libavcodec/mips/h264qpel_mmi.c | 34 ++--- libavcodec/mips/hevcdsp_mmi.c | 59 +++++---- libavcodec/mips/idctdsp_mmi.c | 2 +- libavcodec/mips/mpegvideo_mmi.c | 20 +-- libavcodec/mips/vc1dsp_mmi.c | 176 +++++++++++++------------- libavcodec/mips/vp8dsp_mmi.c | 263 +++++++++++++++++++++++++++++---------- libavutil/mips/asmdefs.h | 8 ++ 12 files changed, 536 insertions(+), 403 deletions(-) diff --git a/libavcodec/mips/constants.c b/libavcodec/mips/constants.c index 8c990b6..6a8f1a5 100644 --- a/libavcodec/mips/constants.c +++ b/libavcodec/mips/constants.c @@ -19,50 +19,49 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ -#include "config.h" -#include "libavutil/mem_internal.h" +#include "libavutil/intfloat.h" #include "constants.h" -DECLARE_ALIGNED(8, const uint64_t, ff_pw_1) = {0x0001000100010001ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_2) = {0x0002000200020002ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_3) = {0x0003000300030003ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_4) = {0x0004000400040004ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_5) = {0x0005000500050005ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_6) = {0x0006000600060006ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_8) = {0x0008000800080008ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_9) = {0x0009000900090009ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_10) = {0x000A000A000A000AULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_12) = {0x000C000C000C000CULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_15) = {0x000F000F000F000FULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_16) = {0x0010001000100010ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_17) = {0x0011001100110011ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_18) = {0x0012001200120012ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_20) = {0x0014001400140014ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_22) = {0x0016001600160016ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_28) = {0x001C001C001C001CULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_32) = {0x0020002000200020ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_53) = {0x0035003500350035ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_64) = {0x0040004000400040ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_128) = {0x0080008000800080ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_512) = {0x0200020002000200ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_m8tom5) = {0xFFFBFFFAFFF9FFF8ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_m4tom1) = {0xFFFFFFFEFFFDFFFCULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_1to4) = {0x0004000300020001ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_5to8) = {0x0008000700060005ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_0to3) = {0x0003000200010000ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_4to7) = {0x0007000600050004ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_8tob) = {0x000b000a00090008ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pw_ctof) = {0x000f000e000d000cULL}; - -DECLARE_ALIGNED(8, const uint64_t, ff_pb_1) = {0x0101010101010101ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pb_3) = {0x0303030303030303ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pb_80) = {0x8080808080808080ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_pb_A1) = {0xA1A1A1A1A1A1A1A1ULL}; -DECLARE_ALIGNED(8, 
const uint64_t, ff_pb_FE) = {0xFEFEFEFEFEFEFEFEULL}; - -DECLARE_ALIGNED(8, const uint64_t, ff_rnd) = {0x0004000400040004ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_rnd2) = {0x0040004000400040ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_rnd3) = {0x0020002000200020ULL}; - -DECLARE_ALIGNED(8, const uint64_t, ff_wm1010) = {0xFFFF0000FFFF0000ULL}; -DECLARE_ALIGNED(8, const uint64_t, ff_d40000) = {0x0000000000040000ULL}; +union av_intfloat64 ff_pw_1 = {0x0001000100010001ULL}; +union av_intfloat64 ff_pw_2 = {0x0002000200020002ULL}; +union av_intfloat64 ff_pw_3 = {0x0003000300030003ULL}; +union av_intfloat64 ff_pw_4 = {0x0004000400040004ULL}; +union av_intfloat64 ff_pw_5 = {0x0005000500050005ULL}; +union av_intfloat64 ff_pw_6 = {0x0006000600060006ULL}; +union av_intfloat64 ff_pw_8 = {0x0008000800080008ULL}; +union av_intfloat64 ff_pw_9 = {0x0009000900090009ULL}; +union av_intfloat64 ff_pw_10 = {0x000A000A000A000AULL}; +union av_intfloat64 ff_pw_12 = {0x000C000C000C000CULL}; +union av_intfloat64 ff_pw_15 = {0x000F000F000F000FULL}; +union av_intfloat64 ff_pw_16 = {0x0010001000100010ULL}; +union av_intfloat64 ff_pw_17 = {0x0011001100110011ULL}; +union av_intfloat64 ff_pw_18 = {0x0012001200120012ULL}; +union av_intfloat64 ff_pw_20 = {0x0014001400140014ULL}; +union av_intfloat64 ff_pw_22 = {0x0016001600160016ULL}; +union av_intfloat64 ff_pw_28 = {0x001C001C001C001CULL}; +union av_intfloat64 ff_pw_32 = {0x0020002000200020ULL}; +union av_intfloat64 ff_pw_53 = {0x0035003500350035ULL}; +union av_intfloat64 ff_pw_64 = {0x0040004000400040ULL}; +union av_intfloat64 ff_pw_128 = {0x0080008000800080ULL}; +union av_intfloat64 ff_pw_512 = {0x0200020002000200ULL}; +union av_intfloat64 ff_pw_m8tom5 = {0xFFFBFFFAFFF9FFF8ULL}; +union av_intfloat64 ff_pw_m4tom1 = {0xFFFFFFFEFFFDFFFCULL}; +union av_intfloat64 ff_pw_1to4 = {0x0004000300020001ULL}; +union av_intfloat64 ff_pw_5to8 = {0x0008000700060005ULL}; +union av_intfloat64 ff_pw_0to3 = {0x0003000200010000ULL}; +union av_intfloat64 ff_pw_4to7 = {0x0007000600050004ULL}; +union av_intfloat64 ff_pw_8tob = {0x000b000a00090008ULL}; +union av_intfloat64 ff_pw_ctof = {0x000f000e000d000cULL}; +union av_intfloat64 ff_pw_32_1 = {0x0000000100000001ULL}; +union av_intfloat64 ff_pw_32_4 = {0x0000000400000004ULL}; +union av_intfloat64 ff_pw_32_64 = {0x0000004000000040ULL}; +union av_intfloat64 ff_pb_1 = {0x0101010101010101ULL}; +union av_intfloat64 ff_pb_3 = {0x0303030303030303ULL}; +union av_intfloat64 ff_pb_80 = {0x8080808080808080ULL}; +union av_intfloat64 ff_pb_A1 = {0xA1A1A1A1A1A1A1A1ULL}; +union av_intfloat64 ff_pb_FE = {0xFEFEFEFEFEFEFEFEULL}; +union av_intfloat64 ff_rnd = {0x0004000400040004ULL}; +union av_intfloat64 ff_rnd2 = {0x0040004000400040ULL}; +union av_intfloat64 ff_rnd3 = {0x0020002000200020ULL}; +union av_intfloat64 ff_wm1010 = {0xFFFF0000FFFF0000ULL}; +union av_intfloat64 ff_d40000 = {0x0000000000040000ULL}; diff --git a/libavcodec/mips/constants.h b/libavcodec/mips/constants.h index 2604559..bd86cd1 100644 --- a/libavcodec/mips/constants.h +++ b/libavcodec/mips/constants.h @@ -22,50 +22,48 @@ #ifndef AVCODEC_MIPS_CONSTANTS_H #define AVCODEC_MIPS_CONSTANTS_H -#include <stdint.h> - -extern const uint64_t ff_pw_1; -extern const uint64_t ff_pw_2; -extern const uint64_t ff_pw_3; -extern const uint64_t ff_pw_4; -extern const uint64_t ff_pw_5; -extern const uint64_t ff_pw_6; -extern const uint64_t ff_pw_8; -extern const uint64_t ff_pw_9; -extern const uint64_t ff_pw_10; -extern const uint64_t ff_pw_12; -extern const uint64_t ff_pw_15; -extern const uint64_t ff_pw_16; -extern
const uint64_t ff_pw_17; -extern const uint64_t ff_pw_18; -extern const uint64_t ff_pw_20; -extern const uint64_t ff_pw_22; -extern const uint64_t ff_pw_28; -extern const uint64_t ff_pw_32; -extern const uint64_t ff_pw_53; -extern const uint64_t ff_pw_64; -extern const uint64_t ff_pw_128; -extern const uint64_t ff_pw_512; -extern const uint64_t ff_pw_m8tom5; -extern const uint64_t ff_pw_m4tom1; -extern const uint64_t ff_pw_1to4; -extern const uint64_t ff_pw_5to8; -extern const uint64_t ff_pw_0to3; -extern const uint64_t ff_pw_4to7; -extern const uint64_t ff_pw_8tob; -extern const uint64_t ff_pw_ctof; - -extern const uint64_t ff_pb_1; -extern const uint64_t ff_pb_3; -extern const uint64_t ff_pb_80; -extern const uint64_t ff_pb_A1; -extern const uint64_t ff_pb_FE; - -extern const uint64_t ff_rnd; -extern const uint64_t ff_rnd2; -extern const uint64_t ff_rnd3; - -extern const uint64_t ff_wm1010; -extern const uint64_t ff_d40000; +extern union av_intfloat64 ff_pw_1; +extern union av_intfloat64 ff_pw_2; +extern union av_intfloat64 ff_pw_3; +extern union av_intfloat64 ff_pw_4; +extern union av_intfloat64 ff_pw_5; +extern union av_intfloat64 ff_pw_6; +extern union av_intfloat64 ff_pw_8; +extern union av_intfloat64 ff_pw_9; +extern union av_intfloat64 ff_pw_10; +extern union av_intfloat64 ff_pw_12; +extern union av_intfloat64 ff_pw_15; +extern union av_intfloat64 ff_pw_16; +extern union av_intfloat64 ff_pw_17; +extern union av_intfloat64 ff_pw_18; +extern union av_intfloat64 ff_pw_20; +extern union av_intfloat64 ff_pw_22; +extern union av_intfloat64 ff_pw_28; +extern union av_intfloat64 ff_pw_32; +extern union av_intfloat64 ff_pw_53; +extern union av_intfloat64 ff_pw_64; +extern union av_intfloat64 ff_pw_128; +extern union av_intfloat64 ff_pw_512; +extern union av_intfloat64 ff_pw_m8tom5; +extern union av_intfloat64 ff_pw_m4tom1; +extern union av_intfloat64 ff_pw_1to4; +extern union av_intfloat64 ff_pw_5to8; +extern union av_intfloat64 ff_pw_0to3; +extern union av_intfloat64 ff_pw_4to7; +extern union av_intfloat64 ff_pw_8tob; +extern union av_intfloat64 ff_pw_ctof; +extern union av_intfloat64 ff_pw_32_1; +extern union av_intfloat64 ff_pw_32_4; +extern union av_intfloat64 ff_pw_32_64; +extern union av_intfloat64 ff_pb_1; +extern union av_intfloat64 ff_pb_3; +extern union av_intfloat64 ff_pb_80; +extern union av_intfloat64 ff_pb_A1; +extern union av_intfloat64 ff_pb_FE; +extern union av_intfloat64 ff_rnd; +extern union av_intfloat64 ff_rnd2; +extern union av_intfloat64 ff_rnd3; +extern union av_intfloat64 ff_wm1010; +extern union av_intfloat64 ff_d40000; #endif /* AVCODEC_MIPS_CONSTANTS_H */ diff --git a/libavcodec/mips/h264chroma_mmi.c b/libavcodec/mips/h264chroma_mmi.c index dbcba10..cc2d7cb 100644 --- a/libavcodec/mips/h264chroma_mmi.c +++ b/libavcodec/mips/h264chroma_mmi.c @@ -29,12 +29,12 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int h, int x, int y) { - int A = 64, B, C, D, E; double ftmp[12]; - uint64_t tmp[1]; + union mmi_intfloat64 A, B, C, D, E; + A.i = 64; if (!(x || y)) { - /* x=0, y=0, A=64 */ + /* x=0, y=0, A.i=64 */ __asm__ volatile ( "1: \n\t" MMI_ULDC1(%[ftmp0], %[src], 0x00) @@ -66,14 +66,13 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, ); } else if (x && y) { /* x!=0, y!=0 */ - D = x * y; - B = (x << 3) - D; - C = (y << 3) - D; - A = 64 - D - B - C; + D.i = x * y; + B.i = (x << 3) - D.i; + C.i = (y << 3) - D.i; + A.i = 64 - D.i - B.i - C.i; __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli 
%[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp9] \n\t" @@ -158,22 +157,21 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), [ftmp8]"=&f"(ftmp[8]), [ftmp9]"=&f"(ftmp[9]), [ftmp10]"=&f"(ftmp[10]), [ftmp11]"=&f"(ftmp[11]), - [tmp0]"=&r"(tmp[0]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) - : [stride]"r"((mips_reg)stride),[ff_pw_32]"f"(ff_pw_32), - [A]"f"(A), [B]"f"(B), - [C]"f"(C), [D]"f"(D) + : [stride]"r"((mips_reg)stride),[ff_pw_32]"f"(ff_pw_32.f), + [A]"f"(A.f), [B]"f"(B.f), + [C]"f"(C.f), [D]"f"(D.f), + [tmp0]"r"(0x06) : "memory" ); } else if (x) { /* x!=0, y==0 */ - E = x << 3; - A = 64 - E; + E.i = x << 3; + A.i = 64 - E.i; __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp7] \n\t" @@ -207,22 +205,20 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), - [tmp0]"=&r"(tmp[0]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) : [stride]"r"((mips_reg)stride), - [ff_pw_32]"f"(ff_pw_32), - [A]"f"(A), [E]"f"(E) + [ff_pw_32]"f"(ff_pw_32.f), [tmp0]"r"(0x06), + [A]"f"(A.f), [E]"f"(E.f) : "memory" ); } else { /* x==0, y!=0 */ - E = y << 3; - A = 64 - E; + E.i = y << 3; + A.i = 64 - E.i; __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp7] \n\t" @@ -276,12 +272,12 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), - [ftmp8]"=&f"(ftmp[8]), [tmp0]"=&r"(tmp[0]), + [ftmp8]"=&f"(ftmp[8]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) : [stride]"r"((mips_reg)stride), - [ff_pw_32]"f"(ff_pw_32), - [A]"f"(A), [E]"f"(E) + [ff_pw_32]"f"(ff_pw_32.f), [A]"f"(A.f), + [E]"f"(E.f), [tmp0]"r"(0x06) : "memory" ); } @@ -290,12 +286,12 @@ void ff_put_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int h, int x, int y) { - int A = 64, B, C, D, E; double ftmp[10]; - uint64_t tmp[1]; + union mmi_intfloat64 A, B, C, D, E; + A.i = 64; if(!(x || y)){ - /* x=0, y=0, A=64 */ + /* x=0, y=0, A.i=64 */ __asm__ volatile ( "1: \n\t" MMI_ULDC1(%[ftmp0], %[src], 0x00) @@ -323,13 +319,12 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, ); } else if (x && y) { /* x!=0, y!=0 */ - D = x * y; - B = (x << 3) - D; - C = (y << 3) - D; - A = 64 - D - B - C; + D.i = x * y; + B.i = (x << 3) - D.i; + C.i = (y << 3) - D.i; + A.i = 64 - D.i - B.i - C.i; __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp9] \n\t" @@ -383,21 +378,20 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), [ftmp8]"=&f"(ftmp[8]), [ftmp9]"=&f"(ftmp[9]), - [tmp0]"=&r"(tmp[0]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) - : [stride]"r"((mips_reg)stride),[ff_pw_32]"f"(ff_pw_32), - 
[A]"f"(A), [B]"f"(B), - [C]"f"(C), [D]"f"(D) + : [stride]"r"((mips_reg)stride),[ff_pw_32]"f"(ff_pw_32.f), + [A]"f"(A.f), [B]"f"(B.f), + [C]"f"(C.f), [D]"f"(D.f), + [tmp0]"r"(0x06) : "memory" ); } else if (x) { /* x!=0, y==0 */ - E = x << 3; - A = 64 - E; + E.i = x << 3; + A.i = 64 - E.i; __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp7] \n\t" @@ -433,21 +427,19 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), - [tmp0]"=&r"(tmp[0]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) : [stride]"r"((mips_reg)stride), - [ff_pw_32]"f"(ff_pw_32), - [A]"f"(A), [E]"f"(E) + [ff_pw_32]"f"(ff_pw_32.f), [tmp0]"r"(0x06), + [A]"f"(A.f), [E]"f"(E.f) : "memory" ); } else { /* x==0, y!=0 */ - E = y << 3; - A = 64 - E; + E.i = y << 3; + A.i = 64 - E.i; __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp7] \n\t" @@ -469,8 +461,8 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, "pmullh %[ftmp6], %[ftmp6], %[E] \n\t" "paddh %[ftmp2], %[ftmp4], %[ftmp6] \n\t" - "paddh %[ftmp1], %[ftmp1], %[ff_pw_32] \n\t" - "paddh %[ftmp2], %[ftmp2], %[ff_pw_32] \n\t" + "paddh %[ftmp1], %[ftmp1], %[ff_pw_32] \n\t" + "paddh %[ftmp2], %[ftmp2], %[ff_pw_32] \n\t" "psrlh %[ftmp1], %[ftmp1], %[ftmp7] \n\t" "psrlh %[ftmp2], %[ftmp2], %[ftmp7] \n\t" "packushb %[ftmp1], %[ftmp1], %[ftmp2] \n\t" @@ -483,12 +475,11 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), - [tmp0]"=&r"(tmp[0]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) : [stride]"r"((mips_reg)stride), - [ff_pw_32]"f"(ff_pw_32), - [A]"f"(A), [E]"f"(E) + [ff_pw_32]"f"(ff_pw_32.f), [tmp0]"r"(0x06), + [A]"f"(A.f), [E]"f"(E.f) : "memory" ); } @@ -497,20 +488,19 @@ void ff_avg_h264_chroma_mc8_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, void ff_put_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int h, int x, int y) { - const int A = (8 - x) * (8 - y); - const int B = x * (8 - y); - const int C = (8 - x) * y; - const int D = x * y; - const int E = B + C; double ftmp[8]; - uint64_t tmp[1]; mips_reg addr[1]; + union mmi_intfloat64 A, B, C, D, E; DECLARE_VAR_LOW32; + A.i = (8 - x) * (8 - y); + B.i = x * (8 - y); + C.i = (8 - x) * y; + D.i = x * y; + E.i = B.i + C.i; - if (D) { + if (D.i) { __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp7] \n\t" @@ -547,20 +537,19 @@ void ff_put_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), - [tmp0]"=&r"(tmp[0]), RESTRICT_ASM_LOW32 [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) - : [stride]"r"((mips_reg)stride),[ff_pw_32]"f"(ff_pw_32), - [A]"f"(A), [B]"f"(B), - [C]"f"(C), [D]"f"(D) + : [stride]"r"((mips_reg)stride),[ff_pw_32]"f"(ff_pw_32.f), + [A]"f"(A.f), [B]"f"(B.f), + [C]"f"(C.f), [D]"f"(D.f), + [tmp0]"r"(0x06) : "memory" ); - } 
else if (E) { - const int step = C ? stride : 1; + } else if (E.i) { + const int step = C.i ? stride : 1; __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp5] \n\t" @@ -585,14 +574,13 @@ void ff_put_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), - [tmp0]"=&r"(tmp[0]), RESTRICT_ASM_LOW32 [addr0]"=&r"(addr[0]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) : [stride]"r"((mips_reg)stride),[step]"r"((mips_reg)step), - [ff_pw_32]"f"(ff_pw_32), - [A]"f"(A), [E]"f"(E) + [ff_pw_32]"f"(ff_pw_32.f), [tmp0]"r"(0x06), + [A]"f"(A.f), [E]"f"(E.f) : "memory" ); } else { @@ -621,20 +609,19 @@ void ff_put_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, void ff_avg_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int h, int x, int y) { - const int A = (8 - x) *(8 - y); - const int B = x * (8 - y); - const int C = (8 - x) * y; - const int D = x * y; - const int E = B + C; double ftmp[8]; - uint64_t tmp[1]; mips_reg addr[1]; + union mmi_intfloat64 A, B, C, D, E; DECLARE_VAR_LOW32; + A.i = (8 - x) *(8 - y); + B.i = x * (8 - y); + C.i = (8 - x) * y; + D.i = x * y; + E.i = B.i + C.i; - if (D) { + if (D.i) { __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[B], %[B], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp7] \n\t" @@ -673,20 +660,19 @@ void ff_avg_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), [ftmp6]"=&f"(ftmp[6]), [ftmp7]"=&f"(ftmp[7]), - [tmp0]"=&r"(tmp[0]), RESTRICT_ASM_LOW32 [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) - : [stride]"r"((mips_reg)stride),[ff_pw_32]"f"(ff_pw_32), - [A]"f"(A), [B]"f"(B), - [C]"f"(C), [D]"f"(D) + : [stride]"r"((mips_reg)stride),[ff_pw_32]"f"(ff_pw_32.f), + [A]"f"(A.f), [B]"f"(B.f), + [C]"f"(C.f), [D]"f"(D.f), + [tmp0]"r"(0x06) : "memory" ); - } else if (E) { - const int step = C ? stride : 1; + } else if (E.i) { + const int step = C.i ? 
stride : 1; __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" - "dli %[tmp0], 0x06 \n\t" "pshufh %[A], %[A], %[ftmp0] \n\t" "pshufh %[E], %[E], %[ftmp0] \n\t" "mtc1 %[tmp0], %[ftmp5] \n\t" @@ -713,14 +699,13 @@ void ff_avg_h264_chroma_mc4_mmi(uint8_t *dst, uint8_t *src, ptrdiff_t stride, : [ftmp0]"=&f"(ftmp[0]), [ftmp1]"=&f"(ftmp[1]), [ftmp2]"=&f"(ftmp[2]), [ftmp3]"=&f"(ftmp[3]), [ftmp4]"=&f"(ftmp[4]), [ftmp5]"=&f"(ftmp[5]), - [tmp0]"=&r"(tmp[0]), RESTRICT_ASM_LOW32 [addr0]"=&r"(addr[0]), [dst]"+&r"(dst), [src]"+&r"(src), [h]"+&r"(h) : [stride]"r"((mips_reg)stride),[step]"r"((mips_reg)step), - [ff_pw_32]"f"(ff_pw_32), - [A]"f"(A), [E]"f"(E) + [ff_pw_32]"f"(ff_pw_32.f), [tmp0]"r"(0x06), + [A]"f"(A.f), [E]"f"(E.f) : "memory" ); } else { diff --git a/libavcodec/mips/h264dsp_mmi.c b/libavcodec/mips/h264dsp_mmi.c index fe12b28..6e77995 100644 --- a/libavcodec/mips/h264dsp_mmi.c +++ b/libavcodec/mips/h264dsp_mmi.c @@ -162,7 +162,7 @@ void ff_h264_idct_add_8_mmi(uint8_t *dst, int16_t *block, int stride) RESTRICT_ASM_ADDRT [tmp0]"=&r"(tmp[0]) : [dst]"r"(dst), [block]"r"(block), - [stride]"r"((mips_reg)stride), [ff_pw_32]"f"(ff_pw_32) + [stride]"r"((mips_reg)stride), [ff_pw_32]"f"(ff_pw_32.f) : "memory" ); @@ -1078,7 +1078,7 @@ void ff_h264_luma_dc_dequant_idct_8_mmi(int16_t *output, int16_t *input, RESTRICT_ASM_ALL64 [output]"+&r"(output), [input]"+&r"(input), [qmul]"+&r"(qmul) - : [ff_pw_1]"f"(ff_pw_1) + : [ff_pw_1]"f"(ff_pw_1.f) : "memory" ); } @@ -1556,8 +1556,8 @@ void ff_deblock_v8_luma_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, int bet [addr0]"=&r"(addr[0]), [addr1]"=&r"(addr[1]) : [pix]"r"(pix), [stride]"r"((mips_reg)stride), [alpha]"r"((mips_reg)alpha), [beta]"r"((mips_reg)beta), - [tc0]"r"(tc0), [ff_pb_1]"f"(ff_pb_1), - [ff_pb_3]"f"(ff_pb_3), [ff_pb_A1]"f"(ff_pb_A1) + [tc0]"r"(tc0), [ff_pb_1]"f"(ff_pb_1.f), + [ff_pb_3]"f"(ff_pb_3.f), [ff_pb_A1]"f"(ff_pb_A1.f) : "memory" ); } @@ -1866,8 +1866,8 @@ void ff_deblock_v_chroma_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, [addr0]"=&r"(addr[0]) : [pix]"r"(pix), [stride]"r"((mips_reg)stride), [alpha]"r"(alpha), [beta]"r"(beta), - [tc0]"r"(tc0), [ff_pb_1]"f"(ff_pb_1), - [ff_pb_3]"f"(ff_pb_3), [ff_pb_A1]"f"(ff_pb_A1) + [tc0]"r"(tc0), [ff_pb_1]"f"(ff_pb_1.f), + [ff_pb_3]"f"(ff_pb_3.f), [ff_pb_A1]"f"(ff_pb_A1.f) : "memory" ); } @@ -1945,7 +1945,7 @@ void ff_deblock_v_chroma_intra_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, [addr0]"=&r"(addr[0]) : [pix]"r"(pix), [stride]"r"((mips_reg)stride), [alpha]"r"(alpha), [beta]"r"(beta), - [ff_pb_1]"f"(ff_pb_1) + [ff_pb_1]"f"(ff_pb_1.f) : "memory" ); } @@ -2084,8 +2084,8 @@ void ff_deblock_h_chroma_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, int be [pix]"+&r"(pix) : [alpha]"r"(alpha), [beta]"r"(beta), [stride]"r"((mips_reg)stride), [tc0]"r"(tc0), - [ff_pb_1]"f"(ff_pb_1), [ff_pb_3]"f"(ff_pb_3), - [ff_pb_A1]"f"(ff_pb_A1) + [ff_pb_1]"f"(ff_pb_1.f), [ff_pb_3]"f"(ff_pb_3.f), + [ff_pb_A1]"f"(ff_pb_A1.f) : "memory" ); } @@ -2218,7 +2218,7 @@ void ff_deblock_h_chroma_intra_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, [addr4]"=&r"(addr[4]), [addr5]"=&r"(addr[5]), [pix]"+&r"(pix) : [alpha]"r"(alpha), [beta]"r"(beta), - [stride]"r"((mips_reg)stride), [ff_pb_1]"f"(ff_pb_1) + [stride]"r"((mips_reg)stride), [ff_pb_1]"f"(ff_pb_1.f) : "memory" ); } diff --git a/libavcodec/mips/h264pred_mmi.c b/libavcodec/mips/h264pred_mmi.c index f8947a0..480411f 100644 --- a/libavcodec/mips/h264pred_mmi.c +++ b/libavcodec/mips/h264pred_mmi.c @@ -155,9 +155,9 @@ void ff_pred16x16_dc_8_mmi(uint8_t *src, 
ptrdiff_t stride) void ff_pred8x8l_top_dc_8_mmi(uint8_t *src, int has_topleft, int has_topright, ptrdiff_t stride) { - uint32_t dc; double ftmp[11]; mips_reg tmp[3]; + union av_intfloat64 dc; DECLARE_VAR_ALL64; DECLARE_VAR_ADDRT; @@ -209,12 +209,12 @@ void ff_pred8x8l_top_dc_8_mmi(uint8_t *src, int has_topleft, [ftmp10]"=&f"(ftmp[10]), [tmp0]"=&r"(tmp[0]), [tmp1]"=&r"(tmp[1]), RESTRICT_ASM_ALL64 - [dc]"=r"(dc) + [dc]"=r"(dc.i) : [srcA]"r"((mips_reg)(src-stride-1)), [src0]"r"((mips_reg)(src-stride)), [src1]"r"((mips_reg)(src-stride+1)), [has_topleft]"r"(has_topleft), [has_topright]"r"(has_topright), - [ff_pb_1]"r"(ff_pb_1), [ff_pw_2]"f"(ff_pw_2) + [ff_pb_1]"r"(ff_pb_1.i), [ff_pw_2]"f"(ff_pw_2.f) : "memory" ); @@ -238,7 +238,7 @@ void ff_pred8x8l_top_dc_8_mmi(uint8_t *src, int has_topleft, RESTRICT_ASM_ALL64 RESTRICT_ASM_ADDRT [src]"+&r"(src) - : [dc]"f"(dc), [stride]"r"((mips_reg)stride) + : [dc]"f"(dc.f), [stride]"r"((mips_reg)stride) : "memory" ); } @@ -246,9 +246,10 @@ void ff_pred8x8l_top_dc_8_mmi(uint8_t *src, int has_topleft, void ff_pred8x8l_dc_8_mmi(uint8_t *src, int has_topleft, int has_topright, ptrdiff_t stride) { - uint32_t dc, dc1, dc2; + uint32_t dc1, dc2; double ftmp[14]; mips_reg tmp[1]; + union av_intfloat64 dc; const int l0 = ((has_topleft ? src[-1+-1*stride] : src[-1+0*stride]) + 2*src[-1+0*stride] + src[-1+1*stride] + 2) >> 2; const int l1 = (src[-1+0*stride] + 2*src[-1+1*stride] + src[-1+2*stride] + 2) >> 2; @@ -322,7 +323,7 @@ void ff_pred8x8l_dc_8_mmi(uint8_t *src, int has_topleft, int has_topright, ); dc1 = l0+l1+l2+l3+l4+l5+l6+l7; - dc = ((dc1+dc2+8)>>4)*0x01010101U; + dc.i = ((dc1+dc2+8)>>4)*0x01010101U; __asm__ volatile ( "dli %[tmp0], 0x02 \n\t" @@ -344,7 +345,7 @@ void ff_pred8x8l_dc_8_mmi(uint8_t *src, int has_topleft, int has_topright, RESTRICT_ASM_ALL64 RESTRICT_ASM_ADDRT [src]"+&r"(src) - : [dc]"f"(dc), [stride]"r"((mips_reg)stride) + : [dc]"f"(dc.f), [stride]"r"((mips_reg)stride) : "memory" ); } @@ -965,10 +966,10 @@ static inline void pred16x16_plane_compat_mmi(uint8_t *src, int stride, [addr0]"=&r"(addr[0]) : [src]"r"(src), [stride]"r"((mips_reg)stride), [svq3]"r"(svq3), [rv40]"r"(rv40), - [ff_pw_m8tom5]"f"(ff_pw_m8tom5), [ff_pw_m4tom1]"f"(ff_pw_m4tom1), - [ff_pw_1to4]"f"(ff_pw_1to4), [ff_pw_5to8]"f"(ff_pw_5to8), - [ff_pw_0to3]"f"(ff_pw_0to3), [ff_pw_4to7]"r"(ff_pw_4to7), - [ff_pw_8tob]"r"(ff_pw_8tob), [ff_pw_ctof]"r"(ff_pw_ctof) + [ff_pw_m8tom5]"f"(ff_pw_m8tom5.f),[ff_pw_m4tom1]"f"(ff_pw_m4tom1.f), + [ff_pw_1to4]"f"(ff_pw_1to4.f), [ff_pw_5to8]"f"(ff_pw_5to8.f), + [ff_pw_0to3]"f"(ff_pw_0to3.f), [ff_pw_4to7]"r"(ff_pw_4to7.i), + [ff_pw_8tob]"r"(ff_pw_8tob.i), [ff_pw_ctof]"r"(ff_pw_ctof.i) : "memory" ); } diff --git a/libavcodec/mips/h264qpel_mmi.c b/libavcodec/mips/h264qpel_mmi.c index 72362d3..3482956 100644 --- a/libavcodec/mips/h264qpel_mmi.c +++ b/libavcodec/mips/h264qpel_mmi.c @@ -155,8 +155,8 @@ static void put_h264_qpel4_h_lowpass_mmi(uint8_t *dst, const uint8_t *src, [dst]"+&r"(dst), [src]"+&r"(src) : [dstStride]"r"((mips_reg)dstStride), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_20]"f"(ff_pw_20), [ff_pw_5]"f"(ff_pw_5), - [ff_pw_16]"f"(ff_pw_16) + [ff_pw_20]"f"(ff_pw_20.f), [ff_pw_5]"f"(ff_pw_5.f), + [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); } @@ -225,8 +225,8 @@ static void put_h264_qpel8_h_lowpass_mmi(uint8_t *dst, const uint8_t *src, [dst]"+&r"(dst), [src]"+&r"(src) : [dstStride]"r"((mips_reg)dstStride), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_20]"f"(ff_pw_20), [ff_pw_5]"f"(ff_pw_5), - [ff_pw_16]"f"(ff_pw_16) + [ff_pw_20]"f"(ff_pw_20.f), 
[ff_pw_5]"f"(ff_pw_5.f), + [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); } @@ -293,8 +293,8 @@ static void avg_h264_qpel4_h_lowpass_mmi(uint8_t *dst, const uint8_t *src, [dst]"+&r"(dst), [src]"+&r"(src) : [dstStride]"r"((mips_reg)dstStride), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_20]"f"(ff_pw_20), [ff_pw_5]"f"(ff_pw_5), - [ff_pw_16]"f"(ff_pw_16) + [ff_pw_20]"f"(ff_pw_20.f), [ff_pw_5]"f"(ff_pw_5.f), + [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); } @@ -365,8 +365,8 @@ static void avg_h264_qpel8_h_lowpass_mmi(uint8_t *dst, const uint8_t *src, [dst]"+&r"(dst), [src]"+&r"(src) : [dstStride]"r"((mips_reg)dstStride), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_20]"f"(ff_pw_20), [ff_pw_5]"f"(ff_pw_5), - [ff_pw_16]"f"(ff_pw_16) + [ff_pw_20]"f"(ff_pw_20.f), [ff_pw_5]"f"(ff_pw_5.f), + [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); } @@ -486,7 +486,7 @@ static void put_h264_qpel4_v_lowpass_mmi(uint8_t *dst, const uint8_t *src, [dst]"+&r"(dst), [src]"+&r"(src) : [dstStride]"r"((mips_reg)dstStride), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_5]"f"(ff_pw_5), [ff_pw_16]"f"(ff_pw_16) + [ff_pw_5]"f"(ff_pw_5.f), [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); } @@ -780,7 +780,7 @@ static void put_h264_qpel8_v_lowpass_mmi(uint8_t *dst, const uint8_t *src, [h]"+&r"(h) : [dstStride]"r"((mips_reg)dstStride), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_5]"f"(ff_pw_5), [ff_pw_16]"f"(ff_pw_16) + [ff_pw_5]"f"(ff_pw_5.f), [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); @@ -909,7 +909,7 @@ static void avg_h264_qpel4_v_lowpass_mmi(uint8_t *dst, const uint8_t *src, [src]"+&r"(src), [dst]"+&r"(dst) : [dstStride]"r"((mips_reg)dstStride), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_5]"f"(ff_pw_5), [ff_pw_16]"f"(ff_pw_16) + [ff_pw_5]"f"(ff_pw_5.f), [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); } @@ -1235,7 +1235,7 @@ static void avg_h264_qpel8_v_lowpass_mmi(uint8_t *dst, const uint8_t *src, [h]"+&r"(h) : [dstStride]"r"((mips_reg)dstStride), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_5]"f"(ff_pw_5), [ff_pw_16]"f"(ff_pw_16) + [ff_pw_5]"f"(ff_pw_5.f), [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); @@ -1306,7 +1306,7 @@ static void put_h264_qpel4_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src, [tmp]"+&r"(tmp), [src]"+&r"(src) : [tmpStride]"r"(8), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_20]"f"(ff_pw_20), [ff_pw_5]"f"(ff_pw_5) + [ff_pw_20]"f"(ff_pw_20.f), [ff_pw_5]"f"(ff_pw_5.f) : "memory" ); @@ -1567,7 +1567,7 @@ static void put_h264_qpel8or16_hv1_lowpass_mmi(int16_t *tmp, [src]"+&r"(src) : [tmp]"r"(tmp), [size]"r"(size), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_5]"f"(ff_pw_5), [ff_pw_16]"f"(ff_pw_16) + [ff_pw_5]"f"(ff_pw_5.f), [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); @@ -1742,7 +1742,7 @@ static void put_h264_qpel8_h_lowpass_l2_mmi(uint8_t *dst, const uint8_t *src, [src2]"+&r"(src2), [h]"+&r"(h) : [src2Stride]"r"((mips_reg)src2Stride), [dstStride]"r"((mips_reg)dstStride), - [ff_pw_5]"f"(ff_pw_5), [ff_pw_16]"f"(ff_pw_16) + [ff_pw_5]"f"(ff_pw_5.f), [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); } @@ -1870,7 +1870,7 @@ static void avg_h264_qpel4_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src, [tmp]"+&r"(tmp), [src]"+&r"(src) : [tmpStride]"r"(8), [srcStride]"r"((mips_reg)srcStride), - [ff_pw_20]"f"(ff_pw_20), [ff_pw_5]"f"(ff_pw_5) + [ff_pw_20]"f"(ff_pw_20.f), [ff_pw_5]"f"(ff_pw_5.f) : "memory" ); @@ -2065,7 +2065,7 @@ static void avg_h264_qpel8_h_lowpass_l2_mmi(uint8_t *dst, const uint8_t *src, [src2]"+&r"(src2) : [dstStride]"r"((mips_reg)dstStride), [src2Stride]"r"((mips_reg)src2Stride), - [ff_pw_5]"f"(ff_pw_5), [ff_pw_16]"f"(ff_pw_16) + 
[ff_pw_5]"f"(ff_pw_5.f), [ff_pw_16]"f"(ff_pw_16.f) : "memory" ); } diff --git a/libavcodec/mips/hevcdsp_mmi.c b/libavcodec/mips/hevcdsp_mmi.c index e89d37e..87fc255 100644 --- a/libavcodec/mips/hevcdsp_mmi.c +++ b/libavcodec/mips/hevcdsp_mmi.c @@ -32,7 +32,7 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, uint8_t *_src, \ int x, y; \ pixel *src = (pixel*)_src - 3; \ ptrdiff_t srcstride = _srcstride / sizeof(pixel); \ - uint64_t ftmp[15]; \ + double ftmp[15]; \ uint64_t rtmp[1]; \ const int8_t *filter = ff_hevc_qpel_filters[mx - 1]; \ \ @@ -132,7 +132,7 @@ void ff_hevc_put_hevc_qpel_hv##w##_8_mmi(int16_t *dst, uint8_t *_src, \ ptrdiff_t srcstride = _srcstride / sizeof(pixel); \ int16_t tmp_array[(MAX_PB_SIZE + QPEL_EXTRA) * MAX_PB_SIZE]; \ int16_t *tmp = tmp_array; \ - uint64_t ftmp[15]; \ + double ftmp[15]; \ uint64_t rtmp[1]; \ \ src -= (QPEL_EXTRA_BEFORE * srcstride + 3); \ @@ -329,10 +329,12 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ pixel *dst = (pixel *)_dst; \ ptrdiff_t dststride = _dststride / sizeof(pixel); \ const int8_t *filter = ff_hevc_qpel_filters[mx - 1]; \ - uint64_t ftmp[20]; \ + double ftmp[20]; \ uint64_t rtmp[1]; \ - int shift = 7; \ - int offset = 64; \ + union av_intfloat64 shift; \ + union av_intfloat64 offset; \ + shift.i = 7; \ + offset.i = 64; \ \ x = width >> 2; \ y = height; \ @@ -430,9 +432,9 @@ void ff_hevc_put_hevc_qpel_bi_h##w##_8_mmi(uint8_t *_dst, \ [ftmp10]"=&f"(ftmp[10]), [ftmp11]"=&f"(ftmp[11]), \ [ftmp12]"=&f"(ftmp[12]), [src2]"+&r"(src2), \ [dst]"+&r"(dst), [src]"+&r"(src), [y]"+&r"(y), [x]"=&r"(x), \ - [offset]"+&f"(offset), [rtmp0]"=&r"(rtmp[0]) \ + [offset]"+&f"(offset.f), [rtmp0]"=&r"(rtmp[0]) \ : [src_stride]"r"(srcstride), [dst_stride]"r"(dststride), \ - [filter]"r"(filter), [shift]"f"(shift) \ + [filter]"r"(filter), [shift]"f"(shift.f) \ : "memory" \ ); \ } @@ -463,10 +465,12 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ ptrdiff_t dststride = _dststride / sizeof(pixel); \ int16_t tmp_array[(MAX_PB_SIZE + QPEL_EXTRA) * MAX_PB_SIZE]; \ int16_t *tmp = tmp_array; \ - uint64_t ftmp[20]; \ + double ftmp[20]; \ uint64_t rtmp[1]; \ - int shift = 7; \ - int offset = 64; \ + union av_intfloat64 shift; \ + union av_intfloat64 offset; \ + shift.i = 7; \ + offset.i = 64; \ \ src -= (QPEL_EXTRA_BEFORE * srcstride + 3); \ filter = ff_hevc_qpel_filters[mx - 1]; \ @@ -659,9 +663,9 @@ void ff_hevc_put_hevc_qpel_bi_hv##w##_8_mmi(uint8_t *_dst, \ [ftmp12]"=&f"(ftmp[12]), [ftmp13]"=&f"(ftmp[13]), \ [ftmp14]"=&f"(ftmp[14]), [src2]"+&r"(src2), \ [dst]"+&r"(dst), [tmp]"+&r"(tmp), [y]"+&r"(y), [x]"=&r"(x), \ - [offset]"+&f"(offset), [rtmp0]"=&r"(rtmp[0]) \ + [offset]"+&f"(offset.f), [rtmp0]"=&r"(rtmp[0]) \ : [filter]"r"(filter), [stride]"r"(dststride), \ - [shift]"f"(shift) \ + [shift]"f"(shift.f) \ : "memory" \ ); \ } @@ -692,10 +696,12 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ const int8_t *filter = ff_hevc_epel_filters[mx - 1]; \ int16_t tmp_array[(MAX_PB_SIZE + EPEL_EXTRA) * MAX_PB_SIZE]; \ int16_t *tmp = tmp_array; \ - uint64_t ftmp[12]; \ + double ftmp[12]; \ uint64_t rtmp[1]; \ - int shift = 7; \ - int offset = 64; \ + union av_intfloat64 shift; \ + union av_intfloat64 offset; \ + shift.i = 7; \ + offset.i = 64; \ \ src -= (EPEL_EXTRA_BEFORE * srcstride + 1); \ x = width >> 2; \ @@ -847,9 +853,9 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t *_dst, \ [ftmp8]"=&f"(ftmp[8]), [ftmp9]"=&f"(ftmp[9]), \ [ftmp10]"=&f"(ftmp[10]), [src2]"+&r"(src2), \ [dst]"+&r"(dst), [tmp]"+&r"(tmp), 
[y]"+&r"(y), [x]"=&r"(x), \ - [offset]"+&f"(offset), [rtmp0]"=&r"(rtmp[0]) \ + [offset]"+&f"(offset.f), [rtmp0]"=&r"(rtmp[0]) \ : [filter]"r"(filter), [stride]"r"(dststride), \ - [shift]"f"(shift) \ + [shift]"f"(shift.f) \ : "memory" \ ); \ } @@ -875,9 +881,10 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ ptrdiff_t srcstride = _srcstride / sizeof(pixel); \ pixel *dst = (pixel *)_dst; \ ptrdiff_t dststride = _dststride / sizeof(pixel); \ - uint64_t ftmp[12]; \ + double ftmp[12]; \ uint64_t rtmp[1]; \ - int shift = 7; \ + union av_intfloat64 shift; \ + shift.i = 7; \ \ y = height; \ x = width >> 3; \ @@ -959,7 +966,7 @@ void ff_hevc_put_hevc_pel_bi_pixels##w##_8_mmi(uint8_t *_dst, \ [ftmp10]"=&f"(ftmp[10]), [offset]"=&f"(ftmp[11]), \ [src2]"+&r"(src2), [dst]"+&r"(dst), [src]"+&r"(src), \ [x]"+&r"(x), [y]"+&r"(y), [rtmp0]"=&r"(rtmp[0]) \ - : [dststride]"r"(dststride), [shift]"f"(shift), \ + : [dststride]"r"(dststride), [shift]"f"(shift.f), \ [srcstride]"r"(srcstride) \ : "memory" \ ); \ @@ -989,10 +996,12 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ ptrdiff_t dststride = _dststride / sizeof(pixel); \ int16_t tmp_array[(MAX_PB_SIZE + QPEL_EXTRA) * MAX_PB_SIZE]; \ int16_t *tmp = tmp_array; \ - uint64_t ftmp[20]; \ + double ftmp[20]; \ uint64_t rtmp[1]; \ - int shift = 6; \ - int offset = 32; \ + union av_intfloat64 shift; \ + union av_intfloat64 offset; \ + shift.i = 6; \ + offset.i = 32; \ \ src -= (QPEL_EXTRA_BEFORE * srcstride + 3); \ filter = ff_hevc_qpel_filters[mx - 1]; \ @@ -1166,9 +1175,9 @@ void ff_hevc_put_hevc_qpel_uni_hv##w##_8_mmi(uint8_t *_dst, \ [ftmp12]"=&f"(ftmp[12]), [ftmp13]"=&f"(ftmp[13]), \ [ftmp14]"=&f"(ftmp[14]), \ [dst]"+&r"(dst), [tmp]"+&r"(tmp), [y]"+&r"(y), [x]"=&r"(x), \ - [offset]"+&f"(offset), [rtmp0]"=&r"(rtmp[0]) \ + [offset]"+&f"(offset.f), [rtmp0]"=&r"(rtmp[0]) \ : [filter]"r"(filter), [stride]"r"(dststride), \ - [shift]"f"(shift) \ + [shift]"f"(shift.f) \ : "memory" \ ); \ } diff --git a/libavcodec/mips/idctdsp_mmi.c b/libavcodec/mips/idctdsp_mmi.c index 0047aef..d22e5ee 100644 --- a/libavcodec/mips/idctdsp_mmi.c +++ b/libavcodec/mips/idctdsp_mmi.c @@ -142,7 +142,7 @@ void ff_put_signed_pixels_clamped_mmi(const int16_t *block, [pixels]"+&r"(pixels) : [block]"r"(block), [line_size]"r"((mips_reg)line_size), - [ff_pb_80]"f"(ff_pb_80) + [ff_pb_80]"f"(ff_pb_80.f) : "memory" ); } diff --git a/libavcodec/mips/mpegvideo_mmi.c b/libavcodec/mips/mpegvideo_mmi.c index edaa839..3d5b5e2 100644 --- a/libavcodec/mips/mpegvideo_mmi.c +++ b/libavcodec/mips/mpegvideo_mmi.c @@ -28,12 +28,13 @@ void ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block, int n, int qscale) { - int64_t level, qmul, qadd, nCoeffs; + int64_t level, nCoeffs; double ftmp[6]; mips_reg addr[1]; + union mmi_intfloat64 qmul_u, qadd_u; DECLARE_VAR_ALL64; - qmul = qscale << 1; + qmul_u.i = qscale << 1; av_assert2(s->block_last_index[n]>=0 || s->h263_aic); if (!s->h263_aic) { @@ -41,9 +42,9 @@ void ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block, level = block[0] * s->y_dc_scale; else level = block[0] * s->c_dc_scale; - qadd = (qscale-1) | 1; + qadd_u.i = (qscale-1) | 1; } else { - qadd = 0; + qadd_u.i = 0; level = block[0]; } @@ -93,7 +94,7 @@ void ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block, [addr0]"=&r"(addr[0]) : [block]"r"((mips_reg)(block+nCoeffs)), [nCoeffs]"r"((mips_reg)(2*(-nCoeffs))), - [qmul]"f"(qmul), [qadd]"f"(qadd) + [qmul]"f"(qmul_u.f), [qadd]"f"(qadd_u.f) : "memory" ); @@ -103,13 +104,14 @@ void 
ff_dct_unquantize_h263_intra_mmi(MpegEncContext *s, int16_t *block, void ff_dct_unquantize_h263_inter_mmi(MpegEncContext *s, int16_t *block, int n, int qscale) { - int64_t qmul, qadd, nCoeffs; + int64_t nCoeffs; double ftmp[6]; mips_reg addr[1]; + union mmi_intfloat64 qmul_u, qadd_u; DECLARE_VAR_ALL64; - qmul = qscale << 1; - qadd = (qscale - 1) | 1; + qmul_u.i = qscale << 1; + qadd_u.i = (qscale - 1) | 1; av_assert2(s->block_last_index[n]>=0 || s->h263_aic); nCoeffs = s->inter_scantable.raster_end[s->block_last_index[n]]; @@ -153,7 +155,7 @@ void ff_dct_unquantize_h263_inter_mmi(MpegEncContext *s, int16_t *block, [addr0]"=&r"(addr[0]) : [block]"r"((mips_reg)(block+nCoeffs)), [nCoeffs]"r"((mips_reg)(2*(-nCoeffs))), - [qmul]"f"(qmul), [qadd]"f"(qadd) + [qmul]"f"(qmul_u.f), [qadd]"f"(qadd_u.f) : "memory" ); } diff --git a/libavcodec/mips/vc1dsp_mmi.c b/libavcodec/mips/vc1dsp_mmi.c index a8ab3f6..27a3c81 100644 --- a/libavcodec/mips/vc1dsp_mmi.c +++ b/libavcodec/mips/vc1dsp_mmi.c @@ -129,9 +129,11 @@ void ff_vc1_inv_trans_8x8_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo double ftmp[9]; mips_reg addr[1]; int count; + union mmi_intfloat64 dc_u; dc = (3 * dc + 1) >> 1; dc = (3 * dc + 16) >> 5; + dc_u.i = dc; __asm__ volatile( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" @@ -189,7 +191,7 @@ void ff_vc1_inv_trans_8x8_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo [addr0]"=&r"(addr[0]), [count]"=&r"(count), [dest]"+&r"(dest) : [linesize]"r"((mips_reg)linesize), - [dc]"f"(dc) + [dc]"f"(dc_u.f) : "memory" ); } @@ -198,9 +200,6 @@ void ff_vc1_inv_trans_8x8_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo void ff_vc1_inv_trans_8x8_mmi(int16_t block[64]) { DECLARE_ALIGNED(16, int16_t, temp[64]); - DECLARE_ALIGNED(8, const uint64_t, ff_pw_1_local) = {0x0000000100000001ULL}; - DECLARE_ALIGNED(8, const uint64_t, ff_pw_4_local) = {0x0000000400000004ULL}; - DECLARE_ALIGNED(8, const uint64_t, ff_pw_64_local)= {0x0000004000000040ULL}; double ftmp[23]; uint64_t tmp[1]; @@ -407,8 +406,8 @@ void ff_vc1_inv_trans_8x8_mmi(int16_t block[64]) [ftmp20]"=&f"(ftmp[20]), [ftmp21]"=&f"(ftmp[21]), [ftmp22]"=&f"(ftmp[22]), [tmp0]"=&r"(tmp[0]) - : [ff_pw_1]"f"(ff_pw_1_local), [ff_pw_64]"f"(ff_pw_64_local), - [ff_pw_4]"f"(ff_pw_4_local), [block]"r"(block), + : [ff_pw_1]"f"(ff_pw_32_1.f), [ff_pw_64]"f"(ff_pw_32_64.f), + [ff_pw_4]"f"(ff_pw_32_4.f), [block]"r"(block), [temp]"r"(temp) : "memory" ); @@ -420,9 +419,11 @@ void ff_vc1_inv_trans_8x4_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo { int dc = block[0]; double ftmp[9]; + union mmi_intfloat64 dc_u; dc = ( 3 * dc + 1) >> 1; dc = (17 * dc + 64) >> 7; + dc_u.i = dc; __asm__ volatile( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" @@ -467,7 +468,7 @@ void ff_vc1_inv_trans_8x4_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo [ftmp8]"=&f"(ftmp[8]) : [dest0]"r"(dest+0*linesize), [dest1]"r"(dest+1*linesize), [dest2]"r"(dest+2*linesize), [dest3]"r"(dest+3*linesize), - [dc]"f"(dc) + [dc]"f"(dc_u.f) : "memory" ); } @@ -480,8 +481,6 @@ void ff_vc1_inv_trans_8x4_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) double ftmp[16]; uint32_t tmp[1]; int16_t count = 4; - DECLARE_ALIGNED(16, const uint64_t, ff_pw_4_local) = {0x0000000400000004ULL}; - DECLARE_ALIGNED(16, const uint64_t, ff_pw_64_local)= {0x0000004000000040ULL}; int16_t coeff[64] = {12, 16, 16, 15, 12, 9, 6, 4, 12, 15, 6, -4, -12, -16, -16, -9, 12, 9, -6, -16, -12, 4, 16, 15, @@ -591,7 +590,7 @@ void ff_vc1_inv_trans_8x4_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) 
[ftmp12]"=&f"(ftmp[12]), [ftmp13]"=&f"(ftmp[13]), [ftmp14]"=&f"(ftmp[14]), [tmp0]"=&r"(tmp[0]), [src]"+&r"(src), [dst]"+&r"(dst), [count]"+&r"(count) - : [ff_pw_4]"f"(ff_pw_4_local), [coeff]"r"(coeff) + : [ff_pw_4]"f"(ff_pw_32_4.f), [coeff]"r"(coeff) : "memory" ); @@ -859,7 +858,7 @@ void ff_vc1_inv_trans_8x4_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) [ftmp12]"=&f"(ftmp[12]), [ftmp13]"=&f"(ftmp[13]), [ftmp14]"=&f"(ftmp[14]), [ftmp15]"=&f"(ftmp[15]), [tmp0]"=&r"(tmp[0]) - : [ff_pw_64]"f"(ff_pw_64_local), + : [ff_pw_64]"f"(ff_pw_32_64.f), [src]"r"(src), [dest]"r"(dest), [linesize]"r"(linesize) :"memory" ); @@ -871,10 +870,12 @@ void ff_vc1_inv_trans_4x8_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo { int dc = block[0]; double ftmp[9]; + union mmi_intfloat64 dc_u; DECLARE_VAR_LOW32; dc = (17 * dc + 4) >> 3; dc = (12 * dc + 64) >> 7; + dc_u.i = dc; __asm__ volatile( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" @@ -934,7 +935,7 @@ void ff_vc1_inv_trans_4x8_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo [dest2]"r"(dest+2*linesize), [dest3]"r"(dest+3*linesize), [dest4]"r"(dest+4*linesize), [dest5]"r"(dest+5*linesize), [dest6]"r"(dest+6*linesize), [dest7]"r"(dest+7*linesize), - [dc]"f"(dc) + [dc]"f"(dc_u.f) : "memory" ); } @@ -945,14 +946,11 @@ void ff_vc1_inv_trans_4x8_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) int16_t *src = block; int16_t *dst = block; double ftmp[23]; - uint32_t count = 8, tmp[1]; + uint64_t count = 8, tmp[1]; int16_t coeff[16] = {17, 22, 17, 10, 17, 10,-17,-22, 17,-10,-17, 22, 17,-22, 17,-10}; - DECLARE_ALIGNED(8, const uint64_t, ff_pw_1_local) = {0x0000000100000001ULL}; - DECLARE_ALIGNED(8, const uint64_t, ff_pw_4_local) = {0x0000000400000004ULL}; - DECLARE_ALIGNED(8, const uint64_t, ff_pw_64_local)= {0x0000004000000040ULL}; // 1st loop __asm__ volatile ( @@ -998,7 +996,7 @@ void ff_vc1_inv_trans_4x8_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) [ftmp10]"=&f"(ftmp[10]), [ftmp11]"=&f"(ftmp[11]), [tmp0]"=&r"(tmp[0]), [count]"+&r"(count), [src]"+&r"(src), [dst]"+&r"(dst) - : [ff_pw_4]"f"(ff_pw_4_local), [coeff]"r"(coeff) + : [ff_pw_4]"f"(ff_pw_32_4.f), [coeff]"r"(coeff) : "memory" ); @@ -1115,7 +1113,7 @@ void ff_vc1_inv_trans_4x8_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) [ftmp20]"=&f"(ftmp[20]), [ftmp21]"=&f"(ftmp[21]), [ftmp22]"=&f"(ftmp[22]), [tmp0]"=&r"(tmp[0]) - : [ff_pw_1]"f"(ff_pw_1_local), [ff_pw_64]"f"(ff_pw_64_local), + : [ff_pw_1]"f"(ff_pw_32_1.f), [ff_pw_64]"f"(ff_pw_32_64.f), [src]"r"(src), [dest]"r"(dest), [linesize]"r"(linesize) : "memory" ); @@ -1127,10 +1125,12 @@ void ff_vc1_inv_trans_4x4_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo { int dc = block[0]; double ftmp[5]; + union mmi_intfloat64 dc_u; DECLARE_VAR_LOW32; dc = (17 * dc + 4) >> 3; dc = (17 * dc + 64) >> 7; + dc_u.i = dc; __asm__ volatile( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" @@ -1166,7 +1166,7 @@ void ff_vc1_inv_trans_4x4_dc_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *blo [ftmp4]"=&f"(ftmp[4]) : [dest0]"r"(dest+0*linesize), [dest1]"r"(dest+1*linesize), [dest2]"r"(dest+2*linesize), [dest3]"r"(dest+3*linesize), - [dc]"f"(dc) + [dc]"f"(dc_u.f) : "memory" ); } @@ -1181,8 +1181,6 @@ void ff_vc1_inv_trans_4x4_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) 17, 10,-17,-22, 17,-10,-17, 22, 17,-22, 17,-10}; - DECLARE_ALIGNED(8, const uint64_t, ff_pw_4_local) = {0x0000000400000004ULL}; - DECLARE_ALIGNED(8, const uint64_t, ff_pw_64_local)= {0x0000004000000040ULL}; // 1st loop __asm__ volatile ( @@ -1226,7 +1224,7 @@ void 
ff_vc1_inv_trans_4x4_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) [ftmp10]"=&f"(ftmp[10]), [ftmp11]"=&f"(ftmp[11]), [tmp0]"=&r"(tmp[0]), [count]"+&r"(count), [src]"+&r"(src), [dst]"+&r"(dst) - : [ff_pw_4]"f"(ff_pw_4_local), [coeff]"r"(coeff) + : [ff_pw_4]"f"(ff_pw_32_4.f), [coeff]"r"(coeff) : "memory" ); @@ -1370,7 +1368,7 @@ void ff_vc1_inv_trans_4x4_mmi(uint8_t *dest, ptrdiff_t linesize, int16_t *block) [ftmp12]"=&f"(ftmp[12]), [ftmp13]"=&f"(ftmp[13]), [ftmp14]"=&f"(ftmp[14]), [ftmp15]"=&f"(ftmp[15]), [tmp0]"=&r"(tmp[0]) - : [ff_pw_64]"f"(ff_pw_64_local), + : [ff_pw_64]"f"(ff_pw_32_64.f), [src]"r"(src), [dest]"r"(dest), [linesize]"r"(linesize) :"memory" ); @@ -1660,14 +1658,15 @@ static void vc1_put_ver_16b_shift2_mmi(int16_t *dst, const uint8_t *src, mips_reg stride, int rnd, int64_t shift) { + union mmi_intfloat64 shift_u; DECLARE_VAR_LOW32; DECLARE_VAR_ADDRT; + shift_u.i = shift; __asm__ volatile( "pxor $f0, $f0, $f0 \n\t" "li $8, 0x03 \n\t" LOAD_ROUNDER_MMI("%[rnd]") - "ldc1 $f12, %[ff_pw_9] \n\t" "1: \n\t" MMI_ULWC1($f4, %[src], 0x00) PTR_ADDU "%[src], %[src], %[stride] \n\t" @@ -1689,9 +1688,9 @@ static void vc1_put_ver_16b_shift2_mmi(int16_t *dst, : RESTRICT_ASM_LOW32 RESTRICT_ASM_ADDRT [src]"+r"(src), [dst]"+r"(dst) : [stride]"r"(stride), [stride1]"r"(-2*stride), - [shift]"f"(shift), [rnd]"m"(rnd), - [stride2]"r"(9*stride-4), [ff_pw_9]"m"(ff_pw_9) - : "$8", "$9", "$f0", "$f2", "$f4", "$f6", "$f8", "$f10", "$f12", + [shift]"f"(shift_u.f), [rnd]"m"(rnd), + [stride2]"r"(9*stride-4) + : "$8", "$9", "$f0", "$f2", "$f4", "$f6", "$f8", "$f10", "$f14", "$f16", "memory" ); } @@ -1713,8 +1712,6 @@ static void OPNAME ## vc1_hor_16b_shift2_mmi(uint8_t *dst, mips_reg stride, \ \ __asm__ volatile( \ LOAD_ROUNDER_MMI("%[rnd]") \ - "ldc1 $f12, %[ff_pw_128] \n\t" \ - "ldc1 $f10, %[ff_pw_9] \n\t" \ "1: \n\t" \ MMI_ULDC1($f2, %[src], 0x00) \ MMI_ULDC1($f4, %[src], 0x08) \ @@ -1728,16 +1725,16 @@ static void OPNAME ## vc1_hor_16b_shift2_mmi(uint8_t *dst, mips_reg stride, \ "paddh $f6, $f6, $f0 \n\t" \ MMI_ULDC1($f0, %[src], 0x0b) \ "paddh $f8, $f8, $f0 \n\t" \ - "pmullh $f6, $f6, $f10 \n\t" \ - "pmullh $f8, $f8, $f10 \n\t" \ + "pmullh $f6, $f6, %[ff_pw_9] \n\t" \ + "pmullh $f8, $f8, %[ff_pw_9] \n\t" \ "psubh $f6, $f6, $f2 \n\t" \ "psubh $f8, $f8, $f4 \n\t" \ "li $8, 0x07 \n\t" \ "mtc1 $8, $f16 \n\t" \ NORMALIZE_MMI("$f16") \ /* Remove bias */ \ - "paddh $f6, $f6, $f12 \n\t" \ - "paddh $f8, $f8, $f12 \n\t" \ + "paddh $f6, $f6, %[ff_pw_128] \n\t" \ + "paddh $f8, $f8, %[ff_pw_128] \n\t" \ TRANSFER_DO_PACK(OP) \ "addiu %[h], %[h], -0x01 \n\t" \ PTR_ADDIU "%[src], %[src], 0x18 \n\t" \ @@ -1747,8 +1744,8 @@ static void OPNAME ## vc1_hor_16b_shift2_mmi(uint8_t *dst, mips_reg stride, \ [h]"+r"(h), \ [src]"+r"(src), [dst]"+r"(dst) \ : [stride]"r"(stride), [rnd]"m"(rnd), \ - [ff_pw_9]"m"(ff_pw_9), [ff_pw_128]"m"(ff_pw_128) \ - : "$8", "$f0", "$f2", "$f4", "$f6", "$f8", "$f10", "$f12", "$f14", \ + [ff_pw_9]"f"(ff_pw_9.f), [ff_pw_128]"f"(ff_pw_128.f) \ + : "$8", "$f0", "$f2", "$f4", "$f6", "$f8", "$f14", \ "$f16", "memory" \ ); \ } @@ -1774,7 +1771,6 @@ static void OPNAME ## vc1_shift2_mmi(uint8_t *dst, const uint8_t *src, \ "pxor $f0, $f0, $f0 \n\t" \ "li $10, 0x08 \n\t" \ LOAD_ROUNDER_MMI("%[rnd]") \ - "ldc1 $f12, %[ff_pw_9] \n\t" \ "1: \n\t" \ MMI_ULWC1($f6, %[src], 0x00) \ MMI_ULWC1($f8, %[src], 0x04) \ @@ -1791,8 +1787,8 @@ static void OPNAME ## vc1_shift2_mmi(uint8_t *dst, const uint8_t *src, \ PTR_ADDU "$9, %[src], %[offset_x2n] \n\t" \ MMI_ULWC1($f2, $9, 0x00) \ MMI_ULWC1($f4, $9, 0x04) \ - 
"pmullh $f6, $f6, $f12 \n\t" /* 0,9,9,0*/ \ - "pmullh $f8, $f8, $f12 \n\t" /* 0,9,9,0*/ \ + "pmullh $f6, $f6, %[ff_pw_9] \n\t" /* 0,9,9,0*/ \ + "pmullh $f8, $f8, %[ff_pw_9] \n\t" /* 0,9,9,0*/ \ "punpcklbh $f2, $f2, $f0 \n\t" \ "punpcklbh $f4, $f4, $f0 \n\t" \ "psubh $f6, $f6, $f2 \n\t" /*-1,9,9,0*/ \ @@ -1819,9 +1815,9 @@ static void OPNAME ## vc1_shift2_mmi(uint8_t *dst, const uint8_t *src, \ : [offset]"r"(offset), [offset_x2n]"r"(-2*offset), \ [stride]"r"(stride), [rnd]"m"(rnd), \ [stride1]"r"(stride-offset), \ - [ff_pw_9]"m"(ff_pw_9) \ + [ff_pw_9]"f"(ff_pw_9.f) \ : "$8", "$9", "$10", "$f0", "$f2", "$f4", "$f6", "$f8", "$f10", \ - "$f12", "$f14", "$f16", "memory" \ + "$f14", "$f16", "memory" \ ); \ } @@ -1852,8 +1848,8 @@ VC1_SHIFT2(OP_AVG, avg_) LOAD($f8, $9, M*4) \ UNPACK("$f6") \ UNPACK("$f8") \ - "pmullh $f6, $f6, $f12 \n\t" /* *18 */ \ - "pmullh $f8, $f8, $f12 \n\t" /* *18 */ \ + "pmullh $f6, $f6, %[ff_pw_18] \n\t" /* *18 */ \ + "pmullh $f8, $f8, %[ff_pw_18] \n\t" /* *18 */ \ "psubh $f6, $f6, $f2 \n\t" /* *18, -3 */ \ "psubh $f8, $f8, $f4 \n\t" /* *18, -3 */ \ PTR_ADDU "$9, %[src], "#A4" \n\t" \ @@ -1872,8 +1868,8 @@ VC1_SHIFT2(OP_AVG, avg_) LOAD($f4, $9, M*4) \ UNPACK("$f2") \ UNPACK("$f4") \ - "pmullh $f2, $f2, $f10 \n\t" /* *53 */ \ - "pmullh $f4, $f4, $f10 \n\t" /* *53 */ \ + "pmullh $f2, $f2, %[ff_pw_53] \n\t" /* *53 */ \ + "pmullh $f4, $f4, %[ff_pw_53] \n\t" /* *53 */ \ "paddh $f6, $f6, $f2 \n\t" /* 4,53,18,-3 */ \ "paddh $f8, $f8, $f4 \n\t" /* 4,53,18,-3 */ @@ -1892,16 +1888,16 @@ vc1_put_ver_16b_ ## NAME ## _mmi(int16_t *dst, const uint8_t *src, \ int rnd, int64_t shift) \ { \ int h = 8; \ + union mmi_intfloat64 shift_u; \ DECLARE_VAR_LOW32; \ DECLARE_VAR_ADDRT; \ + shift_u.i = shift; \ \ src -= src_stride; \ \ __asm__ volatile( \ "pxor $f0, $f0, $f0 \n\t" \ LOAD_ROUNDER_MMI("%[rnd]") \ - "ldc1 $f10, %[ff_pw_53] \n\t" \ - "ldc1 $f12, %[ff_pw_18] \n\t" \ ".p2align 3 \n\t" \ "1: \n\t" \ MSPEL_FILTER13_CORE(DO_UNPACK, MMI_ULWC1, 1, A1, A2, A3, A4) \ @@ -1917,12 +1913,12 @@ vc1_put_ver_16b_ ## NAME ## _mmi(int16_t *dst, const uint8_t *src, \ PTR_ADDU "$9, %[src], "#A2" \n\t" \ MMI_ULWC1($f6, $9, 0x08) \ DO_UNPACK("$f6") \ - "pmullh $f6, $f6, $f12 \n\t" /* *18 */ \ + "pmullh $f6, $f6, %[ff_pw_18] \n\t" /* *18 */ \ "psubh $f6, $f6, $f2 \n\t" /* *18,-3 */ \ PTR_ADDU "$9, %[src], "#A3" \n\t" \ MMI_ULWC1($f2, $9, 0x08) \ DO_UNPACK("$f2") \ - "pmullh $f2, $f2, $f10 \n\t" /* *53 */ \ + "pmullh $f2, $f2, %[ff_pw_53] \n\t" /* *53 */ \ "paddh $f6, $f6, $f2 \n\t" /* *53,18,-3 */ \ PTR_ADDU "$9, %[src], "#A4" \n\t" \ MMI_ULWC1($f2, $9, 0x08) \ @@ -1945,10 +1941,10 @@ vc1_put_ver_16b_ ## NAME ## _mmi(int16_t *dst, const uint8_t *src, \ [src]"+r"(src), [dst]"+r"(dst) \ : [stride_x1]"r"(src_stride), [stride_x2]"r"(2*src_stride), \ [stride_x3]"r"(3*src_stride), \ - [rnd]"m"(rnd), [shift]"f"(shift), \ - [ff_pw_53]"m"(ff_pw_53), [ff_pw_18]"m"(ff_pw_18), \ - [ff_pw_3]"f"(ff_pw_3) \ - : "$8", "$9", "$f0", "$f2", "$f4", "$f6", "$f8", "$f10", "$f12", \ + [rnd]"m"(rnd), [shift]"f"(shift_u.f), \ + [ff_pw_53]"f"(ff_pw_53.f), [ff_pw_18]"f"(ff_pw_18.f), \ + [ff_pw_3]"f"(ff_pw_3.f) \ + : "$8", "$9", "$f0", "$f2", "$f4", "$f6", "$f8", \ "$f14", "$f16", "memory" \ ); \ } @@ -1975,8 +1971,6 @@ OPNAME ## vc1_hor_16b_ ## NAME ## _mmi(uint8_t *dst, mips_reg stride, \ __asm__ volatile( \ "pxor $f0, $f0, $f0 \n\t" \ LOAD_ROUNDER_MMI("%[rnd]") \ - "ldc1 $f10, %[ff_pw_53] \n\t" \ - "ldc1 $f12, %[ff_pw_18] \n\t" \ ".p2align 3 \n\t" \ "1: \n\t" \ MSPEL_FILTER13_CORE(DONT_UNPACK, MMI_ULDC1, 2, A1, A2, A3, A4) \ @@ 
-1995,9 +1989,9 @@ OPNAME ## vc1_hor_16b_ ## NAME ## _mmi(uint8_t *dst, mips_reg stride, \ [h]"+r"(h), \ [src]"+r"(src), [dst]"+r"(dst) \ : [stride]"r"(stride), [rnd]"m"(rnd), \ - [ff_pw_53]"m"(ff_pw_53), [ff_pw_18]"m"(ff_pw_18), \ - [ff_pw_3]"f"(ff_pw_3), [ff_pw_128]"f"(ff_pw_128) \ - : "$8", "$9", "$f0", "$f2", "$f4", "$f6", "$f8", "$f10", "$f12", \ + [ff_pw_53]"f"(ff_pw_53.f), [ff_pw_18]"f"(ff_pw_18.f), \ + [ff_pw_3]"f"(ff_pw_3.f), [ff_pw_128]"f"(ff_pw_128.f) \ + : "$8", "$9", "$f0", "$f2", "$f4", "$f6", "$f8", \ "$f14", "$f16", "memory" \ ); \ } @@ -2025,8 +2019,6 @@ OPNAME ## vc1_## NAME ## _mmi(uint8_t *dst, const uint8_t *src, \ __asm__ volatile ( \ "pxor $f0, $f0, $f0 \n\t" \ LOAD_ROUNDER_MMI("%[rnd]") \ - "ldc1 $f10, %[ff_pw_53] \n\t" \ - "ldc1 $f12, %[ff_pw_18] \n\t" \ ".p2align 3 \n\t" \ "1: \n\t" \ MSPEL_FILTER13_CORE(DO_UNPACK, MMI_ULWC1, 1, A1, A2, A3, A4) \ @@ -2044,9 +2036,9 @@ OPNAME ## vc1_## NAME ## _mmi(uint8_t *dst, const uint8_t *src, \ : [offset_x1]"r"(offset), [offset_x2]"r"(2*offset), \ [offset_x3]"r"(3*offset), [stride]"r"(stride), \ [rnd]"m"(rnd), \ - [ff_pw_53]"m"(ff_pw_53), [ff_pw_18]"m"(ff_pw_18), \ - [ff_pw_3]"f"(ff_pw_3) \ - : "$8", "$9", "$f0", "$f2", "$f4", "$f6", "$f8", "$f10", "$f12", \ + [ff_pw_53]"f"(ff_pw_53.f), [ff_pw_18]"f"(ff_pw_18.f), \ + [ff_pw_3]"f"(ff_pw_3.f) \ + : "$8", "$9", "$f0", "$f2", "$f4", "$f6", "$f8", \ "$f14", "$f16", "memory" \ ); \ } @@ -2246,14 +2238,15 @@ void ff_put_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */, uint8_t *src /* align 1 */, ptrdiff_t stride, int h, int x, int y) { - const int A = (8 - x) * (8 - y); - const int B = (x) * (8 - y); - const int C = (8 - x) * (y); - const int D = (x) * (y); + union mmi_intfloat64 A, B, C, D; double ftmp[10]; uint32_t tmp[1]; DECLARE_VAR_ALL64; DECLARE_VAR_ADDRT; + A.i = (8 - x) * (8 - y); + B.i = (x) * (8 - y); + C.i = (8 - x) * (y); + D.i = (x) * (y); av_assert2(x < 8 && y < 8 && x >= 0 && y >= 0); @@ -2290,9 +2283,9 @@ void ff_put_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */, [src]"+&r"(src), [dst]"+&r"(dst), [h]"+&r"(h) : [stride]"r"((mips_reg)stride), - [A]"f"(A), [B]"f"(B), - [C]"f"(C), [D]"f"(D), - [ff_pw_28]"f"(ff_pw_28) + [A]"f"(A.f), [B]"f"(B.f), + [C]"f"(C.f), [D]"f"(D.f), + [ff_pw_28]"f"(ff_pw_28.f) : "memory" ); } @@ -2301,14 +2294,15 @@ void ff_put_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */, uint8_t *src /* align 1 */, ptrdiff_t stride, int h, int x, int y) { - const int A = (8 - x) * (8 - y); - const int B = (x) * (8 - y); - const int C = (8 - x) * (y); - const int D = (x) * (y); + union mmi_intfloat64 A, B, C, D; double ftmp[6]; uint32_t tmp[1]; DECLARE_VAR_LOW32; DECLARE_VAR_ADDRT; + A.i = (8 - x) * (8 - y); + B.i = (x) * (8 - y); + C.i = (8 - x) * (y); + D.i = (x) * (y); av_assert2(x < 8 && y < 8 && x >= 0 && y >= 0); @@ -2343,9 +2337,9 @@ void ff_put_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */, [src]"+&r"(src), [dst]"+&r"(dst), [h]"+&r"(h) : [stride]"r"((mips_reg)stride), - [A]"f"(A), [B]"f"(B), - [C]"f"(C), [D]"f"(D), - [ff_pw_28]"f"(ff_pw_28) + [A]"f"(A.f), [B]"f"(B.f), + [C]"f"(C.f), [D]"f"(D.f), + [ff_pw_28]"f"(ff_pw_28.f) : "memory" ); } @@ -2354,14 +2348,15 @@ void ff_avg_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */, uint8_t *src /* align 1 */, ptrdiff_t stride, int h, int x, int y) { - const int A = (8 - x) * (8 - y); - const int B = (x) * (8 - y); - const int C = (8 - x) * (y); - const int D = (x) * (y); + union mmi_intfloat64 A, B, C, D; double ftmp[10]; uint32_t tmp[1]; DECLARE_VAR_ALL64; DECLARE_VAR_ADDRT; + A.i = 
(8 - x) * (8 - y); + B.i = (x) * (8 - y); + C.i = (8 - x) * (y); + D.i = (x) * (y); av_assert2(x < 8 && y < 8 && x >= 0 && y >= 0); @@ -2401,9 +2396,9 @@ void ff_avg_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */, [src]"+&r"(src), [dst]"+&r"(dst), [h]"+&r"(h) : [stride]"r"((mips_reg)stride), - [A]"f"(A), [B]"f"(B), - [C]"f"(C), [D]"f"(D), - [ff_pw_28]"f"(ff_pw_28) + [A]"f"(A.f), [B]"f"(B.f), + [C]"f"(C.f), [D]"f"(D.f), + [ff_pw_28]"f"(ff_pw_28.f) : "memory" ); } @@ -2412,14 +2407,15 @@ void ff_avg_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */, uint8_t *src /* align 1 */, ptrdiff_t stride, int h, int x, int y) { - const int A = (8 - x) * (8 - y); - const int B = ( x) * (8 - y); - const int C = (8 - x) * ( y); - const int D = ( x) * ( y); + union mmi_intfloat64 A, B, C, D; double ftmp[6]; uint32_t tmp[1]; DECLARE_VAR_LOW32; DECLARE_VAR_ADDRT; + A.i = (8 - x) * (8 - y); + B.i = (x) * (8 - y); + C.i = (8 - x) * (y); + D.i = (x) * (y); av_assert2(x < 8 && y < 8 && x >= 0 && y >= 0); @@ -2457,9 +2453,9 @@ void ff_avg_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */, [src]"+&r"(src), [dst]"+&r"(dst), [h]"+&r"(h) : [stride]"r"((mips_reg)stride), - [A]"f"(A), [B]"f"(B), - [C]"f"(C), [D]"f"(D), - [ff_pw_28]"f"(ff_pw_28) + [A]"f"(A.f), [B]"f"(B.f), + [C]"f"(C.f), [D]"f"(D.f), + [ff_pw_28]"f"(ff_pw_28.f) : "memory" ); } diff --git a/libavcodec/mips/vp8dsp_mmi.c b/libavcodec/mips/vp8dsp_mmi.c index b352906..327eaf5 100644 --- a/libavcodec/mips/vp8dsp_mmi.c +++ b/libavcodec/mips/vp8dsp_mmi.c @@ -1128,12 +1128,14 @@ void ff_vp8_luma_dc_wht_dc_mmi(int16_t block[4][4][16], int16_t dc[16]) void ff_vp8_idct_add_mmi(uint8_t *dst, int16_t block[16], ptrdiff_t stride) { #if 1 - DECLARE_ALIGNED(8, const uint64_t, ff_ph_4e7b) = {0x4e7b4e7b4e7b4e7bULL}; - DECLARE_ALIGNED(8, const uint64_t, ff_ph_22a3) = {0x22a322a322a322a3ULL}; double ftmp[12]; uint32_t tmp[1]; + union av_intfloat64 ff_ph_4e7b_u; + union av_intfloat64 ff_ph_22a3_u; DECLARE_VAR_LOW32; DECLARE_VAR_ALL64; + ff_ph_4e7b_u.i = 0x4e7b4e7b4e7b4e7bULL; + ff_ph_22a3_u.i = 0x22a322a322a322a3ULL; __asm__ volatile ( "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t" @@ -1253,8 +1255,8 @@ void ff_vp8_idct_add_mmi(uint8_t *dst, int16_t block[16], ptrdiff_t stride) [tmp0]"=&r"(tmp[0]) : [dst0]"r"(dst), [dst1]"r"(dst+stride), [dst2]"r"(dst+2*stride), [dst3]"r"(dst+3*stride), - [block]"r"(block), [ff_pw_4]"f"(ff_pw_4), - [ff_ph_4e7b]"f"(ff_ph_4e7b), [ff_ph_22a3]"f"(ff_ph_22a3) + [block]"r"(block), [ff_pw_4]"f"(ff_pw_4.f), + [ff_ph_4e7b]"f"(ff_ph_4e7b_u.f), [ff_ph_22a3]"f"(ff_ph_22a3_u.f) : "memory" ); #else @@ -1595,8 +1597,16 @@ void ff_put_vp8_epel16_h4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, const uint64_t *filter = fourtap_subpel_filters[mx - 1]; double ftmp[9]; uint32_t tmp[1]; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; mips_reg src1, dst1; DECLARE_VAR_ALL64; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; /* dst[0] = cm[(filter[2] * src[0] - filter[1] * src[-1] + filter[3] * src[1] - filter[4] * src[2] + 64) >> 7]; @@ -1644,11 +1654,11 @@ void ff_put_vp8_epel16_h4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, [dst1]"=&r"(dst1), [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter1]"f"(filter[1]), [filter2]"f"(filter[2]), - [filter3]"f"(filter[3]), 
[filter4]"f"(filter[4]) + [filter1]"f"(filter1.f), [filter2]"f"(filter2.f), + [filter3]"f"(filter3.f), [filter4]"f"(filter4.f) : "memory" ); #else @@ -1672,7 +1682,16 @@ void ff_put_vp8_epel8_h4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, const uint64_t *filter = fourtap_subpel_filters[mx - 1]; double ftmp[9]; uint32_t tmp[1]; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; DECLARE_VAR_ALL64; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; + /* dst[0] = cm[(filter[2] * src[0] - filter[1] * src[-1] + filter[3] * src[1] - filter[4] * src[2] + 64) >> 7]; @@ -1705,11 +1724,11 @@ void ff_put_vp8_epel8_h4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, RESTRICT_ASM_ALL64 [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter1]"f"(filter[1]), [filter2]"f"(filter[2]), - [filter3]"f"(filter[3]), [filter4]"f"(filter[4]) + [filter1]"f"(filter1.f), [filter2]"f"(filter2.f), + [filter3]"f"(filter3.f), [filter4]"f"(filter4.f) : "memory" ); #else @@ -1733,7 +1752,15 @@ void ff_put_vp8_epel4_h4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, const uint64_t *filter = fourtap_subpel_filters[mx - 1]; double ftmp[6]; uint32_t tmp[1]; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; DECLARE_VAR_LOW32; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; /* dst[0] = cm[(filter[2] * src[0] - filter[1] * src[-1] + filter[3] * src[1] - filter[4] * src[2] + 64) >> 7]; @@ -1760,11 +1787,11 @@ void ff_put_vp8_epel4_h4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, RESTRICT_ASM_LOW32 [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter1]"f"(filter[1]), [filter2]"f"(filter[2]), - [filter3]"f"(filter[3]), [filter4]"f"(filter[4]) + [filter1]"f"(filter1.f), [filter2]"f"(filter2.f), + [filter3]"f"(filter3.f), [filter4]"f"(filter4.f) : "memory" ); #else @@ -1789,7 +1816,19 @@ void ff_put_vp8_epel16_h6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, double ftmp[9]; uint32_t tmp[1]; mips_reg src1, dst1; + union av_intfloat64 filter0; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; + union av_intfloat64 filter5; DECLARE_VAR_ALL64; + filter0.i = filter[0]; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; + filter5.i = filter[5]; /* dst[ 0] = cm[(filter[2]*src[ 0] - filter[1]*src[-1] + filter[0]*src[-2] + filter[3]*src[ 1] - filter[4]*src[ 2] + filter[5]*src[ 3] + 64) >> 7]; @@ -1837,12 +1876,12 @@ void ff_put_vp8_epel16_h6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, [dst1]"=&r"(dst1), [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter0]"f"(filter[0]), [filter1]"f"(filter[1]), - [filter2]"f"(filter[2]), [filter3]"f"(filter[3]), - [filter4]"f"(filter[4]), [filter5]"f"(filter[5]) + [filter0]"f"(filter0.f), [filter1]"f"(filter1.f), + [filter2]"f"(filter2.f), [filter3]"f"(filter3.f), + 
[filter4]"f"(filter4.f), [filter5]"f"(filter5.f) : "memory" ); #else @@ -1866,7 +1905,19 @@ void ff_put_vp8_epel8_h6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, const uint64_t *filter = fourtap_subpel_filters[mx - 1]; double ftmp[9]; uint32_t tmp[1]; + union av_intfloat64 filter0; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; + union av_intfloat64 filter5; DECLARE_VAR_ALL64; + filter0.i = filter[0]; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; + filter5.i = filter[5]; /* dst[0] = cm[(filter[2]*src[0] - filter[1]*src[-1] + filter[0]*src[-2] + filter[3]*src[1] - filter[4]*src[2] + filter[5]*src[ 3] + 64) >> 7]; @@ -1899,12 +1950,12 @@ void ff_put_vp8_epel8_h6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, RESTRICT_ASM_ALL64 [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter0]"f"(filter[0]), [filter1]"f"(filter[1]), - [filter2]"f"(filter[2]), [filter3]"f"(filter[3]), - [filter4]"f"(filter[4]), [filter5]"f"(filter[5]) + [filter0]"f"(filter0.f), [filter1]"f"(filter1.f), + [filter2]"f"(filter2.f), [filter3]"f"(filter3.f), + [filter4]"f"(filter4.f), [filter5]"f"(filter5.f) : "memory" ); #else @@ -1928,7 +1979,19 @@ void ff_put_vp8_epel4_h6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, const uint64_t *filter = fourtap_subpel_filters[mx - 1]; double ftmp[6]; uint32_t tmp[1]; + union av_intfloat64 filter0; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; + union av_intfloat64 filter5; DECLARE_VAR_LOW32; + filter0.i = filter[0]; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; + filter5.i = filter[5]; /* dst[0] = cm[(filter[2]*src[0] - filter[1]*src[-1] + filter[0]*src[-2] + filter[3]*src[1] - filter[4]*src[2] + filter[5]*src[ 3] + 64) >> 7]; @@ -1955,12 +2018,12 @@ void ff_put_vp8_epel4_h6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, RESTRICT_ASM_LOW32 [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter0]"f"(filter[0]), [filter1]"f"(filter[1]), - [filter2]"f"(filter[2]), [filter3]"f"(filter[3]), - [filter4]"f"(filter[4]), [filter5]"f"(filter[5]) + [filter0]"f"(filter0.f), [filter1]"f"(filter1.f), + [filter2]"f"(filter2.f), [filter3]"f"(filter3.f), + [filter4]"f"(filter4.f), [filter5]"f"(filter5.f) : "memory" ); #else @@ -1985,7 +2048,15 @@ void ff_put_vp8_epel16_v4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, double ftmp[9]; uint32_t tmp[1]; mips_reg src0, src1, dst0; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; DECLARE_VAR_ALL64; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; /* dst[0] = cm[(filter[2] * src[0] - filter[1] * src[ -srcstride] + filter[3] * src[ srcstride] - filter[4] * src[ 2*srcstride] + 64) >> 7]; @@ -2034,11 +2105,11 @@ void ff_put_vp8_epel16_v4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), 
[dststride]"r"((mips_reg)dststride), - [filter1]"f"(filter[1]), [filter2]"f"(filter[2]), - [filter3]"f"(filter[3]), [filter4]"f"(filter[4]) + [filter1]"f"(filter1.f), [filter2]"f"(filter2.f), + [filter3]"f"(filter3.f), [filter4]"f"(filter4.f) : "memory" ); #else @@ -2063,7 +2134,15 @@ void ff_put_vp8_epel8_v4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, double ftmp[9]; uint32_t tmp[1]; mips_reg src1; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; DECLARE_VAR_ALL64; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; /* dst[0] = cm[(filter[2] * src[0] - filter[1] * src[ -srcstride] + filter[3] * src[ srcstride] - filter[4] * src[ 2*srcstride] + 64) >> 7]; @@ -2097,11 +2176,11 @@ void ff_put_vp8_epel8_v4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter1]"f"(filter[1]), [filter2]"f"(filter[2]), - [filter3]"f"(filter[3]), [filter4]"f"(filter[4]) + [filter1]"f"(filter1.f), [filter2]"f"(filter2.f), + [filter3]"f"(filter3.f), [filter4]"f"(filter4.f) : "memory" ); #else @@ -2126,7 +2205,15 @@ void ff_put_vp8_epel4_v4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, double ftmp[6]; uint32_t tmp[1]; mips_reg src1; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; DECLARE_VAR_LOW32; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; /* dst[0] = cm[(filter[2] * src[0] - filter[1] * src[ -srcstride] + filter[3] * src[ srcstride] - filter[4] * src[ 2*srcstride] + 64) >> 7]; @@ -2154,11 +2241,11 @@ void ff_put_vp8_epel4_v4_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter1]"f"(filter[1]), [filter2]"f"(filter[2]), - [filter3]"f"(filter[3]), [filter4]"f"(filter[4]) + [filter1]"f"(filter1.f), [filter2]"f"(filter2.f), + [filter3]"f"(filter3.f), [filter4]"f"(filter4.f) : "memory" ); #else @@ -2183,7 +2270,19 @@ void ff_put_vp8_epel16_v6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, double ftmp[9]; uint32_t tmp[1]; mips_reg src0, src1, dst0; + union av_intfloat64 filter0; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; + union av_intfloat64 filter5; DECLARE_VAR_ALL64; + filter0.i = filter[0]; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; + filter5.i = filter[5]; /* dst[0] = cm[(filter[2]*src[0] - filter[1]*src[0-srcstride] + filter[0]*src[0-2*srcstride] + filter[3]*src[0+srcstride] - filter[4]*src[0+2*srcstride] + filter[5]*src[0+3*srcstride] + 64) >> 7]; @@ -2232,12 +2331,12 @@ void ff_put_vp8_epel16_v6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter0]"f"(filter[0]), [filter1]"f"(filter[1]), - [filter2]"f"(filter[2]), [filter3]"f"(filter[3]), - [filter4]"f"(filter[4]), [filter5]"f"(filter[5]) 
+ [filter0]"f"(filter0.f), [filter1]"f"(filter1.f), + [filter2]"f"(filter2.f), [filter3]"f"(filter3.f), + [filter4]"f"(filter4.f), [filter5]"f"(filter5.f) : "memory" ); #else @@ -2262,7 +2361,19 @@ void ff_put_vp8_epel8_v6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, double ftmp[9]; uint32_t tmp[1]; mips_reg src1; + union av_intfloat64 filter0; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; + union av_intfloat64 filter5; DECLARE_VAR_ALL64; + filter0.i = filter[0]; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; + filter5.i = filter[5]; /* dst[0] = cm[(filter[2]*src[0] - filter[1]*src[0-srcstride] + filter[0]*src[0-2*srcstride] + filter[3]*src[0+srcstride] - filter[4]*src[0+2*srcstride] + filter[5]*src[0+3*srcstride] + 64) >> 7]; @@ -2296,12 +2407,12 @@ void ff_put_vp8_epel8_v6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter0]"f"(filter[0]), [filter1]"f"(filter[1]), - [filter2]"f"(filter[2]), [filter3]"f"(filter[3]), - [filter4]"f"(filter[4]), [filter5]"f"(filter[5]) + [filter0]"f"(filter0.f), [filter1]"f"(filter1.f), + [filter2]"f"(filter2.f), [filter3]"f"(filter3.f), + [filter4]"f"(filter4.f), [filter5]"f"(filter5.f) : "memory" ); #else @@ -2326,7 +2437,19 @@ void ff_put_vp8_epel4_v6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, double ftmp[6]; uint32_t tmp[1]; mips_reg src1; + union av_intfloat64 filter0; + union av_intfloat64 filter1; + union av_intfloat64 filter2; + union av_intfloat64 filter3; + union av_intfloat64 filter4; + union av_intfloat64 filter5; DECLARE_VAR_LOW32; + filter0.i = filter[0]; + filter1.i = filter[1]; + filter2.i = filter[2]; + filter3.i = filter[3]; + filter4.i = filter[4]; + filter5.i = filter[5]; /* dst[0] = cm[(filter[2]*src[0] - filter[1]*src[0-srcstride] + filter[0]*src[0-2*srcstride] + filter[3]*src[0+srcstride] - filter[4]*src[0+2*srcstride] + filter[5]*src[0+3*srcstride] + 64) >> 7]; @@ -2354,12 +2477,12 @@ void ff_put_vp8_epel4_v6_mmi(uint8_t *dst, ptrdiff_t dststride, uint8_t *src, [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src) - : [ff_pw_64]"f"(ff_pw_64), + : [ff_pw_64]"f"(ff_pw_64.f), [srcstride]"r"((mips_reg)srcstride), [dststride]"r"((mips_reg)dststride), - [filter0]"f"(filter[0]), [filter1]"f"(filter[1]), - [filter2]"f"(filter[2]), [filter3]"f"(filter[3]), - [filter4]"f"(filter[4]), [filter5]"f"(filter[5]) + [filter0]"f"(filter0.f), [filter1]"f"(filter1.f), + [filter2]"f"(filter2.f), [filter3]"f"(filter3.f), + [filter4]"f"(filter4.f), [filter5]"f"(filter5.f) : "memory" ); #else @@ -2847,11 +2970,13 @@ void ff_put_vp8_bilinear16_h_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, ptrdiff_t sstride, int h, int mx, int my) { #if 1 - int a = 8 - mx, b = mx; + union mmi_intfloat64 a, b; double ftmp[7]; uint32_t tmp[1]; mips_reg dst0, src0; DECLARE_VAR_ALL64; + a.i = 8 - mx; + b.i = mx; /* dst[0] = (a * src[0] + b * src[1] + 4) >> 3; @@ -2900,10 +3025,10 @@ void ff_put_vp8_bilinear16_h_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, [dst0]"=&r"(dst0), [src0]"=&r"(src0), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src), - [a]"+&f"(a), [b]"+&f"(b) + [a]"+&f"(a.f), [b]"+&f"(b.f) : [sstride]"r"((mips_reg)sstride), [dstride]"r"((mips_reg)dstride), - [ff_pw_4]"f"(ff_pw_4) + 
[ff_pw_4]"f"(ff_pw_4.f) : "memory" ); #else @@ -2923,11 +3048,13 @@ void ff_put_vp8_bilinear16_v_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, ptrdiff_t sstride, int h, int mx, int my) { #if 1 - int c = 8 - my, d = my; + union mmi_intfloat64 c, d; double ftmp[7]; uint32_t tmp[1]; mips_reg src0, src1, dst0; DECLARE_VAR_ALL64; + c.i = 8 - my; + d.i = my; /* dst[0] = (c * src[0] + d * src[ sstride] + 4) >> 3; @@ -2968,10 +3095,10 @@ void ff_put_vp8_bilinear16_v_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src), - [c]"+&f"(c), [d]"+&f"(d) + [c]"+&f"(c.f), [d]"+&f"(d.f) : [sstride]"r"((mips_reg)sstride), [dstride]"r"((mips_reg)dstride), - [ff_pw_4]"f"(ff_pw_4) + [ff_pw_4]"f"(ff_pw_4.f) : "memory" ); #else @@ -3025,10 +3152,12 @@ void ff_put_vp8_bilinear8_h_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, ptrdiff_t sstride, int h, int mx, int my) { #if 1 - int a = 8 - mx, b = mx; + union mmi_intfloat64 a, b; double ftmp[7]; uint32_t tmp[1]; DECLARE_VAR_ALL64; + a.i = 8 - mx; + b.i = mx; /* dst[0] = (a * src[0] + b * src[1] + 4) >> 3; @@ -3062,10 +3191,10 @@ void ff_put_vp8_bilinear8_h_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, RESTRICT_ASM_ALL64 [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src), - [a]"+&f"(a), [b]"+&f"(b) + [a]"+&f"(a.f), [b]"+&f"(b.f) : [sstride]"r"((mips_reg)sstride), [dstride]"r"((mips_reg)dstride), - [ff_pw_4]"f"(ff_pw_4) + [ff_pw_4]"f"(ff_pw_4.f) : "memory" ); #else @@ -3085,11 +3214,13 @@ void ff_put_vp8_bilinear8_v_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, ptrdiff_t sstride, int h, int mx, int my) { #if 1 - int c = 8 - my, d = my; + union mmi_intfloat64 c, d; double ftmp[7]; uint32_t tmp[1]; mips_reg src1; DECLARE_VAR_ALL64; + c.i = 8 - my; + d.i = my; /* dst[0] = (c * src[0] + d * src[ sstride] + 4) >> 3; @@ -3124,10 +3255,10 @@ void ff_put_vp8_bilinear8_v_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src), - [c]"+&f"(c), [d]"+&f"(d) + [c]"+&f"(c.f), [d]"+&f"(d.f) : [sstride]"r"((mips_reg)sstride), [dstride]"r"((mips_reg)dstride), - [ff_pw_4]"f"(ff_pw_4) + [ff_pw_4]"f"(ff_pw_4.f) : "memory" ); #else @@ -3181,11 +3312,13 @@ void ff_put_vp8_bilinear4_h_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, ptrdiff_t sstride, int h, int mx, int my) { #if 1 - int a = 8 - mx, b = mx; + union mmi_intfloat64 a, b; double ftmp[5]; uint32_t tmp[1]; DECLARE_VAR_LOW32; DECLARE_VAR_ALL64; + a.i = 8 - mx; + b.i = mx; /* dst[0] = (a * src[0] + b * src[1] + 4) >> 3; @@ -3215,10 +3348,10 @@ void ff_put_vp8_bilinear4_h_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, RESTRICT_ASM_ALL64 [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src), - [a]"+&f"(a), [b]"+&f"(b) + [a]"+&f"(a.f), [b]"+&f"(b.f) : [sstride]"r"((mips_reg)sstride), [dstride]"r"((mips_reg)dstride), - [ff_pw_4]"f"(ff_pw_4) + [ff_pw_4]"f"(ff_pw_4.f) : "memory" ); #else @@ -3238,12 +3371,14 @@ void ff_put_vp8_bilinear4_v_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, ptrdiff_t sstride, int h, int mx, int my) { #if 1 - int c = 8 - my, d = my; + union mmi_intfloat64 c, d; double ftmp[7]; uint32_t tmp[1]; mips_reg src1; DECLARE_VAR_LOW32; DECLARE_VAR_ALL64; + c.i = 8 - my; + d.i = my; /* dst[0] = (c * src[0] + d * src[ sstride] + 4) >> 3; @@ -3274,10 +3409,10 @@ void ff_put_vp8_bilinear4_v_mmi(uint8_t *dst, ptrdiff_t dstride, uint8_t *src, [src1]"=&r"(src1), [h]"+&r"(h), [dst]"+&r"(dst), [src]"+&r"(src), - [c]"+&f"(c), [d]"+&f"(d) + [c]"+&f"(c.f), [d]"+&f"(d.f) : 
[sstride]"r"((mips_reg)sstride), [dstride]"r"((mips_reg)dstride), - [ff_pw_4]"f"(ff_pw_4) + [ff_pw_4]"f"(ff_pw_4.f) : "memory" ); #else diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h index 76bb2b9..659342b 100644 --- a/libavutil/mips/asmdefs.h +++ b/libavutil/mips/asmdefs.h @@ -27,6 +27,8 @@ #ifndef AVUTIL_MIPS_ASMDEFS_H #define AVUTIL_MIPS_ASMDEFS_H +#include + #if defined(_ABI64) && _MIPS_SIM == _ABI64 # define mips_reg int64_t # define PTRSIZE " 8 " @@ -97,4 +99,10 @@ __asm__(".macro parse_r var r\n\t" ".endif\n\t" ".endm"); +/* General union structure for clang adaption */ +union mmi_intfloat64 { + int64_t i; + double f; +}; + #endif /* AVCODEC_MIPS_ASMDEFS_H */ From patchwork Fri May 28 02:04:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?6YeR5rOi?= X-Patchwork-Id: 27960 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:b214:0:0:0:0:0 with SMTP id b20csp131217iof; Thu, 27 May 2021 19:05:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyGLUoys608OKdHXMNAgUeHO4ZqQE60BT0T0yVU7D6bOdlZm4dkEjZxIOVFQgz/z6oEhNJw X-Received: by 2002:a05:6402:40c:: with SMTP id q12mr7282082edv.0.1622167529512; Thu, 27 May 2021 19:05:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1622167529; cv=none; d=google.com; s=arc-20160816; b=lnuAKm9h/H2/+kp7kh0+AjTEsmYCqUsQ9F7FfV+hTz0xJDxiAvf3J+Khfpe/C7naHd zQWDVgtnT8D9YfCDccOyoSgHxpbFWfAToN+LVU35julz2FTvHfuGQVX8YVJbD9n6sLW6 vXjIDhblk98XE5Ps8Wb430BirSM/GFMFESy+aHZfnZyNV8ocZP1WKa0wcWHJ8M60dxYL wgzkOo7XwxebbxGdhzeY0ddw4R+d+W5pc9h9auTCPNo4BJBt0TpLzVe88pKfwO7jTnkk qkc0vuz71rtdBDJzF6S28yg+Ec7fLVfuytVrI1Qt4A0M6nJc9nLp67ypTnFDCs7QUANO WYZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:mime-version:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:references:in-reply-to:message-id:date :to:from:delivered-to; bh=GricPjPZFG0gONliu/yu2COIlC6q27KRcW/wUPLrbkQ=; b=V7HakyywAmB4Lub5HBy0F9rMzush46BHb29eCXF7QI/HSBS9RS2zohwa2MVghLykTq QSKK9s4CbaAqUnZ5qxYnLX6v5OAKv9T6I0MqxiN9RFhlXROU5cfVEkQA04Z+I+7EDpMo wOny9As8Vyvokd8SE9Tv3CPgoualhoYBSmGQJlimhviTv4EmfTCiM1DKI1qPRl/zEwIY gaOj9wH+RWuwKycVMujrA+m68G9TCOHyxW+TIJrh7+e/Ud//rn4R9ToV/2EcFgr0WmzV OJUuEosgap3jGvBiRTJO7AFGX2WYJvMnxN5ar06i64xFBNrxo/mqoxEF+ekATgEwFsUG DhoQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
From patchwork Fri May 28 02:04:41 2021
X-Patchwork-Submitter: Jin Bo <jinbo@loongson.cn>
X-Patchwork-Id: 27960
From: Jin Bo <jinbo@loongson.cn>
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 28 May 2021 10:04:41 +0800
Message-Id: <1622167481-10973-3-git-send-email-jinbo@loongson.cn>
In-Reply-To: <1622167481-10973-1-git-send-email-jinbo@loongson.cn>
References: <1622167481-10973-1-git-send-email-jinbo@loongson.cn>
Subject: [FFmpeg-devel] [PATCH 3/3] libavcodec/mips: Fix fate errors reported by clang

The data width of gsldrc1/gsldlc1 should be 8 bytes wide.
Signed-off-by: Jin Bo <jinbo@loongson.cn>
---
gsldrc1 takes the address of the lowest byte of the 64-bit datum and gsldlc1
the address of its highest byte, so the left-part offsets have to be 0x07 and
0x0f; the previous 0x03/0x0b only reached the first four bytes of each
8-byte filter half.

 libavcodec/mips/vp9_mc_mmi.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/libavcodec/mips/vp9_mc_mmi.c b/libavcodec/mips/vp9_mc_mmi.c
index fa65ff5..812f7a6 100644
--- a/libavcodec/mips/vp9_mc_mmi.c
+++ b/libavcodec/mips/vp9_mc_mmi.c
@@ -83,9 +83,9 @@ static void convolve_horiz_mmi(const uint8_t *src, int32_t src_stride,
     __asm__ volatile (
         "move %[tmp1], %[width] \n\t"
         "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t"
-        "gsldlc1 %[filter1], 0x03(%[filter]) \n\t"
+        "gsldlc1 %[filter1], 0x07(%[filter]) \n\t"
         "gsldrc1 %[filter1], 0x00(%[filter]) \n\t"
-        "gsldlc1 %[filter2], 0x0b(%[filter]) \n\t"
+        "gsldlc1 %[filter2], 0x0f(%[filter]) \n\t"
         "gsldrc1 %[filter2], 0x08(%[filter]) \n\t"
         "li %[tmp0], 0x07 \n\t"
         "dmtc1 %[tmp0], %[ftmp13] \n\t"
@@ -158,9 +158,9 @@ static void convolve_vert_mmi(const uint8_t *src, int32_t src_stride,
     __asm__ volatile (
         "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t"
-        "gsldlc1 %[ftmp4], 0x03(%[filter]) \n\t"
+        "gsldlc1 %[ftmp4], 0x07(%[filter]) \n\t"
         "gsldrc1 %[ftmp4], 0x00(%[filter]) \n\t"
-        "gsldlc1 %[ftmp5], 0x0b(%[filter]) \n\t"
+        "gsldlc1 %[ftmp5], 0x0f(%[filter]) \n\t"
         "gsldrc1 %[ftmp5], 0x08(%[filter]) \n\t"
         "punpcklwd %[filter10], %[ftmp4], %[ftmp4] \n\t"
         "punpckhwd %[filter32], %[ftmp4], %[ftmp4] \n\t"
@@ -254,9 +254,9 @@ static void convolve_avg_horiz_mmi(const uint8_t *src, int32_t src_stride,
     __asm__ volatile (
         "move %[tmp1], %[width] \n\t"
         "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t"
-        "gsldlc1 %[filter1], 0x03(%[filter]) \n\t"
+        "gsldlc1 %[filter1], 0x07(%[filter]) \n\t"
         "gsldrc1 %[filter1], 0x00(%[filter]) \n\t"
-        "gsldlc1 %[filter2], 0x0b(%[filter]) \n\t"
+        "gsldlc1 %[filter2], 0x0f(%[filter]) \n\t"
         "gsldrc1 %[filter2], 0x08(%[filter]) \n\t"
         "li %[tmp0], 0x07 \n\t"
         "dmtc1 %[tmp0], %[ftmp13] \n\t"
@@ -340,9 +340,9 @@ static void convolve_avg_vert_mmi(const uint8_t *src, int32_t src_stride,
     __asm__ volatile (
         "pxor %[ftmp0], %[ftmp0], %[ftmp0] \n\t"
-        "gsldlc1 %[ftmp4], 0x03(%[filter]) \n\t"
+        "gsldlc1 %[ftmp4], 0x07(%[filter]) \n\t"
         "gsldrc1 %[ftmp4], 0x00(%[filter]) \n\t"
-        "gsldlc1 %[ftmp5], 0x0b(%[filter]) \n\t"
+        "gsldlc1 %[ftmp5], 0x0f(%[filter]) \n\t"
         "gsldrc1 %[ftmp5], 0x08(%[filter]) \n\t"
         "punpcklwd %[filter10], %[ftmp4], %[ftmp4] \n\t"
         "punpckhwd %[filter32], %[ftmp4], %[ftmp4] \n\t"
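To see why the left-part offsets must be 0x07 and 0x0f, a conceptual little-endian model of the gsldrc1/gsldlc1 pair in plain C may help. The function below is an illustration invented for this review, not FFmpeg code: each access of the pair reads the naturally aligned doubleword containing its effective address (as the hardware does, so it may touch bytes of those aligned words outside base[0..7]), and the two parts are merged into one unaligned 64-bit load.

#include <stdint.h>
#include <string.h>

/* Model of an unaligned 64-bit load built from a right part at base and
 * a left part at base + 7, mirroring the fixed sequence
 * "gsldrc1 %[f], 0x00(%[filter]); gsldlc1 %[f], 0x07(%[filter])". */
static uint64_t unaligned_load64(const uint8_t *base)
{
    uintptr_t addr = (uintptr_t)base;
    unsigned shift = (addr & 7) * 8;    /* misalignment in bits */
    uint64_t lo, hi;

    /* gsldrc1 0(base): aligned doubleword holding the first byte */
    memcpy(&lo, (const void *)(addr & ~(uintptr_t)7), 8);
    if (shift == 0)
        return lo;    /* aligned case: one doubleword covers all 8 bytes */
    /* gsldlc1 7(base): aligned doubleword holding the last byte */
    memcpy(&hi, (const void *)((addr + 7) & ~(uintptr_t)7), 8);
    return (lo >> shift) | (hi << (64 - shift));    /* merge both parts */
}

With the old 0x03/0x0b offsets the left access points at the fourth byte of each half rather than the eighth, so the upper part of each filter register is never filled, which is consistent with the FATE mismatches the commit message reports.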