From patchwork Tue Mar 30 12:51:55 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shiyou Yin
X-Patchwork-Id: 26665
Return-Path:
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
 by ffaux.localdomain (Postfix) with ESMTP id 60C4544BD83
 for ; Tue, 30 Mar 2021 15:52:19 +0300 (EEST)
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 4935B689F3A;
 Tue, 30 Mar 2021 15:52:19 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from loongson.cn (mail.loongson.cn [114.242.206.163])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A65F2688152
 for ; Tue, 30 Mar 2021 15:52:09 +0300 (EEST)
Received: from localhost (unknown [36.33.26.144])
 by mail.loongson.cn (Coremail) with SMTP id AQAAf9Ax7cj3HmNgBWQCAA--.2312S3;
 Tue, 30 Mar 2021 20:52:07 +0800 (CST)
From: Shiyou Yin
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 30 Mar 2021 20:51:55 +0800
Message-Id: <1617108715-24232-6-git-send-email-yinshiyou-hf@loongson.cn>
X-Mailer: git-send-email 2.1.0
In-Reply-To: <1617108715-24232-1-git-send-email-yinshiyou-hf@loongson.cn>
References: <1617108715-24232-1-git-send-email-yinshiyou-hf@loongson.cn>
X-CM-TRANSID: AQAAf9Ax7cj3HmNgBWQCAA--.2312S3
X-CM-SenderInfo: p1lq2x5l1r3gtki6z05rqj20fqof0/
Subject: [FFmpeg-devel] [PATCH v3 5/5] mips: Fix potential illegal instruction error.
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

The MSA2 optimizations are attached to the MSA macros in
generic_macros_msa.h, which makes it difficult to add a runtime check
for them. Removing this code makes the MIPS build more robust.

H264 1080p decoding: 5.13x ==> 5.12x.
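For reviewers who want to convince themselves that the generic MSA path kept by this patch still performs a plain sign extension, here is a minimal scalar sketch of the remaining UNPCK_R_SH_SW body (__msa_clti_s_h followed by __msa_ilvr_h). It is an illustration only, not code from the tree: the model_* helper names are hypothetical, and the lane semantics (all-ones mask for negative lanes, interleave of the four lowest-indexed halfword lanes, little-endian lane order) are my reading of the MSA intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for __msa_clti_s_h(in, 0): each lane becomes
 * all ones (0xFFFF) if the input lane is negative, otherwise zero. */
static void model_clti_s_h_zero(const int16_t in[8], int16_t mask[8])
{
    for (int i = 0; i < 8; i++)
        mask[i] = (int16_t)(in[i] < 0 ? -1 : 0);
}

/* Hypothetical scalar stand-in for __msa_ilvr_h(hi, lo) viewed as four words,
 * assuming little-endian lane order: word i carries lo[i] in its low halfword
 * and hi[i] in its high halfword, for the four lowest-indexed lanes. */
static void model_ilvr_h_as_words(const int16_t hi[8], const int16_t lo[8],
                                  int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t w = ((uint32_t)(uint16_t)hi[i] << 16) | (uint16_t)lo[i];
        memcpy(&out[i], &w, sizeof(w)); /* reinterpret bits without UB */
    }
}

/* Mirrors the generic UNPCK_R_SH_SW body: build a sign mask, then interleave
 * it above the original halfwords. */
static void model_unpck_r_sh_sw(const int16_t in[8], int32_t out[4])
{
    int16_t sign_m[8];

    model_clti_s_h_zero(in, sign_m);
    model_ilvr_h_as_words(sign_m, in, out);
}

/* Self-check: the model must equal an ordinary int16_t -> int32_t conversion.
 * Returns 1 if every assertion holds. */
static int model_self_check(void)
{
    const int16_t in[8] = { -1, 2, -32768, 32767, 0, 0, 0, 0 };
    int32_t out[4];

    model_unpck_r_sh_sw(in, out);
    for (int i = 0; i < 4; i++)
        assert(out[i] == (int32_t)in[i]);
    return 1;
}
```

Calling model_self_check() from a small test program exercises the boundary values. The sketch says nothing about performance; it only shows that the result should not depend on whether the MSA2 variant or the generic variant is compiled in.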
---
 configure                           |  7 +------
 libavutil/mips/generic_macros_msa.h | 37 -------------------------------------
 2 files changed, 1 insertion(+), 43 deletions(-)

diff --git a/configure b/configure
index d7a3f50..7b05612 100755
--- a/configure
+++ b/configure
@@ -451,7 +451,6 @@ Optimization options (experts only):
   --disable-mipsdsp disable MIPS DSP ASE R1 optimizations
   --disable-mipsdspr2 disable MIPS DSP ASE R2 optimizations
   --disable-msa disable MSA optimizations
-  --disable-msa2 disable MSA2 optimizations
   --disable-mipsfpu disable floating point MIPS optimizations
   --disable-mmi disable Loongson SIMD optimizations
   --disable-fast-unaligned consider unaligned accesses slow
@@ -2025,7 +2024,6 @@ ARCH_EXT_LIST_MIPS="
     mipsdsp
     mipsdspr2
     msa
-    msa2
 "
 
 ARCH_EXT_LIST_LOONGSON="
@@ -2564,7 +2562,6 @@ mipsdsp_deps="mips"
 mipsdspr2_deps="mips"
 mmi_deps_any="loongson2 loongson3"
 msa_deps="mipsfpu"
-msa2_deps="msa"
 
 cpunop_deps="i686"
 x86_64_select="i686"
@@ -5907,9 +5904,8 @@ elif enabled mips; then
     enabled mipsdsp && check_inline_asm_flags mipsdsp '"addu.qb $t0, $t1, $t2"' '-mdsp'
     enabled mipsdspr2 && check_inline_asm_flags mipsdspr2 '"absq_s.qb $t0, $t1"' '-mdspr2'
 
-    # MSA and MSA2 can be detected at runtime so we supply extra flags here
+    # MSA can be detected at runtime so we supply extra flags here
     enabled mipsfpu && enabled msa && check_inline_asm msa '"addvi.b $w0, $w1, 1"' '-mmsa' && append MSAFLAGS '-mmsa'
-    enabled msa && enabled msa2 && check_inline_asm msa2 '"nxbits.any.b $w0, $w0"' '-mmsa2' && append MSAFLAGS '-mmsa2'
 
     # loongson2 have no switch cflag so we can only probe toolchain ability
     enabled loongson2 && check_inline_asm loongson2 '"dmult.g $8, $9, $10"' && disable loongson3
@@ -7340,7 +7336,6 @@ if enabled mips; then
     echo "MIPS DSP R1 enabled ${mipsdsp-no}"
     echo "MIPS DSP R2 enabled ${mipsdspr2-no}"
     echo "MIPS MSA enabled ${msa-no}"
-    echo "MIPS MSA2 enabled ${msa2-no}"
     echo "LOONGSON MMI enabled ${mmi-no}"
 fi
 if enabled ppc; then
diff --git a/libavutil/mips/generic_macros_msa.h b/libavutil/mips/generic_macros_msa.h
index bb25e9f..1486f72 100644
--- a/libavutil/mips/generic_macros_msa.h
+++ b/libavutil/mips/generic_macros_msa.h
@@ -25,10 +25,6 @@
 #include
 #include
 
-#if HAVE_MSA2
-#include
-#endif
-
 #define ALIGNMENT 16
 #define ALLOC_ALIGNED(align) __attribute__ ((aligned((align) << 1)))
 
@@ -1119,15 +1115,6 @@
                  unsigned absolute diff values, even-odd pairs are added
                  together to generate 8 halfword results.
 */
-#if HAVE_MSA2
-#define SAD_UB2_UH(in0, in1, ref0, ref1) \
-( { \
-    v8u16 sad_m = { 0 }; \
-    sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in0, (v16u8) ref0); \
-    sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in1, (v16u8) ref1); \
-    sad_m; \
-} )
-#else
 #define SAD_UB2_UH(in0, in1, ref0, ref1) \
 ( { \
     v16u8 diff0_m, diff1_m; \
@@ -1141,7 +1128,6 @@
     \
     sad_m; \
 } )
-#endif // #if HAVE_MSA2
 
 /* Description : Insert specified word elements from input vectors to 1
                  destination vector
@@ -2183,12 +2169,6 @@
                  extracted and interleaved with same vector 'in0' to generate
                  4 word elements keeping sign intact
 */
-#if HAVE_MSA2
-#define UNPCK_R_SH_SW(in, out) \
-{ \
-    out = (v4i32) __builtin_msa2_w2x_lo_s_h((v8i16) in); \
-}
-#else
 #define UNPCK_R_SH_SW(in, out) \
 { \
     v8i16 sign_m; \
@@ -2196,7 +2176,6 @@
     sign_m = __msa_clti_s_h((v8i16) in, 0); \
     out = (v4i32) __msa_ilvr_h(sign_m, (v8i16) in); \
 }
-#endif // #if HAVE_MSA2
 
 /* Description : Sign extend byte elements from input vector and return
                  halfword results in pair of vectors
@@ -2209,13 +2188,6 @@
                  Then interleaved left with same vector 'in0' to
                  generate 8 signed halfword elements in 'out1'
 */
-#if HAVE_MSA2
-#define UNPCK_SB_SH(in, out0, out1) \
-{ \
-    out0 = (v4i32) __builtin_msa2_w2x_lo_s_b((v16i8) in); \
-    out1 = (v4i32) __builtin_msa2_w2x_hi_s_b((v16i8) in); \
-}
-#else
 #define UNPCK_SB_SH(in, out0, out1) \
 { \
     v16i8 tmp_m; \
@@ -2223,7 +2195,6 @@
     tmp_m = __msa_clti_s_b((v16i8) in, 0); \
     ILVRL_B2_SH(tmp_m, in, out0, out1); \
 }
-#endif // #if HAVE_MSA2
 
 /* Description : Zero extend unsigned byte elements to halfword elements
    Arguments : Inputs - in (1 input unsigned byte vector)
@@ -2250,13 +2221,6 @@
                  Then interleaved left with same vector 'in0' to
                  generate 4 signed word elements in 'out1'
 */
-#if HAVE_MSA2
-#define UNPCK_SH_SW(in, out0, out1) \
-{ \
-    out0 = (v4i32) __builtin_msa2_w2x_lo_s_h((v8i16) in); \
-    out1 = (v4i32) __builtin_msa2_w2x_hi_s_h((v8i16) in); \
-}
-#else
 #define UNPCK_SH_SW(in, out0, out1) \
 { \
     v8i16 tmp_m; \
@@ -2264,7 +2228,6 @@
     tmp_m = __msa_clti_s_h((v8i16) in, 0); \
     ILVRL_H2_SW(tmp_m, in, out0, out1); \
 }
-#endif // #if HAVE_MSA2
 
 /* Description : Swap two variables
    Arguments : Inputs - in0, in1