From patchwork Fri Jul 28 08:42:06 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: kaustubh.raste@imgtec.com
X-Patchwork-Id: 4482
Date: Fri, 28 Jul 2017 14:12:06 +0530
Message-ID: <1501231326-16802-1-git-send-email-kaustubh.raste@imgtec.com>
Subject: [FFmpeg-devel] [PATCH] libavcodec/mips: Improve avc dequant-idct luma dc msa function
Cc: Kaustubh Raste

From: Kaustubh Raste

Signed-off-by: Kaustubh Raste
---
 libavcodec/mips/h264idct_msa.c | 66 +++++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/libavcodec/mips/h264idct_msa.c b/libavcodec/mips/h264idct_msa.c
index 81e09e9..861befe 100644
--- a/libavcodec/mips/h264idct_msa.c
+++ b/libavcodec/mips/h264idct_msa.c
@@ -40,17 +40,20 @@ static void avc_deq_idct_luma_dc_msa(int16_t *dst, int16_t *src,
                                      int32_t de_q_val)
 {
 #define DC_DEST_STRIDE 16
-    int16_t out0, out1, out2, out3;
-    v8i16 src0, src1, src2, src3;
+    int16_t out0, out1, out2, out3, out4, out5, out6, out7;
+    v8i16 src1, src3;
     v8i16 vec0, vec1, vec2, vec3;
+    v8i16 tmp0, tmp1, tmp2, tmp3;
     v8i16 hres0, hres1, hres2, hres3;
     v8i16 vres0, vres1, vres2, vres3;
     v4i32 vres0_r, vres1_r, vres2_r, vres3_r;
-    v4i32 de_q_vec = __msa_fill_w(de_q_val);
+    const v4i32 de_q_vec = __msa_fill_w(de_q_val);
+    const v8i16 src0 = LD_SH(src);
+    const v8i16 src2 = LD_SH(src + 8);
 
-    LD4x4_SH(src, src0, src1, src2, src3);
-    TRANSPOSE4x4_SH_SH(src0, src1, src2, src3, src0, src1, src2, src3);
-    BUTTERFLY_4(src0, src2, src3, src1, vec0, vec3, vec2, vec1);
+    ILVL_D2_SH(src0, src0, src2, src2, src1, src3);
+    TRANSPOSE4x4_SH_SH(src0, src1, src2, src3, tmp0, tmp1, tmp2, tmp3);
+    BUTTERFLY_4(tmp0, tmp2, tmp3, tmp1, vec0, vec3, vec2, vec1);
     BUTTERFLY_4(vec0, vec1, vec2, vec3, hres0, hres3, hres2, hres1);
     TRANSPOSE4x4_SH_SH(hres0, hres1, hres2, hres3, hres0, hres1, hres2, hres3);
     BUTTERFLY_4(hres0, hres1, hres3, hres2, vec0, vec3, vec2, vec1);
@@ -72,40 +75,35 @@ static void avc_deq_idct_luma_dc_msa(int16_t *dst, int16_t *src,
     out1 = __msa_copy_s_h(vec0, 1);
     out2 = __msa_copy_s_h(vec0, 2);
     out3 = __msa_copy_s_h(vec0, 3);
-    SH(out0, dst);
-    SH(out1, (dst + 2 * DC_DEST_STRIDE));
-    SH(out2, (dst + 8 * DC_DEST_STRIDE));
+    out4 = __msa_copy_s_h(vec0, 4);
+    out5 = __msa_copy_s_h(vec0, 5);
+    out6 = __msa_copy_s_h(vec0, 6);
+    out7 = __msa_copy_s_h(vec0, 7);
+    SH(out0, (dst + 0 * DC_DEST_STRIDE));
+    SH(out1, (dst + 2 * DC_DEST_STRIDE));
+    SH(out2, (dst + 8 * DC_DEST_STRIDE));
     SH(out3, (dst + 10 * DC_DEST_STRIDE));
-    dst += DC_DEST_STRIDE;
-
-    out0 = __msa_copy_s_h(vec0, 4);
-    out1 = __msa_copy_s_h(vec0, 5);
-    out2 = __msa_copy_s_h(vec0, 6);
-    out3 = __msa_copy_s_h(vec0, 7);
-    SH(out0, dst);
-    SH(out1, (dst + 2 * DC_DEST_STRIDE));
-    SH(out2, (dst + 8 * DC_DEST_STRIDE));
-    SH(out3, (dst + 10 * DC_DEST_STRIDE));
-    dst += (3 * DC_DEST_STRIDE);
+    SH(out4, (dst + 1 * DC_DEST_STRIDE));
+    SH(out5, (dst + 3 * DC_DEST_STRIDE));
+    SH(out6, (dst + 9 * DC_DEST_STRIDE));
+    SH(out7, (dst + 11 * DC_DEST_STRIDE));
 
     out0 = __msa_copy_s_h(vec1, 0);
     out1 = __msa_copy_s_h(vec1, 1);
     out2 = __msa_copy_s_h(vec1, 2);
     out3 = __msa_copy_s_h(vec1, 3);
-    SH(out0, dst);
-    SH(out1, (dst + 2 * DC_DEST_STRIDE));
-    SH(out2, (dst + 8 * DC_DEST_STRIDE));
-    SH(out3, (dst + 10 * DC_DEST_STRIDE));
-    dst += DC_DEST_STRIDE;
-
-    out0 = __msa_copy_s_h(vec1, 4);
-    out1 = __msa_copy_s_h(vec1, 5);
-    out2 = __msa_copy_s_h(vec1, 6);
-    out3 = __msa_copy_s_h(vec1, 7);
-    SH(out0, dst);
-    SH(out1, (dst + 2 * DC_DEST_STRIDE));
-    SH(out2, (dst + 8 * DC_DEST_STRIDE));
-    SH(out3, (dst + 10 * DC_DEST_STRIDE));
+    out4 = __msa_copy_s_h(vec1, 4);
+    out5 = __msa_copy_s_h(vec1, 5);
+    out6 = __msa_copy_s_h(vec1, 6);
+    out7 = __msa_copy_s_h(vec1, 7);
+    SH(out0, (dst + 4 * DC_DEST_STRIDE));
+    SH(out1, (dst + 6 * DC_DEST_STRIDE));
+    SH(out2, (dst + 12 * DC_DEST_STRIDE));
+    SH(out3, (dst + 14 * DC_DEST_STRIDE));
+    SH(out4, (dst + 5 * DC_DEST_STRIDE));
+    SH(out5, (dst + 7 * DC_DEST_STRIDE));
+    SH(out6, (dst + 13 * DC_DEST_STRIDE));
+    SH(out7, (dst + 15 * DC_DEST_STRIDE));
 
 #undef DC_DEST_STRIDE
 }