From patchwork Fri Jul 28 11:17:26 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Manojkumar Bhosale
X-Patchwork-Id: 4484
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.103.1.76 with SMTP id 73csp235090vsb;
 Fri, 28 Jul 2017 04:17:43 -0700 (PDT)
X-Received: by 10.223.163.158 with SMTP id l30mr6747962wrb.203.1501240663037;
 Fri, 28 Jul 2017 04:17:43 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1501240663; cv=none; d=google.com;
 s=arc-20160816;
 b=AbXtzgvn5lQv2eLkel9dtBTnxv3qUxfGgtrrRzMQBdhDdVJ+1A4UyxUqi58sfuqdz7
 8OuZ9B3QYsBk3rkYpb1ZjvIlkLRxtfzBb2I1VVn38jQRI+WzpUFxpU98a82SMiBlWB3T
 a/8bEX06ZKvd/J6mPM7ApeoHoZxE5PfKQC0rnCx84Kptgsf2lMZ01Roi9Kp+v6M7bmpl
 xa37RZ2Sw/a6quz6ZT8/5OTte19/36RVYeOvml/CUbW7/XrCX/lHW2h3suwITXPde8gG
 uY7vapRd/3ey7QzzpWnAOxAY+7kkIBA6zWmAdMsADy5Zdke4le0Wk9XKRvJyekSvUBS3
 9RXg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
 h=sender:errors-to:content-transfer-encoding:cc:reply-to
 :list-subscribe:list-help:list-post:list-archive:list-unsubscribe
 :list-id:precedence:subject:mime-version:content-language
 :accept-language:in-reply-to:references:message-id:date:thread-index
 :thread-topic:to:from:delivered-to:arc-authentication-results;
 bh=L7+wzglKxKqmMmd3+B3XtUHK9Vnu4yo+7aQ14UqnNbg=;
 b=pHEHuTuQh5QvxS/Ywwc/GcfAOpipNnh1zFbUa8bDpL0/EuGIm+NApE08vJ6J7YoCuZ
 zT4zQQSSE1gXeCEe/6V8qkueYdPo5yniXr76M7iyyAKKjpqTWoz3jmVH0SwUhRiVQx5B
 RLyeX/+XBl+kXdQ2mH8FH6q7DomtwTTx02bMIpm6TczGZpoIvszMl0eqyU22deC64ow1
 Fxs++0GUAXCE4is/cSwt5gvw2elneArYivz9qfPfznCGSlLded89/0YwRWmBWYVFf+ed
 g2AWFO+R6qb/3qLVWMv04WRSrUbfKp7GQ29Agoy/FH+v3WXnPkZ7HoGebgukjJjWTVqg
 rLbg==
ARC-Authentication-Results: i=1; mx.google.com;
 spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates
 79.124.17.100 as permitted sender)
 smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
 by mx.google.com with ESMTP id h40si20280228wrh.65.2017.07.28.04.17.42;
 Fri, 28 Jul 2017 04:17:42 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
 spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates
 79.124.17.100 as permitted sender)
 smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E4157689A3D;
 Fri, 28 Jul 2017 14:17:37 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mailapp01.imgtec.com (mailapp01.imgtec.com [195.59.15.196])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id CD08968831C
 for ; Fri, 28 Jul 2017 14:17:30 +0300 (EEST)
Received: from HHMAIL01.hh.imgtec.org (unknown [10.100.10.19])
 by Forcepoint Email with ESMTPS id 407F8BB0F5D27
 for ; Fri, 28 Jul 2017 12:17:27 +0100 (IST)
Received: from HHMAIL-X.hh.imgtec.org (10.100.10.113) by
 HHMAIL01.hh.imgtec.org (10.100.10.19) with Microsoft SMTP Server (TLS) id
 14.3.294.0; Fri, 28 Jul 2017 12:17:30 +0100
Received: from PUMAIL01.pu.imgtec.org (192.168.91.250) by
 HHMAIL-X.hh.imgtec.org (10.100.10.113) with Microsoft SMTP Server (TLS) id
 14.3.294.0; Fri, 28 Jul 2017 12:17:30 +0100
Received: from PUMAIL01.pu.imgtec.org ([::1]) by PUMAIL01.pu.imgtec.org
 ([::1]) with mapi id 14.03.0266.001; Fri, 28 Jul 2017 16:47:27 +0530
From: Manojkumar Bhosale
To: FFmpeg development discussions and patches
Thread-Topic: [FFmpeg-devel] [PATCH] libavcodec/mips: Improve avc
 dequant-idct luma dc msa function
Thread-Index: AQHTB31WjDok+twOt0SMClOWfzD9ZKJpFxIw
Date: Fri, 28 Jul 2017 11:17:26 +0000
Message-ID: <70293ACCC3BA6A4E81FFCA024C7A86E1E058C153@PUMAIL01.pu.imgtec.org>
References: <1501231326-16802-1-git-send-email-kaustubh.raste@imgtec.com>
In-Reply-To:
 <1501231326-16802-1-git-send-email-kaustubh.raste@imgtec.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [192.168.91.86]
MIME-Version: 1.0
Subject: Re: [FFmpeg-devel] [PATCH] libavcodec/mips: Improve avc
 dequant-idct luma dc msa function
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: Kaustubh Raste
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

LGTM

-----Original Message-----
From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of kaustubh.raste@imgtec.com
Sent: Friday, July 28, 2017 2:12 PM
To: ffmpeg-devel@ffmpeg.org
Cc: Kaustubh Raste
Subject: [FFmpeg-devel] [PATCH] libavcodec/mips: Improve avc dequant-idct luma dc msa function

From: Kaustubh Raste

Signed-off-by: Kaustubh Raste
---
 libavcodec/mips/h264idct_msa.c |   66 +++++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/libavcodec/mips/h264idct_msa.c b/libavcodec/mips/h264idct_msa.c
index 81e09e9..861befe 100644
--- a/libavcodec/mips/h264idct_msa.c
+++ b/libavcodec/mips/h264idct_msa.c
@@ -40,17 +40,20 @@ static void avc_deq_idct_luma_dc_msa(int16_t *dst, int16_t *src,
                                      int32_t de_q_val)
 {
 #define DC_DEST_STRIDE 16
-    int16_t out0, out1, out2, out3;
-    v8i16 src0, src1, src2, src3;
+    int16_t out0, out1, out2, out3, out4, out5, out6, out7;
+    v8i16 src1, src3;
     v8i16 vec0, vec1, vec2, vec3;
+    v8i16 tmp0, tmp1, tmp2, tmp3;
     v8i16 hres0, hres1, hres2, hres3;
     v8i16 vres0, vres1, vres2, vres3;
     v4i32 vres0_r, vres1_r, vres2_r, vres3_r;
-    v4i32 de_q_vec = __msa_fill_w(de_q_val);
+    const v4i32 de_q_vec = __msa_fill_w(de_q_val);
+    const v8i16 src0 = LD_SH(src);
+    const v8i16 src2 = LD_SH(src + 8);
 
-    LD4x4_SH(src, src0, src1, src2, src3);
-    TRANSPOSE4x4_SH_SH(src0, src1, src2, src3, src0, src1, src2, src3);
-    BUTTERFLY_4(src0, src2, src3, src1, vec0, vec3, vec2, vec1);
+    ILVL_D2_SH(src0, src0, src2, src2, src1, src3);
+    TRANSPOSE4x4_SH_SH(src0, src1, src2, src3, tmp0, tmp1, tmp2, tmp3);
+    BUTTERFLY_4(tmp0, tmp2, tmp3, tmp1, vec0, vec3, vec2, vec1);
     BUTTERFLY_4(vec0, vec1, vec2, vec3, hres0, hres3, hres2, hres1);
     TRANSPOSE4x4_SH_SH(hres0, hres1, hres2, hres3, hres0, hres1, hres2, hres3);
     BUTTERFLY_4(hres0, hres1, hres3, hres2, vec0, vec3, vec2, vec1);
@@ -72,40 +75,35 @@ static void avc_deq_idct_luma_dc_msa(int16_t *dst, int16_t *src,
     out1 = __msa_copy_s_h(vec0, 1);
     out2 = __msa_copy_s_h(vec0, 2);
     out3 = __msa_copy_s_h(vec0, 3);
-    SH(out0, dst);
-    SH(out1, (dst + 2 * DC_DEST_STRIDE));
-    SH(out2, (dst + 8 * DC_DEST_STRIDE));
+    out4 = __msa_copy_s_h(vec0, 4);
+    out5 = __msa_copy_s_h(vec0, 5);
+    out6 = __msa_copy_s_h(vec0, 6);
+    out7 = __msa_copy_s_h(vec0, 7);
+    SH(out0, (dst + 0 * DC_DEST_STRIDE));
+    SH(out1, (dst + 2 * DC_DEST_STRIDE));
+    SH(out2, (dst + 8 * DC_DEST_STRIDE));
     SH(out3, (dst + 10 * DC_DEST_STRIDE));
-    dst += DC_DEST_STRIDE;
-
-    out0 = __msa_copy_s_h(vec0, 4);
-    out1 = __msa_copy_s_h(vec0, 5);
-    out2 = __msa_copy_s_h(vec0, 6);
-    out3 = __msa_copy_s_h(vec0, 7);
-    SH(out0, dst);
-    SH(out1, (dst + 2 * DC_DEST_STRIDE));
-    SH(out2, (dst + 8 * DC_DEST_STRIDE));
-    SH(out3, (dst + 10 * DC_DEST_STRIDE));
-    dst += (3 * DC_DEST_STRIDE);
+    SH(out4, (dst + 1 * DC_DEST_STRIDE));
+    SH(out5, (dst + 3 * DC_DEST_STRIDE));
+    SH(out6, (dst + 9 * DC_DEST_STRIDE));
+    SH(out7, (dst + 11 * DC_DEST_STRIDE));
 
     out0 = __msa_copy_s_h(vec1, 0);
     out1 = __msa_copy_s_h(vec1, 1);
     out2 = __msa_copy_s_h(vec1, 2);
     out3 = __msa_copy_s_h(vec1, 3);
-    SH(out0, dst);
-    SH(out1, (dst + 2 * DC_DEST_STRIDE));
-    SH(out2, (dst + 8 * DC_DEST_STRIDE));
-    SH(out3, (dst + 10 * DC_DEST_STRIDE));
-    dst += DC_DEST_STRIDE;
-
-    out0 = __msa_copy_s_h(vec1, 4);
-    out1 = __msa_copy_s_h(vec1, 5);
-    out2 = __msa_copy_s_h(vec1, 6);
-    out3 = __msa_copy_s_h(vec1, 7);
-    SH(out0, dst);
-    SH(out1, (dst + 2 * DC_DEST_STRIDE));
-    SH(out2, (dst + 8 * DC_DEST_STRIDE));
-    SH(out3, (dst + 10 * DC_DEST_STRIDE));
+    out4 = __msa_copy_s_h(vec1, 4);
+    out5 = __msa_copy_s_h(vec1, 5);
+    out6 = __msa_copy_s_h(vec1, 6);
+    out7 = __msa_copy_s_h(vec1, 7);
+    SH(out0, (dst + 4 * DC_DEST_STRIDE));
+    SH(out1, (dst + 6 * DC_DEST_STRIDE));
+    SH(out2, (dst + 12 * DC_DEST_STRIDE));
+    SH(out3, (dst + 14 * DC_DEST_STRIDE));
+    SH(out4, (dst + 5 * DC_DEST_STRIDE));
+    SH(out5, (dst + 7 * DC_DEST_STRIDE));
+    SH(out6, (dst + 13 * DC_DEST_STRIDE));
+    SH(out7, (dst + 15 * DC_DEST_STRIDE));
 
 #undef DC_DEST_STRIDE
 }
-- 
1.7.9.5

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel