From patchwork Fri Sep 15 12:04:58 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Manojkumar Bhosale
X-Patchwork-Id: 5150
From: Manojkumar Bhosale
To: FFmpeg development discussions and patches
Cc: Kaustubh Raste
Date: Fri, 15 Sep 2017 12:04:58 +0000
Subject: Re: [FFmpeg-devel] [PATCH] avcodec/mips: Improve avc mc copy msa functions
Message-ID: <70293ACCC3BA6A4E81FFCA024C7A86E1E0591B27@PUMAIL01.pu.imgtec.org>
References: <1505455981-27923-1-git-send-email-kaustubh.raste@imgtec.com>
In-Reply-To: <1505455981-27923-1-git-send-email-kaustubh.raste@imgtec.com>

LGTM

-----Original Message-----
From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of kaustubh.raste@imgtec.com
Sent: Friday, September 15, 2017 11:43 AM
To: ffmpeg-devel@ffmpeg.org
Cc: Kaustubh Raste
Subject: [FFmpeg-devel] [PATCH] avcodec/mips: Improve avc mc copy msa functions

From: Kaustubh Raste

Remove loops and unroll as block sizes are known.

Signed-off-by: Kaustubh Raste
---
 libavcodec/mips/h264qpel_msa.c | 81 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 75 insertions(+), 6 deletions(-)

http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

diff --git a/libavcodec/mips/h264qpel_msa.c b/libavcodec/mips/h264qpel_msa.c
index 43d21f7..05dffea 100644
--- a/libavcodec/mips/h264qpel_msa.c
+++ b/libavcodec/mips/h264qpel_msa.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015 Parag Salasakar (Parag.Salasakar@imgtec.com)
+ * Copyright (c) 2015 -2017 Parag Salasakar (Parag.Salasakar@imgtec.com)
  *
  * This file is part of FFmpeg.
  *
@@ -2966,31 +2966,100 @@ static void avg_width16_msa(const uint8_t *src, int32_t src_stride,
 void ff_put_h264_qpel16_mc00_msa(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride)
 {
-    copy_width16_msa(src, stride, dst, stride, 16);
+    v16u8 src0, src1, src2, src3, src4, src5, src6, src7;
+    v16u8 src8, src9, src10, src11, src12, src13, src14, src15;
+
+    LD_UB8(src, stride, src0, src1, src2, src3, src4, src5, src6, src7);
+    src += (8 * stride);
+    LD_UB8(src, stride, src8, src9, src10, src11, src12, src13, src14, src15);
+
+    ST_UB8(src0, src1, src2, src3, src4, src5, src6, src7, dst, stride);
+    dst += (8 * stride);
+    ST_UB8(src8, src9, src10, src11, src12, src13, src14, src15, dst, stride);
 }
 
 void ff_put_h264_qpel8_mc00_msa(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride)
 {
-    copy_width8_msa(src, stride, dst, stride, 8);
+    uint64_t src0, src1, src2, src3, src4, src5, src6, src7;
+
+    LD4(src, stride, src0, src1, src2, src3);
+    src += 4 * stride;
+    LD4(src, stride, src4, src5, src6, src7);
+    SD4(src0, src1, src2, src3, dst, stride);
+    dst += 4 * stride;
+    SD4(src4, src5, src6, src7, dst, stride);
 }
 
 void ff_avg_h264_qpel16_mc00_msa(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride)
 {
-    avg_width16_msa(src, stride, dst, stride, 16);
+    v16u8 src0, src1, src2, src3, src4, src5, src6, src7;
+    v16u8 dst0, dst1, dst2, dst3, dst4, dst5, dst6, dst7;
+
+    LD_UB8(src, stride, src0, src1, src2, src3, src4, src5, src6, src7);
+    src += (8 * stride);
+    LD_UB8(dst, stride, dst0, dst1, dst2, dst3, dst4, dst5, dst6, dst7);
+
+    AVER_UB4_UB(src0, dst0, src1, dst1, src2, dst2, src3, dst3, dst0, dst1,
+                dst2, dst3);
+    AVER_UB4_UB(src4, dst4, src5, dst5, src6, dst6, src7, dst7, dst4, dst5,
+                dst6, dst7);
+    ST_UB8(dst0, dst1, dst2, dst3, dst4, dst5, dst6, dst7, dst, stride);
+    dst += (8 * stride);
+
+    LD_UB8(src, stride, src0, src1, src2, src3, src4, src5, src6, src7);
+    LD_UB8(dst, stride, dst0, dst1, dst2, dst3, dst4, dst5, dst6, dst7);
+
+    AVER_UB4_UB(src0, dst0, src1, dst1, src2, dst2, src3, dst3, dst0, dst1,
+                dst2, dst3);
+    AVER_UB4_UB(src4, dst4, src5, dst5, src6, dst6, src7, dst7, dst4, dst5,
+                dst6, dst7);
+    ST_UB8(dst0, dst1, dst2, dst3, dst4, dst5, dst6, dst7, dst, stride);
 }
 
 void ff_avg_h264_qpel8_mc00_msa(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride)
 {
-    avg_width8_msa(src, stride, dst, stride, 8);
+    uint64_t tp0, tp1, tp2, tp3, tp4, tp5, tp6, tp7;
+    v16u8 src0 = { 0 }, src1 = { 0 }, src2 = { 0 }, src3 = { 0 };
+    v16u8 dst0 = { 0 }, dst1 = { 0 }, dst2 = { 0 }, dst3 = { 0 };
+
+    LD4(src, stride, tp0, tp1, tp2, tp3);
+    src += 4 * stride;
+    LD4(src, stride, tp4, tp5, tp6, tp7);
+    INSERT_D2_UB(tp0, tp1, src0);
+    INSERT_D2_UB(tp2, tp3, src1);
+    INSERT_D2_UB(tp4, tp5, src2);
+    INSERT_D2_UB(tp6, tp7, src3);
+
+    LD4(dst, stride, tp0, tp1, tp2, tp3);
+    LD4(dst + 4 * stride, stride, tp4, tp5, tp6, tp7);
+    INSERT_D2_UB(tp0, tp1, dst0);
+    INSERT_D2_UB(tp2, tp3, dst1);
+    INSERT_D2_UB(tp4, tp5, dst2);
+    INSERT_D2_UB(tp6, tp7, dst3);
+
+    AVER_UB4_UB(src0, dst0, src1, dst1, src2, dst2, src3, dst3, dst0, dst1,
+                dst2, dst3);
+
+    ST8x8_UB(dst0, dst1, dst2, dst3, dst, stride);
 }
 
 void ff_avg_h264_qpel4_mc00_msa(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride)
 {
-    avg_width4_msa(src, stride, dst, stride, 4);
+    uint32_t tp0, tp1, tp2, tp3;
+    v16u8 src0 = { 0 }, dst0 = { 0 };
+
+    LW4(src, stride, tp0, tp1, tp2, tp3);
+    INSERT_W4_UB(tp0, tp1, tp2, tp3, src0);
+    LW4(dst, stride, tp0, tp1, tp2, tp3);
+    INSERT_W4_UB(tp0, tp1, tp2, tp3, dst0);
+
+    dst0 = __msa_aver_u_b(src0, dst0);
+
+    ST4x4_UB(dst0, dst0, 0, 1, 2, 3, dst, stride);
 }
 
 void ff_put_h264_qpel16_mc10_msa(uint8_t *dst, const uint8_t *src,
-- 
1.7.9.5
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
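
Note for readers without an MSA toolchain: the commit message ("Remove loops and unroll as block sizes are known") can be illustrated with a portable C sketch. This is only an approximation under stated assumptions, not the FFmpeg code. All function names below are hypothetical, memcpy() stands in for the LD4/SD4 style load and store macros, and the scalar (a + b + 1) >> 1 stands in for the rounded byte average that __msa_aver_u_b computes per lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic helper (hypothetical name): the height is a runtime argument,
 * so the rows have to be walked with a loop. */
static void copy_width8_loop(const uint8_t *src, ptrdiff_t src_stride,
                             uint8_t *dst, ptrdiff_t dst_stride, int height)
{
    assert(height > 0);
    for (int row = 0; row < height; row++) {
        memcpy(dst, src, 8);
        src += src_stride;
        dst += dst_stride;
    }
}

/* Fixed 8x8 case: the block size is known, so the copy becomes eight
 * 64-bit loads followed by eight 64-bit stores with no loop counter. */
static void put_8x8_unrolled(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    uint64_t r0, r1, r2, r3, r4, r5, r6, r7;

    memcpy(&r0, src + 0 * stride, 8);
    memcpy(&r1, src + 1 * stride, 8);
    memcpy(&r2, src + 2 * stride, 8);
    memcpy(&r3, src + 3 * stride, 8);
    memcpy(&r4, src + 4 * stride, 8);
    memcpy(&r5, src + 5 * stride, 8);
    memcpy(&r6, src + 6 * stride, 8);
    memcpy(&r7, src + 7 * stride, 8);

    memcpy(dst + 0 * stride, &r0, 8);
    memcpy(dst + 1 * stride, &r1, 8);
    memcpy(dst + 2 * stride, &r2, 8);
    memcpy(dst + 3 * stride, &r3, 8);
    memcpy(dst + 4 * stride, &r4, 8);
    memcpy(dst + 5 * stride, &r5, 8);
    memcpy(dst + 6 * stride, &r6, 8);
    memcpy(dst + 7 * stride, &r7, 8);
}

/* Rounded unsigned byte average, (a + b + 1) >> 1. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t) ((a + b + 1) >> 1);
}

/* Average one row of four source pixels into the destination row. */
static void avg_row4(uint8_t *dst, const uint8_t *src)
{
    dst[0] = avg_round_u8(src[0], dst[0]);
    dst[1] = avg_round_u8(src[1], dst[1]);
    dst[2] = avg_round_u8(src[2], dst[2]);
    dst[3] = avg_round_u8(src[3], dst[3]);
}

/* Fixed 4x4 averaging case, unrolled over the four rows. */
static void avg_4x4_unrolled(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    avg_row4(dst + 0 * stride, src + 0 * stride);
    avg_row4(dst + 1 * stride, src + 1 * stride);
    avg_row4(dst + 2 * stride, src + 2 * stride);
    avg_row4(dst + 3 * stride, src + 3 * stride);
}
```

For an 8x8 block the looped and unrolled copies should produce identical output. The unrolled form simply drops the loop counter and branch, which is presumably the saving the patch is after. The MSA versions additionally process a whole row, or several rows, per vector register.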