From patchwork Tue Sep 26 07:48:51 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Manojkumar Bhosale
X-Patchwork-Id: 5286
From: Manojkumar Bhosale
To: FFmpeg development discussions and patches
Date: Tue, 26 Sep 2017 07:48:51 +0000
Message-ID: <70293ACCC3BA6A4E81FFCA024C7A86E1E0592A2A@PUMAIL01.pu.imgtec.org>
References: <1506402614-650-1-git-send-email-kaustubh.raste@imgtec.com>
In-Reply-To: <1506402614-650-1-git-send-email-kaustubh.raste@imgtec.com>
Subject: Re: [FFmpeg-devel] [PATCH] avcodec/mips: Removed generic function call in avc intra msa functions
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Kaustubh Raste

LGTM

-----Original Message-----
From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of kaustubh.raste@imgtec.com
Sent: Tuesday, September 26, 2017 10:40 AM
To: ffmpeg-devel@ffmpeg.org
Cc: Kaustubh Raste
Subject: [FFmpeg-devel] [PATCH] avcodec/mips: Removed generic function call in avc intra msa functions

From: Kaustubh Raste

Signed-off-by: Kaustubh Raste
---
 libavcodec/mips/h264pred_msa.c |  215 +++++++++++++++++-----------------------
 1 file changed, 92 insertions(+), 123 deletions(-)

diff --git a/libavcodec/mips/h264pred_msa.c b/libavcodec/mips/h264pred_msa.c
index c297aec..b9990c1 100644
--- a/libavcodec/mips/h264pred_msa.c
+++ b/libavcodec/mips/h264pred_msa.c
@@ -106,115 +106,6 @@ static void intra_predict_horiz_16x16_msa(uint8_t *src, int32_t src_stride,
            dst, dst_stride);
 }
 
-static void intra_predict_dc_8x8_msa(uint8_t *src_top, uint8_t *src_left,
-                                     int32_t src_stride_left,
-                                     uint8_t *dst, int32_t dst_stride,
-                                     uint8_t is_above, uint8_t is_left)
-{
-    uint32_t row;
-    uint32_t out, addition = 0;
-    v16u8 src_above, store;
-    v8u16 sum_above;
-    v4u32 sum_top;
-    v2u64 sum;
-
-    if (is_left && is_above) {
-        src_above = LD_UB(src_top);
-
-        sum_above = __msa_hadd_u_h(src_above, src_above);
-        sum_top = __msa_hadd_u_w(sum_above, sum_above);
-        sum = __msa_hadd_u_d(sum_top, sum_top);
-        addition = __msa_copy_u_w((v4i32) sum, 0);
-
-        for (row = 0; row < 8; row++) {
-            addition += src_left[row * src_stride_left];
-        }
-
-        addition = (addition + 8) >> 4;
-        store = (v16u8) __msa_fill_b(addition);
-    } else if (is_left) {
-        for (row = 0; row < 8; row++) {
-            addition += src_left[row * src_stride_left];
-        }
-
-        addition = (addition + 4) >> 3;
-        store = (v16u8) __msa_fill_b(addition);
-    } else if (is_above) {
-        src_above = LD_UB(src_top);
-
-        sum_above = __msa_hadd_u_h(src_above, src_above);
-        sum_top = __msa_hadd_u_w(sum_above, sum_above);
-        sum = __msa_hadd_u_d(sum_top, sum_top);
-        sum = (v2u64) __msa_srari_d((v2i64) sum, 3);
-        store = (v16u8) __msa_splati_b((v16i8) sum, 0);
-    } else {
-        store = (v16u8) __msa_ldi_b(128);
-    }
-
-    out = __msa_copy_u_w((v4i32) store, 0);
-
-    for (row = 8; row--;) {
-        SW(out, dst);
-        SW(out, (dst + 4));
-        dst += dst_stride;
-    }
-}
-
-static void intra_predict_dc_16x16_msa(uint8_t *src_top, uint8_t *src_left,
-                                       int32_t src_stride_left,
-                                       uint8_t *dst, int32_t dst_stride,
-                                       uint8_t is_above, uint8_t is_left)
-{
-    uint32_t row;
-    uint32_t addition = 0;
-    v16u8 src_above, store;
-    v8u16 sum_above;
-    v4u32 sum_top;
-    v2u64 sum;
-
-    if (is_left && is_above) {
-        src_above = LD_UB(src_top);
-
-        sum_above = __msa_hadd_u_h(src_above, src_above);
-        sum_top = __msa_hadd_u_w(sum_above, sum_above);
-        sum = __msa_hadd_u_d(sum_top, sum_top);
-        sum_top = (v4u32) __msa_pckev_w((v4i32) sum, (v4i32) sum);
-        sum = __msa_hadd_u_d(sum_top, sum_top);
-        addition = __msa_copy_u_w((v4i32) sum, 0);
-
-        for (row = 0; row < 16; row++) {
-            addition += src_left[row * src_stride_left];
-        }
-
-        addition = (addition + 16) >> 5;
-        store = (v16u8) __msa_fill_b(addition);
-    } else if (is_left) {
-        for (row = 0; row < 16; row++) {
-            addition += src_left[row * src_stride_left];
-        }
-
-        addition = (addition + 8) >> 4;
-        store = (v16u8) __msa_fill_b(addition);
-    } else if (is_above) {
-        src_above = LD_UB(src_top);
-
-        sum_above = __msa_hadd_u_h(src_above, src_above);
-        sum_top = __msa_hadd_u_w(sum_above, sum_above);
-        sum = __msa_hadd_u_d(sum_top, sum_top);
-        sum_top = (v4u32) __msa_pckev_w((v4i32) sum, (v4i32) sum);
-        sum = __msa_hadd_u_d(sum_top, sum_top);
-        sum = (v2u64) __msa_srari_d((v2i64) sum, 4);
-        store = (v16u8) __msa_splati_b((v16i8) sum, 0);
-    } else {
-        store = (v16u8) __msa_ldi_b(128);
-    }
-
-    for (row = 16; row--;) {
-        ST_UB(store, dst);
-        dst += dst_stride;
-    }
-}
-
 #define INTRA_PREDICT_VALDC_8X8_MSA(val)                                       \
 static void intra_predict_##val##dc_8x8_msa(uint8_t *dst, int32_t dst_stride)  \
 {                                                                              \
@@ -646,8 +537,42 @@ void ff_h264_intra_pred_dc_16x16_msa(uint8_t *src, ptrdiff_t stride)
     uint8_t *src_top = src - stride;
     uint8_t *src_left = src - 1;
     uint8_t *dst = src;
+    uint32_t addition = 0;
+    v16u8 src_above, out;
+    v8u16 sum_above;
+    v4u32 sum_top;
+    v2u64 sum;
 
-    intra_predict_dc_16x16_msa(src_top, src_left, stride, dst, stride, 1, 1);
+    src_above = LD_UB(src_top);
+
+    sum_above = __msa_hadd_u_h(src_above, src_above);
+    sum_top = __msa_hadd_u_w(sum_above, sum_above);
+    sum = __msa_hadd_u_d(sum_top, sum_top);
+    sum_top = (v4u32) __msa_pckev_w((v4i32) sum, (v4i32) sum);
+    sum = __msa_hadd_u_d(sum_top, sum_top);
+    addition = __msa_copy_u_w((v4i32) sum, 0);
+    addition += src_left[ 0 * stride];
+    addition += src_left[ 1 * stride];
+    addition += src_left[ 2 * stride];
+    addition += src_left[ 3 * stride];
+    addition += src_left[ 4 * stride];
+    addition += src_left[ 5 * stride];
+    addition += src_left[ 6 * stride];
+    addition += src_left[ 7 * stride];
+    addition += src_left[ 8 * stride];
+    addition += src_left[ 9 * stride];
+    addition += src_left[10 * stride];
+    addition += src_left[11 * stride];
+    addition += src_left[12 * stride];
+    addition += src_left[13 * stride];
+    addition += src_left[14 * stride];
+    addition += src_left[15 * stride];
+    addition = (addition + 16) >> 5;
+    out = (v16u8) __msa_fill_b(addition);
+
+    ST_UB8(out, out, out, out, out, out, out, out, dst, stride);
+    dst += (8 * stride);
+    ST_UB8(out, out, out, out, out, out, out, out, dst, stride);
 }
 
 void ff_h264_intra_pred_vert_16x16_msa(uint8_t *src, ptrdiff_t stride)
@@ -666,38 +591,82 @@ void ff_h264_intra_pred_horiz_16x16_msa(uint8_t *src, ptrdiff_t stride)
 
 void ff_h264_intra_pred_dc_left_16x16_msa(uint8_t *src, ptrdiff_t stride)
 {
-    uint8_t *src_top = src - stride;
     uint8_t *src_left = src - 1;
     uint8_t *dst = src;
-
-    intra_predict_dc_16x16_msa(src_top, src_left, stride, dst, stride, 0, 1);
+    uint32_t addition;
+    v16u8 out;
+
+    addition = src_left[ 0 * stride];
+    addition += src_left[ 1 * stride];
+    addition += src_left[ 2 * stride];
+    addition += src_left[ 3 * stride];
+    addition += src_left[ 4 * stride];
+    addition += src_left[ 5 * stride];
+    addition += src_left[ 6 * stride];
+    addition += src_left[ 7 * stride];
+    addition += src_left[ 8 * stride];
+    addition += src_left[ 9 * stride];
+    addition += src_left[10 * stride];
+    addition += src_left[11 * stride];
+    addition += src_left[12 * stride];
+    addition += src_left[13 * stride];
+    addition += src_left[14 * stride];
+    addition += src_left[15 * stride];
+
+    addition = (addition + 8) >> 4;
+    out = (v16u8) __msa_fill_b(addition);
+
+    ST_UB8(out, out, out, out, out, out, out, out, dst, stride);
+    dst += (8 * stride);
+    ST_UB8(out, out, out, out, out, out, out, out, dst, stride);
 }
 
 void ff_h264_intra_pred_dc_top_16x16_msa(uint8_t *src, ptrdiff_t stride)
 {
     uint8_t *src_top = src - stride;
-    uint8_t *src_left = src - 1;
     uint8_t *dst = src;
+    v16u8 src_above, out;
+    v8u16 sum_above;
+    v4u32 sum_top;
+    v2u64 sum;
+
+    src_above = LD_UB(src_top);
 
-    intra_predict_dc_16x16_msa(src_top, src_left, stride, dst, stride, 1, 0);
+    sum_above = __msa_hadd_u_h(src_above, src_above);
+    sum_top = __msa_hadd_u_w(sum_above, sum_above);
+    sum = __msa_hadd_u_d(sum_top, sum_top);
+    sum_top = (v4u32) __msa_pckev_w((v4i32) sum, (v4i32) sum);
+    sum = __msa_hadd_u_d(sum_top, sum_top);
+    sum = (v2u64) __msa_srari_d((v2i64) sum, 4);
+    out = (v16u8) __msa_splati_b((v16i8) sum, 0);
+
+    ST_UB8(out, out, out, out, out, out, out, out, dst, stride);
+    dst += (8 * stride);
+    ST_UB8(out, out, out, out, out, out, out, out, dst, stride);
 }
 
 void ff_h264_intra_pred_dc_128_8x8_msa(uint8_t *src, ptrdiff_t stride)
 {
-    uint8_t *src_top = src - stride;
-    uint8_t *src_left = src - 1;
-    uint8_t *dst = src;
+    uint64_t out;
+    v16u8 store;
+
+    store = (v16u8) __msa_fill_b(128);
+    out = __msa_copy_u_d((v2i64) store, 0);
 
-    intra_predict_dc_8x8_msa(src_top, src_left, stride, dst, stride, 0, 0);
+    SD4(out, out, out, out, src, stride);
+    src += (4 * stride);
+    SD4(out, out, out, out, src, stride);
 }
 
 void ff_h264_intra_pred_dc_128_16x16_msa(uint8_t *src, ptrdiff_t stride)
 {
-    uint8_t *src_top = src - stride;
-    uint8_t *src_left = src - 1;
-    uint8_t *dst = src;
+    v16u8 out;
+
+    out = (v16u8) __msa_fill_b(128);
 
-    intra_predict_dc_16x16_msa(src_top, src_left, stride, dst, stride, 0, 0);
+    ST_UB8(out, out, out, out, out, out, out, out, src, stride);
+    src += (8 * stride);
+    ST_UB8(out, out, out, out, out, out, out, out, src, stride);
 }
 
 void ff_vp8_pred8x8_127_dc_8_msa(uint8_t *src, ptrdiff_t stride)
-- 
1.7.9.5

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel