From patchwork Thu Sep 7 09:00:01 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Manojkumar Bhosale
X-Patchwork-Id: 5028
From: Manojkumar Bhosale
To: FFmpeg development discussions and patches
Date: Thu, 7 Sep 2017 09:00:01 +0000
Message-ID: <70293ACCC3BA6A4E81FFCA024C7A86E1E058FEE1@PUMAIL01.pu.imgtec.org>
References: <1504528292-9634-1-git-send-email-kaustubh.raste@imgtec.com>
In-Reply-To: <1504528292-9634-1-git-send-email-kaustubh.raste@imgtec.com>
Subject: Re: [FFmpeg-devel] [PATCH] avcodec/mips: Improve vp9 lpf msa functions
Reply-To: FFmpeg development discussions and patches
Cc: Kaustubh Raste

LGTM

-----Original Message-----
From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of kaustubh.raste@imgtec.com
Sent: Monday, September 4, 2017 6:02 PM
To: ffmpeg-devel@ffmpeg.org
Cc: Kaustubh Raste
Subject: [FFmpeg-devel] [PATCH] avcodec/mips: Improve vp9 lpf msa functions

From: Kaustubh Raste

Updated VP9_LPF_FILTER4_4W macro to process on 8 bit data.
Replaced VP9_LPF_FILTER4_8W with VP9_LPF_FILTER4_4W.

Signed-off-by: Kaustubh Raste
---
 libavcodec/mips/vp9_lpf_msa.c | 94 ++++++-----------------------------------
 1 file changed, 14 insertions(+), 80 deletions(-)

diff --git a/libavcodec/mips/vp9_lpf_msa.c b/libavcodec/mips/vp9_lpf_msa.c
index eef8afc..c82a9e9 100644
--- a/libavcodec/mips/vp9_lpf_msa.c
+++ b/libavcodec/mips/vp9_lpf_msa.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015 Shivraj Patil (Shivraj.Patil@imgtec.com)
+ * Copyright (c) 2015 - 2017 Shivraj Patil (Shivraj.Patil@imgtec.com)
  *
  * This file is part of FFmpeg.
  *
@@ -22,63 +22,12 @@
 #include "libavutil/mips/generic_macros_msa.h"
 #include "vp9dsp_mips.h"
 
-#define VP9_LPF_FILTER4_8W(p1_in, p0_in, q0_in, q1_in, mask_in, hev_in, \
-                           p1_out, p0_out, q0_out, q1_out) \
-{ \
-    v16i8 p1_m, p0_m, q0_m, q1_m, q0_sub_p0, filt_sign; \
-    v16i8 filt, filt1, filt2, cnst4b, cnst3b; \
-    v8i16 q0_sub_p0_r, filt_r, cnst3h; \
-    \
-    p1_m = (v16i8) __msa_xori_b(p1_in, 0x80); \
-    p0_m = (v16i8) __msa_xori_b(p0_in, 0x80); \
-    q0_m = (v16i8) __msa_xori_b(q0_in, 0x80); \
-    q1_m = (v16i8) __msa_xori_b(q1_in, 0x80); \
-    \
-    filt = __msa_subs_s_b(p1_m, q1_m); \
-    filt = filt & (v16i8) hev_in; \
-    q0_sub_p0 = q0_m - p0_m; \
-    filt_sign = __msa_clti_s_b(filt, 0); \
-    \
-    cnst3h = __msa_ldi_h(3); \
-    q0_sub_p0_r = (v8i16) __msa_ilvr_b(q0_sub_p0, q0_sub_p0); \
-    q0_sub_p0_r = __msa_dotp_s_h((v16i8) q0_sub_p0_r, (v16i8) cnst3h); \
-    filt_r = (v8i16) __msa_ilvr_b(filt_sign, filt); \
-    filt_r += q0_sub_p0_r; \
-    filt_r = __msa_sat_s_h(filt_r, 7); \
-    \
-    /* combine left and right part */ \
-    filt = __msa_pckev_b((v16i8) filt_r, (v16i8) filt_r); \
-    \
-    filt = filt & (v16i8) mask_in; \
-    cnst4b = __msa_ldi_b(4); \
-    filt1 = __msa_adds_s_b(filt, cnst4b); \
-    filt1 >>= 3; \
-    \
-    cnst3b = __msa_ldi_b(3); \
-    filt2 = __msa_adds_s_b(filt, cnst3b); \
-    filt2 >>= 3; \
-    \
-    q0_m = __msa_subs_s_b(q0_m, filt1); \
-    q0_out = __msa_xori_b((v16u8) q0_m, 0x80); \
-    p0_m = __msa_adds_s_b(p0_m, filt2); \
-    p0_out = __msa_xori_b((v16u8) p0_m, 0x80); \
-    \
-    filt = __msa_srari_b(filt1, 1); \
-    hev_in = __msa_xori_b((v16u8) hev_in, 0xff); \
-    filt = filt & (v16i8) hev_in; \
-    \
-    q1_m = __msa_subs_s_b(q1_m, filt); \
-    q1_out = __msa_xori_b((v16u8) q1_m, 0x80); \
-    p1_m = __msa_adds_s_b(p1_m, filt); \
-    p1_out = __msa_xori_b((v16u8) p1_m, 0x80); \
-}
-
 #define VP9_LPF_FILTER4_4W(p1_in, p0_in, q0_in, q1_in, mask_in, hev_in, \
                            p1_out, p0_out, q0_out, q1_out) \
 { \
-    v16i8 p1_m, p0_m, q0_m, q1_m, q0_sub_p0, filt_sign; \
-    v16i8 filt, filt1, filt2, cnst4b, cnst3b; \
-    v8i16 q0_sub_p0_r, q0_sub_p0_l, filt_l, filt_r, cnst3h; \
+    v16i8 p1_m, p0_m, q0_m, q1_m, q0_sub_p0, filt, filt1, filt2; \
+    const v16i8 cnst4b = __msa_ldi_b(4); \
+    const v16i8 cnst3b = __msa_ldi_b(3); \
     \
     p1_m = (v16i8) __msa_xori_b(p1_in, 0x80); \
     p0_m = (v16i8) __msa_xori_b(p0_in, 0x80); \
@@ -89,30 +38,15 @@
     \
     filt = filt & (v16i8) hev_in; \
     \
-    q0_sub_p0 = q0_m - p0_m; \
-    filt_sign = __msa_clti_s_b(filt, 0); \
-    \
-    cnst3h = __msa_ldi_h(3); \
-    q0_sub_p0_r = (v8i16) __msa_ilvr_b(q0_sub_p0, q0_sub_p0); \
-    q0_sub_p0_r = __msa_dotp_s_h((v16i8) q0_sub_p0_r, (v16i8) cnst3h); \
-    filt_r = (v8i16) __msa_ilvr_b(filt_sign, filt); \
-    filt_r += q0_sub_p0_r; \
-    filt_r = __msa_sat_s_h(filt_r, 7); \
-    \
-    q0_sub_p0_l = (v8i16) __msa_ilvl_b(q0_sub_p0, q0_sub_p0); \
-    q0_sub_p0_l = __msa_dotp_s_h((v16i8) q0_sub_p0_l, (v16i8) cnst3h); \
-    filt_l = (v8i16) __msa_ilvl_b(filt_sign, filt); \
-    filt_l += q0_sub_p0_l; \
-    filt_l = __msa_sat_s_h(filt_l, 7); \
-    \
-    filt = __msa_pckev_b((v16i8) filt_l, (v16i8) filt_r); \
+    q0_sub_p0 = __msa_subs_s_b(q0_m, p0_m); \
+    filt = __msa_adds_s_b(filt, q0_sub_p0); \
+    filt = __msa_adds_s_b(filt, q0_sub_p0); \
+    filt = __msa_adds_s_b(filt, q0_sub_p0); \
     filt = filt & (v16i8) mask_in; \
     \
-    cnst4b = __msa_ldi_b(4); \
     filt1 = __msa_adds_s_b(filt, cnst4b); \
     filt1 >>= 3; \
     \
-    cnst3b = __msa_ldi_b(3); \
     filt2 = __msa_adds_s_b(filt, cnst3b); \
     filt2 >>= 3; \
     \
@@ -277,7 +211,7 @@ void ff_loop_filter_v_4_8_msa(uint8_t *src, ptrdiff_t pitch,
 
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     p1_d = __msa_copy_u_d((v2i64) p1_out, 0);
@@ -342,7 +276,7 @@ void ff_loop_filter_v_8_8_msa(uint8_t *src, ptrdiff_t pitch,
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
@@ -1065,7 +999,7 @@ void ff_loop_filter_v_16_8_msa(uint8_t *src, ptrdiff_t pitch,
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
@@ -1280,7 +1214,7 @@ void ff_loop_filter_h_4_8_msa(uint8_t *src, ptrdiff_t pitch,
                        p3, p2, p1, p0, q0, q1, q2, q3);
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1, p0, q0, q1);
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1, p0, q0, q1);
     ILVR_B2_SH(p0, p1, q1, q0, vec0, vec1);
     ILVRL_H2_SH(vec1, vec0, vec2, vec3);
 
@@ -1367,7 +1301,7 @@ void ff_loop_filter_h_8_8_msa(uint8_t *src, ptrdiff_t pitch,
     /* flat4 */
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
     /* filter4 */
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
@@ -1868,7 +1802,7 @@ static int32_t vp9_vt_lpf_t4_and_t8_8w(uint8_t *src, uint8_t *filter48,
     /* flat4 */
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
     /* filter4 */
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
-- 
1.7.9.5

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel