From patchwork Thu Sep  7 09:00:01 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
X-Patchwork-Id: 5028
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.2.15.201 with SMTP id 70csp1673593jao;
	Thu, 7 Sep 2017 02:00:39 -0700 (PDT)
X-Google-Smtp-Source: 
 ADKCNb4LwQAA26Xu+1gM3DxfXeLPqgdvWVPdwz1/LRlbe1po4WkkQd8Gdw3OCS3M1UnTtHdyouun
X-Received: by 10.223.130.79 with SMTP id 73mr1319944wrb.241.1504774839785;
	Thu, 07 Sep 2017 02:00:39 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1504774839; cv=none;
	d=google.com; s=arc-20160816;
	b=Mb1f09oOMRcs9LxD2QgX57xGp3SrNgZaySmidssS5iRwTMEymfWUMQQ675w6l+1frq
	Cd9S8yusGsXJzV8ko8hz7Uk2AWygjcRGn3G6qC7L0KiVM/xb7SG7qHC+oscPEAjHCQ5J
	IIL8R69Vr+AuCSyrbC6b3MlnAMwzf/X28tVvxQx+j/V6G0wrt+1nv3U1+XSMd5hv/h/A
	prD++C8+gVtxE+fNazsDHazaiwjsJZPHTKrHZr7wR+b33RAu5Hs1Qx08R+WovAEG1GVU
	/ui6rm5cwJYvJIQZkA0OZ0TxCS6+rd+MSnRG+AbKIkEMzyifaXvKfsA/zfTA17bqE+Ck
	nK9g==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
	s=arc-20160816;
	h=sender:errors-to:content-transfer-encoding:cc:reply-to
	:list-subscribe:list-help:list-post:list-archive:list-unsubscribe
	:list-id:precedence:subject:mime-version:content-language
	:accept-language:in-reply-to:references:message-id:date:thread-index
	:thread-topic:to:from:delivered-to:arc-authentication-results;
	bh=PwYbazibOt+56Xnx9i2bYPhzbOXlGTQHip6QrO9kk+k=;
	b=YTHxMNBm9rV2vUkE2pet2jyh43DCsXfMekENewfftl4j9khgrE5Sum4/k1tv7NCmDh
	ng5UqgOzTMCtyjUMSCswtIMlAdAC5QHkvNrNE8rCGL3jZGs1M9M4Ea351s7TBTLVWPIe
	wwCkzG/r/PzJUMy3dbh0jnusxhdxpq3f1o03wzBCDwqITwieNKE+3l9WqfVMTH744Jq/
	CndOw84NI4I5fJx3jt1Uo9AJR+Q09bHjiIWAVmGM3zMSqBLl0jvlNYgNwi8PX9hsPv2Q
	gtlTCX4rp9hT4rzXuJwOsGogRbipyHRSvFtbSGisZ00WW9HNppyA3jtQiNed48wlkY1P
	xkpw==
ARC-Authentication-Results: i=1; mx.google.com;
	spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
	by mx.google.com with ESMTP id
	n50si1707663wrb.411.2017.09.07.02.00.39;
	Thu, 07 Sep 2017 02:00:39 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
	spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
	designates 79.124.17.100 as permitted sender)
	smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 63C56689E03;
	Thu,  7 Sep 2017 12:00:05 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mailapp01.imgtec.com (mailapp01.imgtec.com [195.59.15.196])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 2E90C689C1F
	for <ffmpeg-devel@ffmpeg.org>; Thu,  7 Sep 2017 12:00:04 +0300 (EEST)
Received: from hhmail02.hh.imgtec.org (unknown [10.100.10.20])
	by Forcepoint Email with ESMTPS id 085B9C3187971
	for <ffmpeg-devel@ffmpeg.org>; Thu,  7 Sep 2017 10:00:02 +0100 (IST)
Received: from PUMAIL01.pu.imgtec.org (192.168.91.250) by
	hhmail02.hh.imgtec.org (10.100.10.20) with Microsoft SMTP Server
	(TLS) id 14.3.294.0; Thu, 7 Sep 2017 10:00:04 +0100
Received: from PUMAIL01.pu.imgtec.org ([::1]) by PUMAIL01.pu.imgtec.org
	([::1]) with mapi id 14.03.0266.001; Thu, 7 Sep 2017 14:30:02 +0530
From: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Thread-Topic: [FFmpeg-devel] [PATCH] avcodec/mips: Improve vp9 lpf msa
	functions
Thread-Index: AQHTJXmw84YHIqPWcEafryGB1xB0RaKpJFOw
Date: Thu, 7 Sep 2017 09:00:01 +0000
Message-ID: <70293ACCC3BA6A4E81FFCA024C7A86E1E058FEE1@PUMAIL01.pu.imgtec.org>
References: <1504528292-9634-1-git-send-email-kaustubh.raste@imgtec.com>
In-Reply-To: <1504528292-9634-1-git-send-email-kaustubh.raste@imgtec.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [192.168.91.86]
MIME-Version: 1.0
Subject: Re: [FFmpeg-devel] [PATCH] avcodec/mips: Improve vp9 lpf msa
	functions
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <http://ffmpeg.org/mailman/options/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <http://ffmpeg.org/pipermail/ffmpeg-devel/>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <http://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
	<mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches
	<ffmpeg-devel@ffmpeg.org>
Cc: Kaustubh Raste <Kaustubh.Raste@imgtec.com>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

LGTM

-----Original Message-----
From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of kaustubh.raste@imgtec.com
Sent: Monday, September 4, 2017 6:02 PM
To: ffmpeg-devel@ffmpeg.org
Cc: Kaustubh Raste
Subject: [FFmpeg-devel] [PATCH] avcodec/mips: Improve vp9 lpf msa functions

From: Kaustubh Raste <kaustubh.raste@imgtec.com>

Update the VP9_LPF_FILTER4_4W macro to operate on 8-bit data.
Replace all uses of VP9_LPF_FILTER4_8W with VP9_LPF_FILTER4_4W.
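
For illustration only, a scalar sketch of the idea behind the 8-bit rewrite.
It is not part of the patch and does not use the MSA intrinsics. All helper
names (clamp_s8, adds_s8, subs_s8, filt_wide, filt_narrow, count_mismatches)
are hypothetical. The sketch assumes the intended result is
clamp(filt + 3 * (q0 - p0)) computed in wide integers. It does not model the
wrapping byte subtraction used by the removed code.

```c
#include <stdint.h>

/* Clamp a wide integer to the signed 8-bit range. */
static int8_t clamp_s8(int v)
{
    return (int8_t) (v < -128 ? -128 : v > 127 ? 127 : v);
}

/* Scalar stand-ins for saturating signed byte add and subtract. */
static int8_t adds_s8(int8_t a, int8_t b) { return clamp_s8(a + b); }
static int8_t subs_s8(int8_t a, int8_t b) { return clamp_s8(a - b); }

/* Wide version (assumed reference): widen, multiply by 3, clamp once. */
static int8_t filt_wide(int8_t filt, int8_t p0, int8_t q0)
{
    return clamp_s8(filt + 3 * (q0 - p0));
}

/* Narrow version: one saturating subtract, three saturating adds. */
static int8_t filt_narrow(int8_t filt, int8_t p0, int8_t q0)
{
    int8_t d = subs_s8(q0, p0);

    filt = adds_s8(filt, d);
    filt = adds_s8(filt, d);
    filt = adds_s8(filt, d);
    return filt;
}

/* Compare both versions over every (filt, p0, q0) triple. */
static long count_mismatches(void)
{
    long bad = 0;

    for (int f = -128; f <= 127; f++)
        for (int p = -128; p <= 127; p++)
            for (int q = -128; q <= 127; q++)
                bad += filt_wide((int8_t) f, (int8_t) p, (int8_t) q) !=
                       filt_narrow((int8_t) f, (int8_t) p, (int8_t) q);
    return bad;
}
```

Calling count_mismatches() from a small test program checks all 2^24 input
combinations. The three adds all move in the same direction, so an
intermediate saturation can only occur when the wide sum would clamp to the
same bound.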

Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
---
 libavcodec/mips/vp9_lpf_msa.c |   94 ++++++-----------------------------------
 1 file changed, 14 insertions(+), 80 deletions(-)

 
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     p1_d = __msa_copy_u_d((v2i64) p1_out, 0);
@@ -342,7 +276,7 @@ void ff_loop_filter_v_8_8_msa(uint8_t *src, ptrdiff_t pitch,
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
@@ -1065,7 +999,7 @@ void ff_loop_filter_v_16_8_msa(uint8_t *src, ptrdiff_t pitch,
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
@@ -1280,7 +1214,7 @@ void ff_loop_filter_h_4_8_msa(uint8_t *src, ptrdiff_t pitch,
                        p3, p2, p1, p0, q0, q1, q2, q3);
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1, p0, q0, q1);
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1, p0, q0, q1);
     ILVR_B2_SH(p0, p1, q1, q0, vec0, vec1);
     ILVRL_H2_SH(vec1, vec0, vec2, vec3);
 
@@ -1367,7 +1301,7 @@ void ff_loop_filter_h_8_8_msa(uint8_t *src, ptrdiff_t pitch,
     /* flat4 */
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
     /* filter4 */
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
@@ -1868,7 +1802,7 @@ static int32_t vp9_vt_lpf_t4_and_t8_8w(uint8_t *src, uint8_t *filter48,
     /* flat4 */
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
     /* filter4 */
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
--
1.7.9.5

diff --git a/libavcodec/mips/vp9_lpf_msa.c b/libavcodec/mips/vp9_lpf_msa.c
index eef8afc..c82a9e9 100644
--- a/libavcodec/mips/vp9_lpf_msa.c
+++ b/libavcodec/mips/vp9_lpf_msa.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015 Shivraj Patil (Shivraj.Patil@imgtec.com)
+ * Copyright (c) 2015 - 2017 Shivraj Patil (Shivraj.Patil@imgtec.com)
  *
  * This file is part of FFmpeg.
  *
@@ -22,63 +22,12 @@
 #include "libavutil/mips/generic_macros_msa.h"
 #include "vp9dsp_mips.h"
 
-#define VP9_LPF_FILTER4_8W(p1_in, p0_in, q0_in, q1_in, mask_in, hev_in,  \
-                           p1_out, p0_out, q0_out, q1_out)               \
-{                                                                        \
-    v16i8 p1_m, p0_m, q0_m, q1_m, q0_sub_p0, filt_sign;                  \
-    v16i8 filt, filt1, filt2, cnst4b, cnst3b;                            \
-    v8i16 q0_sub_p0_r, filt_r, cnst3h;                                   \
-                                                                         \
-    p1_m = (v16i8) __msa_xori_b(p1_in, 0x80);                            \
-    p0_m = (v16i8) __msa_xori_b(p0_in, 0x80);                            \
-    q0_m = (v16i8) __msa_xori_b(q0_in, 0x80);                            \
-    q1_m = (v16i8) __msa_xori_b(q1_in, 0x80);                            \
-                                                                         \
-    filt = __msa_subs_s_b(p1_m, q1_m);                                   \
-    filt = filt & (v16i8) hev_in;                                        \
-    q0_sub_p0 = q0_m - p0_m;                                             \
-    filt_sign = __msa_clti_s_b(filt, 0);                                 \
-                                                                         \
-    cnst3h = __msa_ldi_h(3);                                             \
-    q0_sub_p0_r = (v8i16) __msa_ilvr_b(q0_sub_p0, q0_sub_p0);            \
-    q0_sub_p0_r = __msa_dotp_s_h((v16i8) q0_sub_p0_r, (v16i8) cnst3h);   \
-    filt_r = (v8i16) __msa_ilvr_b(filt_sign, filt);                      \
-    filt_r += q0_sub_p0_r;                                               \
-    filt_r = __msa_sat_s_h(filt_r, 7);                                   \
-                                                                         \
-    /* combine left and right part */                                    \
-    filt = __msa_pckev_b((v16i8) filt_r, (v16i8) filt_r);                \
-                                                                         \
-    filt = filt & (v16i8) mask_in;                                       \
-    cnst4b = __msa_ldi_b(4);                                             \
-    filt1 = __msa_adds_s_b(filt, cnst4b);                                \
-    filt1 >>= 3;                                                         \
-                                                                         \
-    cnst3b = __msa_ldi_b(3);                                             \
-    filt2 = __msa_adds_s_b(filt, cnst3b);                                \
-    filt2 >>= 3;                                                         \
-                                                                         \
-    q0_m = __msa_subs_s_b(q0_m, filt1);                                  \
-    q0_out = __msa_xori_b((v16u8) q0_m, 0x80);                           \
-    p0_m = __msa_adds_s_b(p0_m, filt2);                                  \
-    p0_out = __msa_xori_b((v16u8) p0_m, 0x80);                           \
-                                                                         \
-    filt = __msa_srari_b(filt1, 1);                                      \
-    hev_in = __msa_xori_b((v16u8) hev_in, 0xff);                         \
-    filt = filt & (v16i8) hev_in;                                        \
-                                                                         \
-    q1_m = __msa_subs_s_b(q1_m, filt);                                   \
-    q1_out = __msa_xori_b((v16u8) q1_m, 0x80);                           \
-    p1_m = __msa_adds_s_b(p1_m, filt);                                   \
-    p1_out = __msa_xori_b((v16u8) p1_m, 0x80);                           \
-}
-
 #define VP9_LPF_FILTER4_4W(p1_in, p0_in, q0_in, q1_in, mask_in, hev_in,  \
                            p1_out, p0_out, q0_out, q1_out)               \
 {                                                                        \
-    v16i8 p1_m, p0_m, q0_m, q1_m, q0_sub_p0, filt_sign;                  \
-    v16i8 filt, filt1, filt2, cnst4b, cnst3b;                            \
-    v8i16 q0_sub_p0_r, q0_sub_p0_l, filt_l, filt_r, cnst3h;              \
+    v16i8 p1_m, p0_m, q0_m, q1_m, q0_sub_p0, filt, filt1, filt2;         \
+    const v16i8 cnst4b = __msa_ldi_b(4);                                 \
+    const v16i8 cnst3b = __msa_ldi_b(3);                                 \
                                                                          \
     p1_m = (v16i8) __msa_xori_b(p1_in, 0x80);                            \
     p0_m = (v16i8) __msa_xori_b(p0_in, 0x80);                            \
@@ -89,30 +38,15 @@
                                                                          \
     filt = filt & (v16i8) hev_in;                                        \
                                                                          \
-    q0_sub_p0 = q0_m - p0_m;                                             \
-    filt_sign = __msa_clti_s_b(filt, 0);                                 \
-                                                                         \
-    cnst3h = __msa_ldi_h(3);                                             \
-    q0_sub_p0_r = (v8i16) __msa_ilvr_b(q0_sub_p0, q0_sub_p0);            \
-    q0_sub_p0_r = __msa_dotp_s_h((v16i8) q0_sub_p0_r, (v16i8) cnst3h);   \
-    filt_r = (v8i16) __msa_ilvr_b(filt_sign, filt);                      \
-    filt_r += q0_sub_p0_r;                                               \
-    filt_r = __msa_sat_s_h(filt_r, 7);                                   \
-                                                                         \
-    q0_sub_p0_l = (v8i16) __msa_ilvl_b(q0_sub_p0, q0_sub_p0);            \
-    q0_sub_p0_l = __msa_dotp_s_h((v16i8) q0_sub_p0_l, (v16i8) cnst3h);   \
-    filt_l = (v8i16) __msa_ilvl_b(filt_sign, filt);                      \
-    filt_l += q0_sub_p0_l;                                               \
-    filt_l = __msa_sat_s_h(filt_l, 7);                                   \
-                                                                         \
-    filt = __msa_pckev_b((v16i8) filt_l, (v16i8) filt_r);                \
+    q0_sub_p0 = __msa_subs_s_b(q0_m, p0_m);                              \
+    filt = __msa_adds_s_b(filt, q0_sub_p0);                              \
+    filt = __msa_adds_s_b(filt, q0_sub_p0);                              \
+    filt = __msa_adds_s_b(filt, q0_sub_p0);                              \
     filt = filt & (v16i8) mask_in;                                       \
                                                                          \
-    cnst4b = __msa_ldi_b(4);                                             \
     filt1 = __msa_adds_s_b(filt, cnst4b);                                \
     filt1 >>= 3;                                                         \
                                                                          \
-    cnst3b = __msa_ldi_b(3);                                             \
     filt2 = __msa_adds_s_b(filt, cnst3b);                                \
     filt2 >>= 3;                                                         \
                                                                          \
@@ -277,7 +211,7 @@ void ff_loop_filter_v_4_8_msa(uint8_t *src, ptrdiff_t pitch,