From patchwork Mon Sep 4 12:31:32 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: kaustubh.raste@imgtec.com
X-Patchwork-Id: 4975
X-Original-To: ffmpeg-devel@ffmpeg.org
Date: Mon, 4 Sep 2017 18:01:32 +0530
Message-ID: <1504528292-9634-1-git-send-email-kaustubh.raste@imgtec.com>
X-Mailer: git-send-email 1.7.9.5
Subject: [FFmpeg-devel] [PATCH] avcodec/mips: Improve vp9 lpf msa functions
List-Id: FFmpeg development discussions and patches
Cc: Kaustubh Raste

From: Kaustubh Raste

Updated the VP9_LPF_FILTER4_4W macro to operate on 8-bit data: the filter
value is now accumulated with saturating byte adds instead of being widened
to 16 bits.
Replaced VP9_LPF_FILTER4_8W with VP9_LPF_FILTER4_4W.

Signed-off-by: Kaustubh Raste
---
 libavcodec/mips/vp9_lpf_msa.c | 94 ++++++-----------------------------------
 1 file changed, 14 insertions(+), 80 deletions(-)

diff --git a/libavcodec/mips/vp9_lpf_msa.c b/libavcodec/mips/vp9_lpf_msa.c
index eef8afc..c82a9e9 100644
--- a/libavcodec/mips/vp9_lpf_msa.c
+++ b/libavcodec/mips/vp9_lpf_msa.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015 Shivraj Patil (Shivraj.Patil@imgtec.com)
+ * Copyright (c) 2015 - 2017 Shivraj Patil (Shivraj.Patil@imgtec.com)
  *
  * This file is part of FFmpeg.
  *
@@ -22,63 +22,12 @@
 #include "libavutil/mips/generic_macros_msa.h"
 #include "vp9dsp_mips.h"
 
-#define VP9_LPF_FILTER4_8W(p1_in, p0_in, q0_in, q1_in, mask_in, hev_in,   \
-                           p1_out, p0_out, q0_out, q1_out)                \
-{                                                                         \
-    v16i8 p1_m, p0_m, q0_m, q1_m, q0_sub_p0, filt_sign;                   \
-    v16i8 filt, filt1, filt2, cnst4b, cnst3b;                             \
-    v8i16 q0_sub_p0_r, filt_r, cnst3h;                                    \
-                                                                          \
-    p1_m = (v16i8) __msa_xori_b(p1_in, 0x80);                             \
-    p0_m = (v16i8) __msa_xori_b(p0_in, 0x80);                             \
-    q0_m = (v16i8) __msa_xori_b(q0_in, 0x80);                             \
-    q1_m = (v16i8) __msa_xori_b(q1_in, 0x80);                             \
-                                                                          \
-    filt = __msa_subs_s_b(p1_m, q1_m);                                    \
-    filt = filt & (v16i8) hev_in;                                         \
-    q0_sub_p0 = q0_m - p0_m;                                              \
-    filt_sign = __msa_clti_s_b(filt, 0);                                  \
-                                                                          \
-    cnst3h = __msa_ldi_h(3);                                              \
-    q0_sub_p0_r = (v8i16) __msa_ilvr_b(q0_sub_p0, q0_sub_p0);             \
-    q0_sub_p0_r = __msa_dotp_s_h((v16i8) q0_sub_p0_r, (v16i8) cnst3h);    \
-    filt_r = (v8i16) __msa_ilvr_b(filt_sign, filt);                       \
-    filt_r += q0_sub_p0_r;                                                \
-    filt_r = __msa_sat_s_h(filt_r, 7);                                    \
-                                                                          \
-    /* combine left and right part */                                     \
-    filt = __msa_pckev_b((v16i8) filt_r, (v16i8) filt_r);                 \
-                                                                          \
-    filt = filt & (v16i8) mask_in;                                        \
-    cnst4b = __msa_ldi_b(4);                                              \
-    filt1 = __msa_adds_s_b(filt, cnst4b);                                 \
-    filt1 >>= 3;                                                          \
-                                                                          \
-    cnst3b = __msa_ldi_b(3);                                              \
-    filt2 = __msa_adds_s_b(filt, cnst3b);                                 \
-    filt2 >>= 3;                                                          \
-                                                                          \
-    q0_m = __msa_subs_s_b(q0_m, filt1);                                   \
-    q0_out = __msa_xori_b((v16u8) q0_m, 0x80);                            \
-    p0_m = __msa_adds_s_b(p0_m, filt2);                                   \
-    p0_out = __msa_xori_b((v16u8) p0_m, 0x80);                            \
-                                                                          \
-    filt = __msa_srari_b(filt1, 1);                                       \
-    hev_in = __msa_xori_b((v16u8) hev_in, 0xff);                          \
-    filt = filt & (v16i8) hev_in;                                         \
-                                                                          \
-    q1_m = __msa_subs_s_b(q1_m, filt);                                    \
-    q1_out = __msa_xori_b((v16u8) q1_m, 0x80);                            \
-    p1_m = __msa_adds_s_b(p1_m, filt);                                    \
-    p1_out = __msa_xori_b((v16u8) p1_m, 0x80);                            \
-}
-
 #define VP9_LPF_FILTER4_4W(p1_in, p0_in, q0_in, q1_in, mask_in, hev_in,   \
                            p1_out, p0_out, q0_out, q1_out)                \
 {                                                                         \
-    v16i8 p1_m, p0_m, q0_m, q1_m, q0_sub_p0, filt_sign;                   \
-    v16i8 filt, filt1, filt2, cnst4b, cnst3b;                             \
-    v8i16 q0_sub_p0_r, q0_sub_p0_l, filt_l, filt_r, cnst3h;               \
+    v16i8 p1_m, p0_m, q0_m, q1_m, q0_sub_p0, filt, filt1, filt2;          \
+    const v16i8 cnst4b = __msa_ldi_b(4);                                  \
+    const v16i8 cnst3b = __msa_ldi_b(3);                                  \
                                                                           \
     p1_m = (v16i8) __msa_xori_b(p1_in, 0x80);                             \
     p0_m = (v16i8) __msa_xori_b(p0_in, 0x80);                             \
@@ -89,30 +38,15 @@
                                                                           \
     filt = filt & (v16i8) hev_in;                                         \
                                                                           \
-    q0_sub_p0 = q0_m - p0_m;                                              \
-    filt_sign = __msa_clti_s_b(filt, 0);                                  \
-                                                                          \
-    cnst3h = __msa_ldi_h(3);                                              \
-    q0_sub_p0_r = (v8i16) __msa_ilvr_b(q0_sub_p0, q0_sub_p0);             \
-    q0_sub_p0_r = __msa_dotp_s_h((v16i8) q0_sub_p0_r, (v16i8) cnst3h);    \
-    filt_r = (v8i16) __msa_ilvr_b(filt_sign, filt);                       \
-    filt_r += q0_sub_p0_r;                                                \
-    filt_r = __msa_sat_s_h(filt_r, 7);                                    \
-                                                                          \
-    q0_sub_p0_l = (v8i16) __msa_ilvl_b(q0_sub_p0, q0_sub_p0);             \
-    q0_sub_p0_l = __msa_dotp_s_h((v16i8) q0_sub_p0_l, (v16i8) cnst3h);    \
-    filt_l = (v8i16) __msa_ilvl_b(filt_sign, filt);                       \
-    filt_l += q0_sub_p0_l;                                                \
-    filt_l = __msa_sat_s_h(filt_l, 7);                                    \
-                                                                          \
-    filt = __msa_pckev_b((v16i8) filt_l, (v16i8) filt_r);                 \
+    q0_sub_p0 = __msa_subs_s_b(q0_m, p0_m);                               \
+    filt = __msa_adds_s_b(filt, q0_sub_p0);                               \
+    filt = __msa_adds_s_b(filt, q0_sub_p0);                               \
+    filt = __msa_adds_s_b(filt, q0_sub_p0);                               \
     filt = filt & (v16i8) mask_in;                                        \
                                                                           \
-    cnst4b = __msa_ldi_b(4);                                              \
     filt1 = __msa_adds_s_b(filt, cnst4b);                                 \
     filt1 >>= 3;                                                          \
                                                                           \
-    cnst3b = __msa_ldi_b(3);                                              \
     filt2 = __msa_adds_s_b(filt, cnst3b);                                 \
     filt2 >>= 3;                                                          \
                                                                           \
@@ -277,7 +211,7 @@ void ff_loop_filter_v_4_8_msa(uint8_t *src, ptrdiff_t pitch,
 
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     p1_d = __msa_copy_u_d((v2i64) p1_out, 0);
@@ -342,7 +276,7 @@ void ff_loop_filter_v_8_8_msa(uint8_t *src, ptrdiff_t pitch,
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
@@ -1065,7 +999,7 @@ void ff_loop_filter_v_16_8_msa(uint8_t *src, ptrdiff_t pitch,
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
@@ -1280,7 +1214,7 @@ void ff_loop_filter_h_4_8_msa(uint8_t *src, ptrdiff_t pitch,
                        p3, p2, p1, p0, q0, q1, q2, q3);
     LPF_MASK_HEV(p3, p2, p1, p0, q0, q1, q2, q3, limit, b_limit, thresh,
                  hev, mask, flat);
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1, p0, q0, q1);
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1, p0, q0, q1);
     ILVR_B2_SH(p0, p1, q1, q0, vec0, vec1);
     ILVRL_H2_SH(vec1, vec0, vec2, vec3);
 
@@ -1367,7 +1301,7 @@ void ff_loop_filter_h_8_8_msa(uint8_t *src, ptrdiff_t pitch,
     /* flat4 */
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
     /* filter4 */
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
@@ -1868,7 +1802,7 @@ static int32_t vp9_vt_lpf_t4_and_t8_8w(uint8_t *src, uint8_t *filter48,
     /* flat4 */
     VP9_FLAT4(p3, p2, p0, q0, q2, q3, flat);
     /* filter4 */
-    VP9_LPF_FILTER4_8W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
+    VP9_LPF_FILTER4_4W(p1, p0, q0, q1, mask, hev, p1_out, p0_out, q0_out,
                        q1_out);
 
     flat = (v16u8) __msa_ilvr_d((v2i64) zero, (v2i64) flat);
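
The commit message's point that the filter value can be computed on 8-bit data can be sanity-checked with a scalar model. The sketch below is not part of the patch and does not model the removed MSA sequence instruction for instruction. It assumes the intended result is clamp(filt + 3 * (q0 - p0)) evaluated at full width, and compares that against one saturating subtract followed by three saturating adds, as in the updated VP9_LPF_FILTER4_4W. All helper names (clamp_s8, adds_s8, subs_s8, filt_wide, filt_narrow, count_mismatches) are hypothetical; adds_s8 and subs_s8 are plain-C stand-ins for what __msa_adds_s_b and __msa_subs_s_b do per byte lane.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wide integer to the signed 8-bit range [-128, 127]. */
static int8_t clamp_s8(int v)
{
    return (int8_t) (v < -128 ? -128 : (v > 127 ? 127 : v));
}

/* Scalar stand-ins for per-lane saturating signed byte add/subtract. */
static int8_t adds_s8(int8_t a, int8_t b) { return clamp_s8((int) a + (int) b); }
static int8_t subs_s8(int8_t a, int8_t b) { return clamp_s8((int) a - (int) b); }

/* Widened form (assumed reference): evaluate in int, clamp once. */
static int8_t filt_wide(int8_t filt, int8_t p0, int8_t q0)
{
    return clamp_s8(filt + 3 * (q0 - p0));
}

/* 8-bit form: same operation order as the updated 4W macro. */
static int8_t filt_narrow(int8_t filt, int8_t p0, int8_t q0)
{
    int8_t q0_sub_p0 = subs_s8(q0, p0);

    filt = adds_s8(filt, q0_sub_p0);
    filt = adds_s8(filt, q0_sub_p0);
    return adds_s8(filt, q0_sub_p0);
}

/* Compare both forms over every (filt, p0, q0) triple of signed bytes
 * and return the number of triples on which they disagree. */
static long count_mismatches(void)
{
    long bad = 0;

    /* Spot-check the saturating helper before relying on it. */
    assert(adds_s8(127, 1) == 127);

    for (int f = -128; f <= 127; f++)
        for (int p = -128; p <= 127; p++)
            for (int q = -128; q <= 127; q++)
                bad += filt_wide((int8_t) f, (int8_t) p, (int8_t) q) !=
                       filt_narrow((int8_t) f, (int8_t) p, (int8_t) q);
    return bad;
}
```

Calling count_mismatches() from a small test driver is expected to return 0. Every step adds the same value, so once the running sum saturates it stays saturated in that direction, and the final result matches the single clamp of the widened form.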