From patchwork Sat Mar 18 19:50:53 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mirage Abeysekara
X-Patchwork-Id: 3008
From: Mirage Abeysekara
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 19 Mar 2017 01:20:53 +0530
Message-Id: <1489866653-5992-2-git-send-email-mirage.12@cse.mrt.ac.lk>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1489866653-5992-1-git-send-email-mirage.12@cse.mrt.ac.lk>
References: <1489866653-5992-1-git-send-email-mirage.12@cse.mrt.ac.lk>
Subject: [FFmpeg-devel] [PATCH] Added AVX2 implementation for VP8 decoder (ff_pred16x16_tm_vp8_8_avx2)
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Mirage Abeysekara
Errors-To:
 ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

---
 libavcodec/x86/h264_intrapred.asm    | 37 ++++++++++++++++++++++++++++++++++++
 libavcodec/x86/h264_intrapred_init.c |  7 +++++++
 2 files changed, 44 insertions(+)

diff --git a/libavcodec/x86/h264_intrapred.asm b/libavcodec/x86/h264_intrapred.asm
index c88d91b..0f3b462 100644
--- a/libavcodec/x86/h264_intrapred.asm
+++ b/libavcodec/x86/h264_intrapred.asm
@@ -268,6 +268,43 @@ cglobal pred16x16_tm_vp8_8, 2,6,6
     jg .loop
     REP_RET
 
+%if HAVE_AVX2_EXTERNAL
+INIT_YMM avx2
+cglobal pred16x16_tm_vp8_8, 2, 4, 5, dst, stride, stride3, iteration
+    sub            dstq, strideq
+    pmovzxbw       m0, [dstq]
+    vpbroadcastb   xm1, [r0-1]
+    pmovzxbw       m1, xm1
+    psubw          m0, m1
+    mov            iterationd, 4
+    lea            stride3q, [strideq*3]
+.loop:
+    vpbroadcastb   xm1, [dstq+strideq*1-1]
+    vpbroadcastb   xm2, [dstq+strideq*2-1]
+    vpbroadcastb   xm3, [dstq+stride3q-1]
+    vpbroadcastb   xm4, [dstq+strideq*4-1]
+    pmovzxbw       m1, xm1
+    pmovzxbw       m2, xm2
+    pmovzxbw       m3, xm3
+    pmovzxbw       m4, xm4
+    paddw          m1, m0
+    paddw          m2, m0
+    paddw          m3, m0
+    paddw          m4, m0
+    vpackuswb      m1, m1, m2
+    vpackuswb      m3, m3, m4
+    vpermq         m1, m1, q3120
+    vpermq         m3, m3, q3120
+    movdqa         [dstq+strideq*1], xm1
+    vextracti128   [dstq+strideq*2], m1, 1
+    movdqa         [dstq+stride3q*1], xm3
+    vextracti128   [dstq+strideq*4], m3, 1
+    lea            dstq, [dstq+strideq*4]
+    dec            iterationd
+    jg .loop
+    REP_RET
+%endif
+
 ;-----------------------------------------------------------------------------
 ; void ff_pred16x16_plane_*_8(uint8_t *src, int stride)
 ;-----------------------------------------------------------------------------
diff --git a/libavcodec/x86/h264_intrapred_init.c b/libavcodec/x86/h264_intrapred_init.c
index 528b92e..bdd5125 100644
--- a/libavcodec/x86/h264_intrapred_init.c
+++ b/libavcodec/x86/h264_intrapred_init.c
@@ -127,6 +127,7 @@ PRED16x16(plane_svq3, 8, ssse3)
 PRED16x16(tm_vp8, 8, mmx)
 PRED16x16(tm_vp8, 8, mmxext)
 PRED16x16(tm_vp8, 8, sse2)
+PRED16x16(tm_vp8, 8, avx2)
 
 PRED8x8(top_dc, 8, mmxext)
 PRED8x8(dc_rv40, 8, mmxext)
@@ -323,6 +324,12 @@ av_cold void ff_h264_pred_init_x86(H264PredContext *h, int codec_id,
                 }
             }
         }
+
+        if (EXTERNAL_AVX2(cpu_flags)) {
+            if (codec_id == AV_CODEC_ID_VP8) {
+                h->pred16x16[PLANE_PRED8x8 ] = ff_pred16x16_tm_vp8_8_avx2;
+            }
+        }
     } else if (bit_depth == 10) {
         if (EXTERNAL_MMXEXT(cpu_flags)) {
             h->pred4x4[DC_PRED ] = ff_pred4x4_dc_10_mmxext;
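
For anyone checking the assembly against something simpler, here is a
minimal scalar sketch of 16x16 TrueMotion (TM) prediction as the AVX2
routine in the patch appears to compute it: every output pixel is
left + top - topleft, clamped to 0..255 (the clamp plays the role of the
unsigned saturation in vpackuswb). The names clip_u8 and pred16x16_tm_ref
are hypothetical and exist only for this illustration; this is not
FFmpeg's own C implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar reference for 16x16 TM prediction.
 * src points at the top-left pixel of the block to fill. The row above
 * (src - stride), the pixel above-left (src - stride - 1) and the column
 * to the left (src[-1] of every row) must be readable.
 */
static void pred16x16_tm_ref(uint8_t *src, ptrdiff_t stride)
{
    const uint8_t *top = src - stride;  /* row above the block */
    const int topleft  = top[-1];       /* pixel above-left of the block */

    for (int y = 0; y < 16; y++) {
        const int left = src[-1];       /* left neighbour of this row */
        for (int x = 0; x < 16; x++)
            src[x] = clip_u8(left + top[x] - topleft);
        src += stride;
    }
}
```

Running a reference like this and the AVX2 version on the same random
blocks and comparing the results is one way to sanity-check the lane
reordering done by vpermq.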