From patchwork Mon Jul 3 11:09:07 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilia
X-Patchwork-Id: 4198
From: Ilia Valiakhmetov
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 3 Jul 2017 18:09:07 +0700
Message-Id: <20170703110907.5980-1-zakne0ne@gmail.com>
X-Mailer: git-send-email 2.8.3
Subject: [FFmpeg-devel] [PATCH] avcodec/vp9: AVX2 ipred_vl_16x16
Cc: Ilia Valiakhmetov

vp9_vert_left_16x16_12bpp_c: 273.8
vp9_vert_left_16x16_12bpp_sse2: 69.4
vp9_vert_left_16x16_12bpp_ssse3: 35.3
vp9_vert_left_16x16_12bpp_avx: 34.6
vp9_vert_left_16x16_12bpp_avx2: 22.4

~35% faster than avx

Signed-off-by: Ilia Valiakhmetov
---
 libavcodec/x86/vp9dsp_init_16bpp.c    |  2 ++
 libavcodec/x86/vp9intrapred_16bpp.asm | 53 +++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c b/libavcodec/x86/vp9dsp_init_16bpp.c
index 60d10a1..da8b74c 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp.c
@@ -55,6 +55,7 @@ decl_ipred_fn(dl, 16, 16, avx2);
 decl_ipred_fn(dl, 32, 16, avx2);
 decl_ipred_fn(dr, 16, 16, avx2);
 decl_ipred_fn(dr, 32, 16, avx2);
+decl_ipred_fn(vl, 16, 16, avx2);
 
 #define decl_ipred_dir_funcs(type) \
 decl_ipred_fns(type, 16, sse2, sse2); \
@@ -143,6 +144,7 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext *dsp)
 #if ARCH_X86_64
         init_ipred_func(dr, DIAG_DOWN_RIGHT, 32, 16, avx2);
 #endif
+        init_ipred_func(vl, VERT_LEFT, 16, 16, avx2);
     }
 
 #endif /* HAVE_X86ASM */
diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm b/libavcodec/x86/vp9intrapred_16bpp.asm
index 32b6982..8d8d65e 100644
--- a/libavcodec/x86/vp9intrapred_16bpp.asm
+++ b/libavcodec/x86/vp9intrapred_16bpp.asm
@@ -1538,6 +1538,59 @@ VL_FUNCS 1
 INIT_XMM avx
 VL_FUNCS 1
 
+%if HAVE_AVX2_EXTERNAL
+INIT_YMM avx2
+cglobal vp9_ipred_vl_16x16_16, 2, 4, 6, dst, stride, l, a
+    movifnidn               aq, amp
+    mova                    m0, [aq]                ; abcdefghijklmnop
+    vpbroadcastw           xm5, [aq+30]             ; pppppppp
+    vperm2i128              m1, m0, m5, q0201       ; ijklmnoppppppppp
+    vpalignr                m2, m1, m0, 2           ; bcdefghijklmnopp
+    vpalignr                m3, m1, m0, 4           ; cdefghijklmnoppp
+    mova                    m4, m2
+    pavgw                   m4, m0
+    LOWPASS                  0, 2, 3                ; BCDEFGHIJKLMNOPp
+    vperm2i128              m2, m0, m5, q0201
+    vperm2i128              m3, m4, m5, q0201
+    DEFINE_ARGS dst, stride, stride3
+    lea               stride3q, [strideq*3]
+
+    mova      [dstq+strideq*0], m4
+    mova      [dstq+strideq*1], m0
+    vpalignr                m1, m2, m0, 2
+    vpalignr                m5, m3, m4, 2
+    mova      [dstq+strideq*2], m5
+    mova      [dstq+stride3q ], m1
+    vpalignr                m1, m2, m0, 4
+    vpalignr                m5, m3, m4, 4
+    lea                   dstq, [dstq+strideq*4]
+    mova      [dstq+strideq*0], m5
+    mova      [dstq+strideq*1], m1
+    vpalignr                m1, m2, m0, 6
+    vpalignr                m5, m3, m4, 6
+    mova      [dstq+strideq*2], m5
+    mova      [dstq+stride3q ], m1
+    vpalignr                m1, m2, m0, 8
+    vpalignr                m5, m3, m4, 8
+    lea                   dstq, [dstq+strideq*4]
+    mova      [dstq+strideq*0], m5
+    mova      [dstq+strideq*1], m1
+    vpalignr                m1, m2, m0, 10
+    vpalignr                m5, m3, m4, 10
+    mova      [dstq+strideq*2], m5
+    mova      [dstq+stride3q ], m1
+    vpalignr                m1, m2, m0, 12
+    vpalignr                m5, m3, m4, 12
+    lea                   dstq, [dstq+strideq*4]
+    mova      [dstq+strideq*0], m5
+    mova      [dstq+strideq*1], m1
+    vpalignr                m1, m2, m0, 14
+    vpalignr                m5, m3, m4, 14
+    mova      [dstq+strideq*2], m5
+    mova      [dstq+stride3q ], m1
+    RET
+%endif
+
 %macro VR_FUNCS 0
 cglobal vp9_ipred_vr_4x4_16, 4, 4, 3, dst, stride, l, a
     movu                    m0, [aq-2]
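
Note for readers following the AVX2 routine in the patch: the register comments suggest the data flow is a 2-tap average of the top row for even output rows (pavgw, kept in m4) and a 3-tap lowpass for odd output rows (LOWPASS, kept in m0), with each successive row pair shifted one pixel left and padded with the last top pixel. Below is a minimal scalar sketch of that flow in C. It is not FFmpeg's C implementation and is not part of the patch. The names vert_left_16x16_ref, avg2 and avg3 are hypothetical, the stride is assumed to be in pixels rather than bytes, and the rounding of avg3 is an assumption about what LOWPASS yields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VL_SIZE 16  /* block width and height in pixels */

/* Rounded 2-tap average, the per-lane result of pavgw. */
static uint16_t avg2(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + b + 1) >> 1);
}

/* Rounded (1,2,1) filter; assumed to match what LOWPASS yields. */
static uint16_t avg3(uint16_t a, uint16_t b, uint16_t c)
{
    return (uint16_t)(((uint32_t)a + 2u * b + c + 2) >> 2);
}

/* Hypothetical reference model; stride is counted in pixels, not bytes. */
static void vert_left_16x16_ref(uint16_t *dst, ptrdiff_t stride,
                                const uint16_t *top)
{
    uint16_t ext[VL_SIZE + 2];            /* top row padded with top[15] */
    uint16_t even[VL_SIZE], odd[VL_SIZE];
    int i, j, x;

    assert(stride >= VL_SIZE);

    for (i = 0; i < VL_SIZE + 2; i++)
        ext[i] = top[i < VL_SIZE ? i : VL_SIZE - 1];

    for (i = 0; i < VL_SIZE; i++) {
        even[i] = avg2(ext[i], ext[i + 1]);             /* m4 in the asm */
        odd[i]  = avg3(ext[i], ext[i + 1], ext[i + 2]); /* m0 after LOWPASS */
    }

    /* Row pair j reads the filtered rows starting at pixel j and pads
     * the right edge with top[15], like the vpalignr steps in the asm. */
    for (j = 0; j < VL_SIZE / 2; j++) {
        for (x = 0; x < VL_SIZE; x++) {
            int k = x + j;
            uint16_t pad = top[VL_SIZE - 1];

            dst[(2 * j)     * stride + x] = k < VL_SIZE ? even[k] : pad;
            dst[(2 * j + 1) * stride + x] = k < VL_SIZE ? odd[k]  : pad;
        }
    }
}
```

Calling vert_left_16x16_ref(dst, 16, top) on a contiguous 16x16 buffer and comparing against the output of the assembly routine would be one way to sanity-check a change like this outside of checkasm.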