From patchwork Thu Jun 8 15:08:24 2017
X-Patchwork-Submitter: Ilia
X-Patchwork-Id: 3870
From: Ilia Valiakhmetov
To: ffmpeg-devel@ffmpeg.org
Cc: Ilia Valiakhmetov
Date: Thu, 8 Jun 2017 22:08:24 +0700
Message-Id: <20170608150824.3092-1-zakne0ne@gmail.com>
Subject: [FFmpeg-devel] [PATCH] avcodec/vp9: ipred_dr_16x16_16 avx2 implementation

vp9_diag_downright_16x16_12bpp_c: 149.0
vp9_diag_downright_16x16_12bpp_sse2: 67.8
vp9_diag_downright_16x16_12bpp_ssse3: 45.6
vp9_diag_downright_16x16_12bpp_avx: 36.6
vp9_diag_downright_16x16_12bpp_avx2: 25.5

~30% fewer cycles than avx (36.6 -> 25.5)

Signed-off-by: Ilia Valiakhmetov
---
 libavcodec/x86/vp9dsp_init_16bpp.c    |  2 ++
 libavcodec/x86/vp9intrapred_16bpp.asm | 54 +++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c b/libavcodec/x86/vp9dsp_init_16bpp.c
index d1b8fcd..8d1aa13 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp.c
@@ -52,6 +52,7 @@ decl_ipred_fns(dc, 16, mmxext, sse2);
 decl_ipred_fns(dc_top, 16, mmxext, sse2);
 decl_ipred_fns(dc_left, 16, mmxext, sse2);
 decl_ipred_fn(dl, 16, 16, avx2);
+decl_ipred_fn(dr, 16, 16, avx2);
 decl_ipred_fn(dl, 32, 16, avx2);
 
 #define decl_ipred_dir_funcs(type) \
@@ -136,6 +137,7 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext *dsp)
         init_fpel_func(1, 1, 64, avg, _16, avx2);
         init_fpel_func(0, 1, 128, avg, _16, avx2);
         init_ipred_func(dl, DIAG_DOWN_LEFT, 16, 16, avx2);
+        init_ipred_func(dr, DIAG_DOWN_RIGHT, 16, 16, avx2);
         init_ipred_func(dl, DIAG_DOWN_LEFT, 32, 16, avx2);
     }
 
diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm b/libavcodec/x86/vp9intrapred_16bpp.asm
index 92333bc..67b98b1 100644
--- a/libavcodec/x86/vp9intrapred_16bpp.asm
+++ b/libavcodec/x86/vp9intrapred_16bpp.asm
@@ -1170,6 +1170,60 @@ DR_FUNCS 2
 INIT_XMM avx
 DR_FUNCS 2
 
+%if HAVE_AVX2_EXTERNAL
+INIT_YMM avx2
+cglobal vp9_ipred_dr_16x16_16, 4, 6, 7, dst, stride, l, a
+    mova                    m0, [lq]                   ; klmnopqrstuvwxyz
+    movu                    m1, [aq-2]                 ; *abcdefghijklmno
+    mova                    m2, [aq]                   ; abcdefghijklmnop
+    vperm2i128              m4, m2, m2, q2001          ; ijklmnop........
+    vpalignr                m5, m4, m2, 2              ; bcdefghijklmnop.
+    vperm2i128              m3, m0, m1, q0201          ; stuvwxyz*abcdefg
+    LOWPASS                  1,  2,  5                 ; ABCDEFGHIJKLMNO.
+    vpalignr                m4, m3, m0, 2              ; lmnopqrstuvwxyz*
+    vpalignr                m5, m3, m0, 4              ; mnopqrstuvwxyz*a
+    LOWPASS                  0,  4,  5                 ; LMNOPQRSTUVWXYZ#
+    vperm2i128              m5, m0, m1, q0201          ; TUVWXYZ#ABCDEFGH
+    DEFINE_ARGS dst, stride, stride3, stride5, dst3, cnt
+    lea                  dst3q, [dstq+strideq*4]
+    lea               stride3q, [strideq*3]
+    lea               stride5q, [stride3q+strideq*2]
+
+    vpalignr                m3, m5, m0, 2
+    vpalignr                m4, m1, m5, 2
+    mova    [dst3q+stride5q*2], m3                     ; 14
+    mova    [ dstq+stride3q*2], m4                     ; 6
+    vpalignr                m3, m5, m0, 4
+    vpalignr                m4, m1, m5, 4
+    sub                  dst3q, strideq
+    mova    [dst3q+stride5q*2], m3                     ; 13
+    mova    [dst3q+strideq*2 ], m4                     ; 5
+    mova    [dst3q+stride3q*4], m0                     ; 15
+    vpalignr                m3, m5, m0, 6
+    vpalignr                m4, m1, m5, 6
+    mova     [dstq+stride3q*4], m3                     ; 12
+    mova     [dst3q+strideq*1], m4                     ; 4
+    vpalignr                m3, m5, m0, 8
+    vpalignr                m4, m1, m5, 8
+    mova     [dst3q+strideq*8], m3                     ; 11
+    mova     [dst3q+strideq*0], m4                     ; 3
+    vpalignr                m3, m5, m0, 12
+    vpalignr                m4, m1, m5, 12
+    mova    [dst3q+stride3q*2], m3                     ; 9
+    mova     [dstq+strideq*1 ], m4                     ; 1
+    vpalignr                m3, m5, m0, 10
+    vpalignr                m4, m1, m5, 10
+    mova     [dstq+stride5q*2], m3                     ; 10
+    mova     [dstq+strideq*2 ], m4                     ; 2
+    vpalignr                m3, m5, m0, 14
+    vpalignr                m4, m1, m5, 14
+    mova      [dstq+strideq*8], m3                     ; 8
+    mova      [dstq+strideq*0], m4                     ; 0
+    mova     [dst3q+strideq*4], m5                     ; 7
+    RET
+%endif
+
+
 %macro VL_FUNCS 1 ; stack_mem_for_32x32_32bit_function
 cglobal vp9_ipred_vl_4x4_16, 2, 4, 3, dst, stride, l, a
     movifnidn               aq, amp
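For reviewers checking the assembly, here is a scalar C sketch of the prediction the AVX2 routine is meant to produce. The edge layout is an assumption inferred from the asm comments (`*abcdefghijklmno` for `[aq-2]`, `stuvwxyz*abcdefg` for the permuted register): `top[-1]` is the top-left corner, `top[0..15]` is the row above, and `left[0..15]` is the left column stored bottom to top, so `left[15]` touches the corner. The names `diag_downright_16x16`, `lowpass3` and `lowpass3_avg` are hypothetical, not FFmpeg's, and `stride` is counted in pixels rather than bytes. `lowpass3_avg` shows how a [1 2 1]/4 filter can be built from a halving add followed by a rounding average (the operation `pavgw` provides); whether the file's `LOWPASS` macro is written exactly that way is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BS 16 /* block width and height in pixels */

/* 3-tap [1 2 1]/4 smoothing filter with rounding. */
static uint16_t lowpass3(int l, int c, int r)
{
    return (uint16_t)((l + 2 * c + r + 2) >> 2);
}

/* Same filter via a halving add and a rounding average:
 * avg(c, (l + r) >> 1) with avg(x, y) = (x + y + 1) >> 1.
 * For non-negative inputs this equals lowpass3 exactly. */
static uint16_t lowpass3_avg(int l, int c, int r)
{
    int half = (l + r) >> 1;
    return (uint16_t)((c + half + 1) >> 1);
}

/* Hypothetical scalar model of 16x16 diagonal down-right prediction.
 * Assumed layout: top[-1] = corner, top[0..15] = row above,
 * left[0..15] = left column from bottom to top. */
static void diag_downright_16x16(uint16_t *dst, ptrdiff_t stride,
                                 const uint16_t *left, const uint16_t *top)
{
    uint16_t v[2 * BS - 1]; /* filtered edge: left (bottom up), corner, top */
    int i, y;

    assert(stride >= BS);
    for (i = 0; i < BS - 2; i++) {
        v[i]          = lowpass3(left[i], left[i + 1], left[i + 2]);
        v[BS + 1 + i] = lowpass3(top[i], top[i + 1], top[i + 2]);
    }
    v[BS - 2] = lowpass3(left[BS - 2], left[BS - 1], top[-1]);
    v[BS - 1] = lowpass3(left[BS - 1], top[-1], top[0]); /* corner, "#" */
    v[BS]     = lowpass3(top[-1], top[0], top[1]);

    /* Row y is the 16-entry window starting at v[15 - y]; only rows
     * 0..15 of dst are written. */
    for (y = 0; y < BS; y++)
        memcpy(dst + y * stride, v + BS - 1 - y, BS * sizeof(*dst));
}
```

Every output row is a 16-pixel window of one 31-entry filtered edge, which is why the asm can build all rows from three registers with `vpalignr`. Writing only rows 0 to 15 matters because, in a decoder, the line directly above the block typically holds already-reconstructed pixels.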
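The store sequence is easier to audit with a small positional model. Label the filtered edge from the asm comments as s[0..30] = `LMNOPQRSTUVWXYZ#ABCDEFGHIJKLMNO`; row y of the block is then the window s[15-y .. 30-y]. The sketch below tracks which window each register holds, not pixel values: per their comments, `m0`, `m5` and `m1` start as s[0..15], s[8..23] and s[16..31] (s[31] being the don't-care lane). `vpalignr_w`, `row_of` and `model_registers` are hypothetical helpers; `vpalignr_w` follows the documented lane-wise behaviour of AVX2 `vpalignr`, where nothing crosses the 128-bit lane boundary.

```c
#include <assert.h>

/* Emulates AVX2 "vpalignr dst, hi, lo, imm" on 16 words: within each
 * 128-bit lane, the lane of lo (low 16 bytes) and the lane of hi (high
 * 16 bytes) form a 32-byte value shifted right by imm bytes. */
static void vpalignr_w(int out[16], const int hi[16], const int lo[16], int imm)
{
    int shift = imm / 2; /* bytes -> 16-bit words */

    assert(imm >= 0 && imm <= 16 && imm % 2 == 0);
    for (int lane = 0; lane < 2; lane++) {
        int cat[16];
        for (int k = 0; k < 8; k++) {
            cat[k]     = lo[lane * 8 + k];
            cat[k + 8] = hi[lane * 8 + k];
        }
        for (int k = 0; k < 8; k++)
            out[lane * 8 + k] = cat[k + shift];
    }
}

/* If reg holds the contiguous window s[start..start+15], return the block
 * row it belongs to (row y starts at s[15 - y]); otherwise return -100. */
static int row_of(const int reg[16])
{
    for (int k = 1; k < 16; k++)
        if (reg[k] != reg[0] + k)
            return -100;
    return 15 - reg[0];
}

/* Register contents right after the second LOWPASS and the final
 * vperm2i128, expressed as indices into s[]. */
static void model_registers(int m0[16], int m5[16], int m1[16])
{
    for (int k = 0; k < 16; k++) {
        m0[k] = k;      /* LMNOPQRSTUVWXYZ# */
        m5[k] = k + 8;  /* TUVWXYZ#ABCDEFGH */
        m1[k] = k + 16; /* ABCDEFGHIJKLMNO. */
    }
}
```

With these helpers, `vpalignr(m5, m0, 2k)` yields row 15 - k and `vpalignr(m1, m5, 2k)` yields row 7 - k for k = 1..7; together with `m0` (row 15) and `m5` (row 7) that covers each of the 16 rows exactly once. `m1` on its own corresponds to row -1, the line directly above the block.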