From patchwork Sun Jun 4 14:09:07 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilia
X-Patchwork-Id: 3834
From: Ilia Valiakhmetov
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 4 Jun 2017 21:09:07 +0700
Message-Id: <20170604140907.4692-1-zakne0ne@gmail.com>
X-Mailer: git-send-email 2.8.3
Subject: [FFmpeg-devel] [PATCH] libavcodec/vp9 ipred_dl_32x32_16 avx2 version
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Ilia Valiakhmetov

vp9_diag_downleft_32x32_8bpp_c: 580.2
vp9_diag_downleft_32x32_8bpp_sse2: 75.6
vp9_diag_downleft_32x32_8bpp_ssse3: 73.7
vp9_diag_downleft_32x32_8bpp_avx: 72.7
vp9_diag_downleft_32x32_10bpp_c: 1101.2
vp9_diag_downleft_32x32_10bpp_sse2: 145.4
vp9_diag_downleft_32x32_10bpp_ssse3: 137.5
vp9_diag_downleft_32x32_10bpp_avx: 134.8
vp9_diag_downleft_32x32_10bpp_avx2: 94.0
vp9_diag_downleft_32x32_12bpp_c: 1108.5
vp9_diag_downleft_32x32_12bpp_sse2: 145.5
vp9_diag_downleft_32x32_12bpp_ssse3: 137.3
vp9_diag_downleft_32x32_12bpp_avx: 135.2
vp9_diag_downleft_32x32_12bpp_avx2: 94.0

~30% faster than avx
---
 libavcodec/x86/vp9dsp_init_16bpp.c    |  4 +-
 libavcodec/x86/vp9intrapred_16bpp.asm | 75 +++++++++++++++++++++++++++--------
 2 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c b/libavcodec/x86/vp9dsp_init_16bpp.c
index 4e1f24f..d1b8fcd 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp.c
@@ -52,7 +52,7 @@ decl_ipred_fns(dc, 16, mmxext, sse2);
 decl_ipred_fns(dc_top, 16, mmxext, sse2);
 decl_ipred_fns(dc_left, 16, mmxext, sse2);
 decl_ipred_fn(dl, 16, 16, avx2);
-decl_ipred_fn(dl, 32, 32, avx2);
+decl_ipred_fn(dl, 32, 16, avx2);
 
 #define decl_ipred_dir_funcs(type) \
 decl_ipred_fns(type, 16, sse2, sse2); \
@@ -136,7 +136,7 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext *dsp)
         init_fpel_func(1, 1, 64, avg, _16, avx2);
         init_fpel_func(0, 1, 128, avg, _16, avx2);
         init_ipred_func(dl, DIAG_DOWN_LEFT, 16, 16, avx2);
-        init_ipred_func(dl, DIAG_DOWN_LEFT, 32, 32, avx2);
+        init_ipred_func(dl, DIAG_DOWN_LEFT, 32, 16, avx2);
     }
 
 #endif /* HAVE_YASM */
diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm b/libavcodec/x86/vp9intrapred_16bpp.asm
index 2ec5381..10a0994 100644
--- a/libavcodec/x86/vp9intrapred_16bpp.asm
+++ b/libavcodec/x86/vp9intrapred_16bpp.asm
@@ -861,6 +861,7 @@ cglobal vp9_ipred_dl_16x16_16, 2, 4, 5, dst, stride, l, a
     DEFINE_ARGS dst, stride, stride3, cnt
     mov                   cntd, 2
     lea               stride3q, [strideq*3]
+
 .loop:
     mova      [dstq+strideq*0], m0
     vpalignr                m3, m2, m0, 2
@@ -887,24 +888,64 @@ cglobal vp9_ipred_dl_16x16_16, 2, 4, 5, dst, stride, l, a
 
 cglobal vp9_ipred_dl_32x32_16, 2, 6, 7, dst, stride, l, a
     movifnidn               aq, amp
-    mova                    m0, [aq+mmsize*0]              ; abcdefghijklmnop
-    mova                    m1, [aq+mmsize*1]              ; qrstuvwxyz012345
-    vpbroadcastw           xm4, [aq+mmsize*1+30]           ; 55555555
-    vpalignr                m2, m1, m0, 2                  ; bcdefghijklmnopq
-    vpalignr                m3, m1, m0, 4                  ; cdefghijklmnopqr
-    vperm2i128              m5, m1, m4, q0201              ; yz01234555555555
-    LOWPASS                  0, 2, 3                       ; BCDEFGHIJKLMNOPQ
-    vpalignr                m2, m5, m1, 2                  ; rstuvwxyz0123455
-    vpalignr                m3, m5, m1, 4                  ; stuvwxyz01234555
-    LOWPASS                  1, 2, 3                       ; RSTUVWXYZ......5
-    vperm2i128              m2, m1, m4, q0201              ; Z......555555555
+    mova                    m0, [aq+mmsize*0+ 0]           ; abcdefghijklmnop
+    mova                    m1, [aq+mmsize*1+ 0]           ; qrstuvwxyz012345
+    vpbroadcastw           xm4, [aq+mmsize*1+30]           ; 55555555
+    vperm2i128              m5, m0, m1, q0201              ; ijklmnopqrstuvwx
+    vpalignr                m2, m5, m0, 2                  ; bcdefghijklmnopq
+    vpalignr                m3, m5, m0, 4                  ; cdefghijklmnopqr
+    LOWPASS                  0, 2, 3                       ; BCDEFGHIJKLMNOPQ
+    vperm2i128              m5, m1, m4, q0201              ; yz01234555555555
+    vpalignr                m2, m5, m1, 2                  ; rstuvwxyz0123455
+    vpalignr                m3, m5, m1, 4                  ; stuvwxyz01234555
+    LOWPASS                  1, 2, 3                       ; RSTUVWXYZ......5
+    vperm2i128              m2, m1, m4, q0201              ; Z......555555555
+    vperm2i128              m5, m0, m1, q0201              ; JKLMNOPQRSTUVWXY
+    DEFINE_ARGS dst, stride, stride3, stride5, cnt
+    lea               stride3q, [strideq*3]
+    lea               stride5q, [strideq*5]
+    mov                   cntd, 4
 
-    mova     [dstq+strideq*0+0 ], m0
-    mova     [dstq+strideq*0+32], m1
-    vpalignr                 m3, m1, m0, 2
-    vpalignr                 m4, m2, m1, 2
-    mova     [dstq+strideq*1+0 ], m3
-    mova     [dstq+strideq*1+32], m4
+.loop:
+    mova    [dstq+strideq*0 + 0], m0
+    mova    [dstq+strideq*0 +32], m1
+    vpalignr                  m3, m5, m0, 2
+    vpalignr                  m4, m2, m1, 2
+    mova    [dstq+strideq*1 + 0], m3
+    mova    [dstq+strideq*1 +32], m4
+    vpalignr                  m3, m5, m0, 4
+    vpalignr                  m4, m2, m1, 4
+    mova    [dstq+strideq*2 + 0], m3
+    mova    [dstq+strideq*2 +32], m4
+    vpalignr                  m3, m5, m0, 6
+    vpalignr                  m4, m2, m1, 6
+    mova    [dstq+stride3q*1+ 0], m3
+    mova    [dstq+stride3q*1+32], m4
+    vpalignr                  m3, m5, m0, 8
+    vpalignr                  m4, m2, m1, 8
+    mova    [dstq+strideq*4 + 0], m3
+    mova    [dstq+strideq*4 +32], m4
+    vpalignr                  m3, m5, m0, 10
+    vpalignr                  m4, m2, m1, 10
+    mova    [dstq+stride5q*1+ 0], m3
+    mova    [dstq+stride5q*1+32], m4
+    vpalignr                  m3, m5, m0, 12
+    vpalignr                  m4, m2, m1, 12
+    mova    [dstq+stride3q*2+ 0], m3
+    mova    [dstq+stride3q*2+32], m4
+    vpalignr                  m3, m5, m0, 14
+    vpalignr                  m4, m2, m1, 14
+    mova    [dstq+stride3q*2+64], m3
+    mova    [dstq+stride3q*2+96], m4
+    vpalignr                  m3, m5, m0, 16
+    vpalignr                  m4, m2, m1, 16
+    vperm2i128                m5, m3, m4, q0201
+    vperm2i128                m2, m4, m4, q0101
+    mova                      m0, m3
+    mova                      m1, m4
+    lea                     dstq, [dstq+strideq*8]
+    dec                     cntd
+    jg .loop
 
     RET
 %endif