From patchwork Sun Jun 4 17:52:27 2017
X-Patchwork-Submitter: Ilia
X-Patchwork-Id: 3837
From: Ilia Valiakhmetov
To: ffmpeg-devel@ffmpeg.org
Cc: Ilia Valiakhmetov
Date: Mon, 5 Jun 2017 00:52:27 +0700
Message-Id: <20170604175227.3296-1-zakne0ne@gmail.com>
Subject: [FFmpeg-devel] [PATCH] libavcodec/vp9: ipred_dl_32x32_16 avx2 implementation

vp9_diag_downleft_32x32_8bpp_c: 580.2
vp9_diag_downleft_32x32_8bpp_sse2: 75.6
vp9_diag_downleft_32x32_8bpp_ssse3: 73.7
vp9_diag_downleft_32x32_8bpp_avx: 72.7
vp9_diag_downleft_32x32_10bpp_c: 1101.2
vp9_diag_downleft_32x32_10bpp_sse2: 145.4
vp9_diag_downleft_32x32_10bpp_ssse3: 137.5
vp9_diag_downleft_32x32_10bpp_avx: 134.8
vp9_diag_downleft_32x32_10bpp_avx2: 94.0
vp9_diag_downleft_32x32_12bpp_c: 1108.5
vp9_diag_downleft_32x32_12bpp_sse2: 145.5
vp9_diag_downleft_32x32_12bpp_ssse3: 137.3
vp9_diag_downleft_32x32_12bpp_avx: 135.2
vp9_diag_downleft_32x32_12bpp_avx2: 94.0

~30% faster than the AVX implementation at 10bpp and 12bpp
(134.8 -> 94.0 and 135.2 -> 94.0 respectively).
---
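For checking the assembly against a scalar model, here is a minimal C sketch of what a 32x32 diagonal down-left predictor for 16-bit pixels is expected to compute: a 3-tap lowpass over the top edge, with each output row being the filtered edge shifted by one more pixel and padded with the last top pixel. The name dl_32x32_16_ref, the BLK constant and the stride-in-pixels convention are assumptions made for illustration (the assembly takes a byte stride and is declared with a left-edge argument as well), and the sketch is not claimed to be the project's reference code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK 32 /* block width and height in pixels */

/* Hypothetical scalar model of the down-left predictor.
 * dst:    output block, BLK rows of at least BLK pixels each
 * stride: distance between rows, in pixels (assumption: not bytes)
 * top:    the BLK pixels directly above the block */
static void dl_32x32_16_ref(uint16_t *dst, ptrdiff_t stride,
                            const uint16_t *top)
{
    uint16_t v[BLK - 1];
    int i, j;

    assert(stride >= BLK);

    /* 3-tap lowpass (1, 2, 1) / 4 with rounding over the top edge. */
    for (i = 0; i < BLK - 2; i++)
        v[i] = (uint16_t)((top[i] + 2 * top[i + 1] + top[i + 2] + 2) >> 2);
    /* Last filtered value: the edge is extended with its final pixel. */
    v[BLK - 2] = (uint16_t)((top[BLK - 2] + 3 * top[BLK - 1] + 2) >> 2);

    /* Row j is the filtered edge shifted left by j pixels; positions
     * past the end of the filtered edge are filled with top[BLK - 1]. */
    for (j = 0; j < BLK; j++) {
        for (i = 0; i < BLK; i++) {
            int k = i + j;
            dst[j * stride + i] = k < BLK - 1 ? v[k] : top[BLK - 1];
        }
    }
}
```

In this model the row-to-row shift by one pixel corresponds to the growing vpalignr byte offsets (2, 4, ..., 16) in the loop, and the padding value corresponds to the broadcast of the last top pixel.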
 libavcodec/x86/vp9dsp_init_16bpp.c    |  2 ++
 libavcodec/x86/vp9intrapred_16bpp.asm | 63 +++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c b/libavcodec/x86/vp9dsp_init_16bpp.c
index 4576ff1..d1b8fcd 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp.c
@@ -52,6 +52,7 @@ decl_ipred_fns(dc, 16, mmxext, sse2);
 decl_ipred_fns(dc_top, 16, mmxext, sse2);
 decl_ipred_fns(dc_left, 16, mmxext, sse2);
 decl_ipred_fn(dl, 16, 16, avx2);
+decl_ipred_fn(dl, 32, 16, avx2);
 
 #define decl_ipred_dir_funcs(type) \
 decl_ipred_fns(type, 16, sse2, sse2); \
@@ -135,6 +136,7 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext *dsp)
         init_fpel_func(1, 1, 64, avg, _16, avx2);
         init_fpel_func(0, 1, 128, avg, _16, avx2);
         init_ipred_func(dl, DIAG_DOWN_LEFT, 16, 16, avx2);
+        init_ipred_func(dl, DIAG_DOWN_LEFT, 32, 16, avx2);
     }
 
 #endif /* HAVE_YASM */
diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm b/libavcodec/x86/vp9intrapred_16bpp.asm
index 212e413..5cd6a3e 100644
--- a/libavcodec/x86/vp9intrapred_16bpp.asm
+++ b/libavcodec/x86/vp9intrapred_16bpp.asm
@@ -861,6 +861,7 @@ cglobal vp9_ipred_dl_16x16_16, 2, 4, 5, dst, stride, l, a
     DEFINE_ARGS dst, stride, stride3, cnt
     mov                   cntd, 2
     lea               stride3q, [strideq*3]
+
 .loop:
     mova      [dstq+strideq*0], m0
     vpalignr                m3, m2, m0, 2
@@ -884,6 +885,68 @@ cglobal vp9_ipred_dl_16x16_16, 2, 4, 5, dst, stride, l, a
     dec                   cntd
     jg .loop
     RET
+
+cglobal vp9_ipred_dl_32x32_16, 2, 6, 7, dst, stride, l, a
+    movifnidn               aq, amp
+    mova                    m0, [aq+mmsize*0+ 0]       ; abcdefghijklmnop
+    mova                    m1, [aq+mmsize*1+ 0]       ; qrstuvwxyz012345
+    vpbroadcastw           xm4, [aq+mmsize*1+30]       ; 55555555
+    vperm2i128              m5, m0, m1, q0201          ; ijklmnopqrstuvwx
+    vpalignr                m2, m5, m0, 2              ; bcdefghijklmnopq
+    vpalignr                m3, m5, m0, 4              ; cdefghijklmnopqr
+    LOWPASS                  0, 2, 3                   ; BCDEFGHIJKLMNOPQ
+    vperm2i128              m5, m1, m4, q0201          ; yz01234555555555
+    vpalignr                m2, m5, m1, 2              ; rstuvwxyz0123455
+    vpalignr                m3, m5, m1, 4              ; stuvwxyz01234555
+    LOWPASS                  1, 2, 3                   ; RSTUVWXYZ......5
+    vperm2i128              m2, m1, m4, q0201          ; Z......555555555
+    vperm2i128              m5, m0, m1, q0201          ; JKLMNOPQRSTUVWXY
+    DEFINE_ARGS dst, stride, stride3, cnt
+    lea               stride3q, [strideq*3]
+    mov                   cntd, 4
+
+.loop:
+    mova   [dstq+strideq*0 + 0], m0
+    mova   [dstq+strideq*0 +32], m1
+    vpalignr                 m3, m5, m0, 2
+    vpalignr                 m4, m2, m1, 2
+    mova   [dstq+strideq*1 + 0], m3
+    mova   [dstq+strideq*1 +32], m4
+    vpalignr                 m3, m5, m0, 4
+    vpalignr                 m4, m2, m1, 4
+    mova   [dstq+strideq*2 + 0], m3
+    mova   [dstq+strideq*2 +32], m4
+    vpalignr                 m3, m5, m0, 6
+    vpalignr                 m4, m2, m1, 6
+    mova   [dstq+stride3q*1+ 0], m3
+    mova   [dstq+stride3q*1+32], m4
+    lea                    dstq, [dstq+strideq*4]
+    vpalignr                 m3, m5, m0, 8
+    vpalignr                 m4, m2, m1, 8
+    mova   [dstq+strideq*0 + 0], m3
+    mova   [dstq+strideq*0 +32], m4
+    vpalignr                 m3, m5, m0, 10
+    vpalignr                 m4, m2, m1, 10
+    mova   [dstq+strideq*1 + 0], m3
+    mova   [dstq+strideq*1 +32], m4
+    vpalignr                 m3, m5, m0, 12
+    vpalignr                 m4, m2, m1, 12
+    mova    [dstq+strideq*2+ 0], m3
+    mova    [dstq+strideq*2+32], m4
+    vpalignr                 m3, m5, m0, 14
+    vpalignr                 m4, m2, m1, 14
+    mova     [dstq+stride3q+ 0], m3
+    mova    [dstq+stride3q+ 32], m4
+    vpalignr                 m3, m5, m0, 16
+    vpalignr                 m4, m2, m1, 16
+    vperm2i128               m5, m3, m4, q0201
+    vperm2i128               m2, m4, m4, q0101
+    mova                     m0, m3
+    mova                     m1, m4
+    lea                    dstq, [dstq+strideq*4]
+    dec                    cntd
+    jg .loop
+    RET
 %endif
 
 %macro DR_FUNCS 1 ; stack_mem_for_32x32_32bit_function