From patchwork Mon Sep 18 01:52:13 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 5176
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 17 Sep 2017 22:52:13 -0300
Message-Id: <20170918015213.7236-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.14.1
Subject: [FFmpeg-devel] [PATCH] x86/exrdsp: optimize ff_reorder_pixels_avx2()
List-Id: FFmpeg development discussions and patches

From: Henrik Gramner

Tested with "checkasm --test=exrdsp -bench"

Before:
reorder_pixels_c: 5187.8
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 331.3

After:
reorder_pixels_c: 5181.5
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 313.8

Signed-off-by: James Almer
---
 libavcodec/x86/exrdsp.asm | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/libavcodec/x86/exrdsp.asm b/libavcodec/x86/exrdsp.asm
index b91a7be20d..06c629e59e 100644
--- a/libavcodec/x86/exrdsp.asm
+++ b/libavcodec/x86/exrdsp.asm
@@ -39,16 +39,15 @@ cglobal reorder_pixels, 3,4,3, dst, src1, size, src2
     neg                     sizeq                    ; size = offset for dst, src1, src2
 .loop:
 
-%if cpuflag(avx2)
-    vpermq                  m0, [src1q + sizeq], 0xd8; load first part
-    vpermq                  m1, [src2q + sizeq], 0xd8; load second part
-%else
     mova                    m0, [src1q+sizeq]        ; load first part
     movu                    m1, [src2q+sizeq]        ; load second part
-%endif
     SBUTTERFLY bw, 0, 1, 2                           ; interleaved
-    mova [dstq+2*sizeq       ], m0                   ; copy to dst
-    mova [dstq+2*sizeq+mmsize], m1
+    mova [dstq+2*sizeq   ], xm0                      ; copy to dst
+    mova [dstq+2*sizeq+16], xm1
+%if cpuflag(avx2)
+    vperm2i128              m0, m0, m1, q0301
+    mova [dstq+2*sizeq+32], m0
+%endif
     add     sizeq, mmsize
     jl .loop
     RET
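
Note on the store order: the patch drops the two vpermq pre-shuffles and instead fixes up the lane order after SBUTTERFLY. With 32-byte registers the byte unpack works per 128-bit lane, so the in-order output is xm0, xm1, then the two high lanes, which vperm2i128 with q0301 gathers into one register. The C model below is only a rough sketch of that reasoning, not FFmpeg code: the function names are hypothetical, and the scalar reference is an assumption based on the "interleaved" comment in the loop (byte i of the first half followed by byte i of the second half).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed scalar behaviour: interleave the two halves of src byte by byte. */
void reorder_pixels_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t size)
{
    const uint8_t *src1 = src;            /* first half  */
    const uint8_t *src2 = src + size / 2; /* second half */
    for (ptrdiff_t i = 0; i < size / 2; i++) {
        dst[2 * i]     = src1[i];
        dst[2 * i + 1] = src2[i];
    }
}

/* One 128-bit lane of a byte unpack: interleave 8 bytes of a with 8 of b. */
void unpack8(uint8_t out[16], const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Model of one AVX2 iteration of the patched loop.
 * a and b are 32 bytes each (src1 and src2), dst receives 64 bytes. */
void reorder_block_avx2_model(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    uint8_t m0[32], m1[32];

    /* SBUTTERFLY bw: low/high unpack, applied independently per lane. */
    unpack8(m0,      a,      b);      /* m0 low lane:  bytes  0..7  */
    unpack8(m0 + 16, a + 16, b + 16); /* m0 high lane: bytes 16..23 */
    unpack8(m1,      a + 8,  b + 8);  /* m1 low lane:  bytes  8..15 */
    unpack8(m1 + 16, a + 24, b + 24); /* m1 high lane: bytes 24..31 */

    memcpy(dst,      m0, 16);         /* mova [dst],    xm0 */
    memcpy(dst + 16, m1, 16);         /* mova [dst+16], xm1 */

    /* vperm2i128 m0, m0, m1, q0301 -> { m0 high lane, m1 high lane } */
    memcpy(dst + 32, m0 + 16, 16);
    memcpy(dst + 48, m1 + 16, 16);    /* mova [dst+32], m0 */
}

/* Returns 1 when the lane model reproduces the scalar reference. */
int model_matches_reference(void)
{
    uint8_t src[64], ref[64], out[64];
    for (int i = 0; i < 64; i++)
        src[i] = (uint8_t)(i * 7 + 3); /* distinct byte values */
    reorder_pixels_ref(ref, src, 64);
    reorder_block_avx2_model(out, src, src + 32);
    return memcmp(ref, out, sizeof(ref)) == 0;
}
```

Calling model_matches_reference() compares the two paths on a 64-byte buffer. The removed vpermq with 0xd8 reordered the 64-bit quarters of each load to 0,2,1,3 so that the lane-wise unpack came out in order; the new code does the fix-up once per iteration on the result rather than once per load.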