From patchwork Fri Sep 6 15:30:03 2019
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 14959
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 6 Sep 2019 12:30:03 -0300
Message-Id: <20190906153003.1093-2-jamrial@gmail.com>
X-Mailer: git-send-email 2.22.0
In-Reply-To: <20190906153003.1093-1-jamrial@gmail.com>
References: <20190906153003.1093-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] x86/vf_v360: use a faster horizontal add in remap4_8bit_line_avx2

Signed-off-by: James Almer
---
 libavfilter/x86/vf_v360.asm | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/libavfilter/x86/vf_v360.asm b/libavfilter/x86/vf_v360.asm
index f49702b603..a0936eb6dc 100644
--- a/libavfilter/x86/vf_v360.asm
+++ b/libavfilter/x86/vf_v360.asm
@@ -130,14 +130,11 @@ cglobal remap4_8bit_line, 7, 9, 11, dst, width, src, in_linesize, u, v, ker, x,
     pmulld m4, m5
     paddd m2, m4
 
-    vextracti128 xm1, m2, 1
-    paddd m1, m2
-    phaddd m1, m1
-    phaddd m1, m1
-    psrld m1, m1, 0xe
-    packuswb m1, m1
+    HADDD m2, m1
+    psrld m2, m2, 0xe
+    packuswb m2, m2
 
-    pextrb [dstq+xq], xm1, 0
+    pextrb [dstq+xq], xm2, 0
 
     add xq, 1
     add yq, 32
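
Note for readers: the hunk swaps one horizontal-add sequence for another without changing the value written to dst. The scalar C sketch below models both orderings on eight 32-bit lanes to show that they yield the same sum in lane 0. The function names are hypothetical, and the fold order in hsum_fold is only an assumption about the general shape of a halving reduction; it is not a transcription of the HADDD macro. The pack step is simplified to an unsigned clamp, which matches packuswb only while the shifted value fits in a signed 16-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Removed sequence: fold the upper 128-bit lane onto the lower one
 * (vextracti128 + paddd), then two pairwise adjacent adds (phaddd). */
static uint32_t hsum_phaddd(const uint32_t v[8])
{
    uint32_t a[4], b[2];
    for (int i = 0; i < 4; i++)
        a[i] = v[i] + v[i + 4];   /* upper lane added to lower lane */
    b[0] = a[0] + a[1];           /* first phaddd: adjacent pairs */
    b[1] = a[2] + a[3];
    return b[0] + b[1];           /* second phaddd: lane 0 holds the total */
}

/* Assumed halving reduction: fold 8 -> 4 -> 2 -> 1 by adding the upper
 * half onto the lower half at each step (shuffle + paddd in SIMD terms). */
static uint32_t hsum_fold(const uint32_t v[8])
{
    uint32_t a[4], b[2];
    for (int i = 0; i < 4; i++)
        a[i] = v[i] + v[i + 4];
    b[0] = a[0] + a[2];
    b[1] = a[1] + a[3];
    return b[0] + b[1];
}

/* psrld by 0xe (14 bits), then a simplified stand-in for packuswb + pextrb:
 * clamp to 255 and keep the low byte. */
static uint8_t shift_and_pack(uint32_t sum)
{
    uint32_t s = sum >> 14;
    return s > 255 ? 255 : (uint8_t)s;
}

/* Both orderings add the same eight values modulo 2^32, so they must agree. */
static void check_equivalence(const uint32_t v[8])
{
    assert(hsum_phaddd(v) == hsum_fold(v));
}
```

Since 32-bit addition is associative and commutative even with wraparound, any fold order gives the same total, so the sequence can be chosen purely for instruction cost.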