From patchwork Tue Jun 27 15:45:53 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 4136
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 27 Jun 2017 12:45:53 -0300
Message-Id: <20170627154553.4800-2-jamrial@gmail.com>
X-Mailer: git-send-email 2.13.0
In-Reply-To: <20170627154553.4800-1-jamrial@gmail.com>
References: <20170627154553.4800-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] x86/vf_blend: optimize difference and
 negation functions
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Process more pixels per loop.

Signed-off-by: James Almer
---
 libavfilter/x86/vf_blend.asm | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
index 25f6f5affc..d5e512e6e0 100644
--- a/libavfilter/x86/vf_blend.asm
+++ b/libavfilter/x86/vf_blend.asm
@@ -268,21 +268,25 @@ BLEND_INIT phoenix, 4
 BLEND_END
 
 %macro BLEND_ABS 0
-BLEND_INIT difference, 3
+BLEND_INIT difference, 5
     pxor       m2, m2
 .nextrow:
     mov        xq, widthq
 
     .loop:
-        movh            m0, [topq + xq]
-        movh            m1, [bottomq + xq]
+        movu            m0, [topq + xq]
+        movu            m1, [bottomq + xq]
+        punpckhbw       m3, m0, m2
         punpcklbw       m0, m2
+        punpckhbw       m4, m1, m2
         punpcklbw       m1, m2
         psubw           m0, m1
+        psubw           m3, m4
         ABS1            m0, m1
-        packuswb        m0, m0
-        movh   [dstq + xq], m0
-        add             xq, mmsize / 2
+        ABS1            m3, m4
+        packuswb        m0, m3
+        mova   [dstq + xq], m0
+        add             xq, mmsize
     jl .loop
 BLEND_END
 
@@ -311,26 +315,30 @@ BLEND_INIT extremity, 8
     jl .loop
 BLEND_END
 
-BLEND_INIT negation, 5
+BLEND_INIT negation, 8
     pxor       m2, m2
     mova       m4, [pw_255]
 .nextrow:
     mov        xq, widthq
 
     .loop:
-        movh            m0, [topq + xq]
-        movh            m1, [bottomq + xq]
+        movu            m0, [topq + xq]
+        movu            m1, [bottomq + xq]
+        punpckhbw       m5, m0, m2
         punpcklbw       m0, m2
+        punpckhbw       m6, m1, m2
         punpcklbw       m1, m2
-        mova            m3, m4
-        psubw           m3, m0
+        psubw           m3, m4, m0
+        psubw           m7, m4, m5
         psubw           m3, m1
+        psubw           m7, m6
         ABS1            m3, m1
-        mova            m0, m4
-        psubw           m0, m3
-        packuswb        m0, m0
-        movh   [dstq + xq], m0
-        add             xq, mmsize / 2
+        ABS1            m7, m1
+        psubw           m0, m4, m3
+        psubw           m1, m4, m7
+        packuswb        m0, m1
+        mova   [dstq + xq], m0
+        add             xq, mmsize
     jl .loop
 BLEND_END
 %endmacro
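
For reviewers who want a scalar model to check the two loops against, the per-pixel arithmetic implied by the instruction sequences can be sketched in C as below. This is only a reading of the asm in this patch, not code taken from the tree: the function names are hypothetical, and the 16-bit intermediates stand in for the word lanes produced by punpcklbw/punpckhbw against the zeroed m2. The patch leaves this per-pixel math unchanged. It only processes both halves of a full register per iteration, so xq advances by mmsize instead of mmsize / 2.

```c
#include <stdint.h>
#include <stdlib.h>

/* Saturate a signed 16-bit lane to 0..255, modelling packuswb. */
static uint8_t pack_u8(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* difference: |top - bottom| (psubw, ABS1, packuswb). */
static uint8_t blend_difference_px(uint8_t top, uint8_t bottom)
{
    int16_t d = (int16_t)(top - bottom);
    return pack_u8((int16_t)abs(d));
}

/* negation: 255 - |255 - top - bottom| (pw_255 held in m4). */
static uint8_t blend_negation_px(uint8_t top, uint8_t bottom)
{
    int16_t t = (int16_t)(255 - top - bottom);
    return pack_u8((int16_t)(255 - abs(t)));
}

/* Hypothetical row helper, one pixel at a time; the SIMD loop produces
 * the same bytes, mmsize of them per iteration after this patch. */
static void blend_difference_row(const uint8_t *top, const uint8_t *bottom,
                                 uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x++)
        dst[x] = blend_difference_px(top[x], bottom[x]);
}
```

Comparing the output of such a model with the SIMD result over a row whose width is not a multiple of mmsize is one way to exercise the wider loads and the aligned store.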