From patchwork Mon Jan 9 22:15:08 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 2160
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 10 Jan 2017 00:15:08 +0200
Message-Id: <1484000119-4959-2-git-send-email-martin@martin.st>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1484000119-4959-1-git-send-email-martin@martin.st>
References: <1484000119-4959-1-git-send-email-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 02/13] aarch64: vp9: loop filter: replace
 'orr; cbn?z' with 'adds; b.{eq,ne};'

From: Janne Grunau

The latter is 1 cycle faster on a Cortex-A53, and since the operands are
bytewise (or larger) bitmasks, whose sum cannot overflow to zero, both
are equivalent.

This is cherrypicked from libav commit
e7ae8f7a715843a5089d18e033afb3ee19ab3057.
---
 libavcodec/aarch64/vp9lpf_neon.S | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/libavcodec/aarch64/vp9lpf_neon.S b/libavcodec/aarch64/vp9lpf_neon.S
index 78aae61..55e1964 100644
--- a/libavcodec/aarch64/vp9lpf_neon.S
+++ b/libavcodec/aarch64/vp9lpf_neon.S
@@ -218,13 +218,15 @@
         xtn_sz          v5,     v6.8h,  v7.8h,  \sz
         and             v4\sz,  v4\sz,  v5\sz         // fm
 
+        // If no pixels need filtering, just exit as soon as possible
         mov             x5,  v4.d[0]
 .ifc \sz, .16b
         mov             x6,  v4.d[1]
-        orr             x5,  x5,  x6
-.endif
-        // If no pixels need filtering, just exit as soon as possible
+        adds            x5,  x5,  x6
+        b.eq            9f
+.else
         cbz             x5,  9f
+.endif
 
 .if \wd >= 8
         movi            v0\sz,  #1
@@ -344,15 +346,17 @@
         bit             v22\sz, v0\sz,  v5\sz       // if (!hev && fm && !flat8in)
         bit             v25\sz, v2\sz,  v5\sz
 
+        // If no pixels need flat8in, jump to flat8out
+        // (or to a writeout of the inner 4 pixels, for wd=8)
 .if \wd >= 8
         mov             x5,  v6.d[0]
 .ifc \sz, .16b
         mov             x6,  v6.d[1]
-        orr             x5,  x5,  x6
-.endif
-        // If no pixels need flat8in, jump to flat8out
-        // (or to a writeout of the inner 4 pixels, for wd=8)
+        adds            x5,  x5,  x6
+        b.eq            6f
+.else
         cbz             x5,  6f
+.endif
 
         // flat8in
         uaddl_sz        \tmp1\().8h, \tmp2\().8h,  v20, v21, \sz
@@ -406,20 +410,25 @@
         mov             x5,  v2.d[0]
 .ifc \sz, .16b
         mov             x6,  v2.d[1]
-        orr             x5,  x5,  x6
+        adds            x5,  x5,  x6
+        b.ne            1f
+.else
+        cbnz            x5,  1f
 .endif
         // If no pixels needed flat8in nor flat8out, jump to a
         // writeout of the inner 4 pixels
-        cbnz            x5,  1f
         br              x14
 1:
+
         mov             x5,  v7.d[0]
 .ifc \sz, .16b
         mov             x6,  v7.d[1]
-        orr             x5,  x5,  x6
+        adds            x5,  x5,  x6
+        b.ne            1f
+.else
+        cbnz            x5,  1f
 .endif
         // If no pixels need flat8out, jump to a writeout of the inner 6 pixels
-        cbnz            x5,  1f
         br              x15
 1:
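
A note on the equivalence claimed in the commit message: it can be
checked exhaustively for the bytewise case, since each 64-bit half of
the mask has only 2^8 possible values. The following is a minimal
Python sketch, not part of the patch. The helper names (byte_mask,
orr_is_zero, adds_is_zero, count_mismatches) are made up for
illustration, and the two predicates only model the zero test of each
instruction sequence, not the actual NEON code.

```python
from itertools import product

MASK64 = (1 << 64) - 1  # models the width of x5/x6


def byte_mask(bits):
    """Build a 64-bit value whose byte i is 0xff if bits[i] is set, else 0x00."""
    value = 0
    for i, bit in enumerate(bits):
        if bit:
            value |= 0xFF << (8 * i)
    return value


def orr_is_zero(x5, x6):
    """Old sequence: 'orr x5, x5, x6' followed by cbz/cbnz on x5."""
    return (x5 | x6) == 0


def adds_is_zero(x5, x6):
    """New sequence: 'adds x5, x5, x6' sets Z when the 64-bit sum is 0."""
    return ((x5 + x6) & MASK64) == 0


def count_mismatches():
    """Compare both zero tests over every pair of bytewise masks (2**16 pairs)."""
    masks = [byte_mask(bits) for bits in product((0, 1), repeat=8)]
    return sum(
        1
        for a in masks
        for b in masks
        if orr_is_zero(a, b) != adds_is_zero(a, b)
    )


if __name__ == "__main__":
    print("mismatches:", count_mismatches())
```

For arbitrary operands the two tests do differ (for example
0xffffffffffffffff + 1 wraps to zero while the OR does not), so the
substitution relies on x5 and x6 holding comparison masks. Masks with
wider elements (16, 32 or 64 bits) are a subset of the bytewise masks
checked here.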