From patchwork Sun Jun 9 11:00:50 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 13474
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Cc: Andreas Rheinhardt
Date: Sun, 9 Jun 2019 13:00:50 +0200
Message-Id: <20190609110053.4012-3-andreas.rheinhardt@gmail.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190609110053.4012-1-andreas.rheinhardt@gmail.com>
References: <20190604111632.GZ3118@michaelspb>
 <20190609110053.4012-1-andreas.rheinhardt@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/5] startcode: Switch to aligned reads

ff_startcode_find_candidate_c already checked multiple bytes for zeros
at once if HAVE_FAST_UNALIGNED is true; up until now the other case
checked all bytes one by one. This commit changes that: bytes are
checked one at a time until the pointer is aligned, from which point on
several bytes are checked at once via aligned reads.
This might cause a slight performance degradation if
HAVE_FAST_UNALIGNED is true, but this is only temporary, as this patch
is preparatory for further patches for which benchmarks have shown
aligned accesses to be faster.
On an x64 Haswell this led to a performance degradation of ca. 3% (from
411578 decicycles to 424503 decicycles, based on 10 iterations with
8192 runs each) when reading a 30.2 Mb/s H.264 stream from a transport
stream; for another file it was 4.9% (from 55476 to 58326 decicycles,
based on 10 iterations with 131072 runs each).

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/startcode.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c
index a55a8fafa6..373572365b 100644
--- a/libavcodec/startcode.c
+++ b/libavcodec/startcode.c
@@ -33,8 +33,13 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
 {
     const uint8_t *start = buf, *end = buf + size;
 
-#if HAVE_FAST_UNALIGNED
-#define READ(bitness) AV_RN ## bitness
+#define INITIALIZATION(mod) do {                                           \
+    for (; buf < end && (uintptr_t)buf % mod; buf++)                       \
+        if (!*buf)                                                         \
+            return buf - start;                                            \
+    } while (0)
+
+#define READ(bitness) AV_RN ## bitness ## A
 #define MAIN_LOOP(bitness, mask1, mask2) do {                              \
         /* we check p < end instead of p + 3 / 7 because it is
          * simpler and there must be AV_INPUT_BUFFER_PADDING_SIZE
@@ -46,10 +51,11 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
     } while (0)
 
 #if HAVE_FAST_64BIT
+    INITIALIZATION(8);
     MAIN_LOOP(64, 0x0101010101010101ULL, 0x8080808080808080ULL);
 #else
+    INITIALIZATION(4);
     MAIN_LOOP(32, 0x01010101U, 0x80808080U);
-#endif
 #endif
     for (; buf < end; buf++)
         if (!*buf)
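
For readers following the commit message, the approach it describes (a byte-wise scan until the pointer is aligned, then aligned word-sized reads) can be sketched as a standalone function. This is only a minimal illustration under stated assumptions, not the patched FFmpeg code: the name find_first_zero is hypothetical, memcpy stands in for the AV_RN64A macro, and unlike the patch the sketch does not rely on padding bytes after the buffer, so it stops the word loop while a full word still fits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg: returns the offset of the first
 * zero byte in buf[0..size), or size if there is none. */
static size_t find_first_zero(const uint8_t *buf, size_t size)
{
    const uint8_t *start = buf, *end = buf + size;

    /* Prologue: advance byte by byte until buf is 8-byte aligned. */
    for (; buf < end && (uintptr_t)buf % 8; buf++)
        if (!*buf)
            return (size_t)(buf - start);

    /* Main loop: test eight bytes per iteration. No padding after the
     * buffer is assumed, so stop once fewer than eight bytes remain. */
    for (; (size_t)(end - buf) >= 8; buf += 8) {
        uint64_t v;
        memcpy(&v, buf, sizeof(v)); /* buf is aligned here; memcpy avoids aliasing issues */
        /* Nonzero exactly when at least one byte of v is zero. */
        if ((~v & (v - 0x0101010101010101ULL)) & 0x8080808080808080ULL)
            break;
    }

    /* Epilogue: pin down the exact byte, or scan the remaining tail. */
    for (; buf < end; buf++)
        if (!*buf)
            break;
    return (size_t)(buf - start);
}
```

The word test only reports that some byte in the word is zero, which is why a final byte-wise loop is still needed to find its exact position; the patch uses the same three-stage shape.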