From patchwork Sat Jun 1 22:47:16 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 13373
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Cc: Andreas Rheinhardt
Date: Sun, 2 Jun 2019 00:47:16 +0200
Message-Id: <20190601224719.32872-3-andreas.rheinhardt@gmail.com>
In-Reply-To: <20190601224719.32872-1-andreas.rheinhardt@gmail.com>
References: <20190601224719.32872-1-andreas.rheinhardt@gmail.com>
X-Mailer: git-send-email 2.21.0
Subject: [FFmpeg-devel] [PATCH 2/5] startcode: Switch to aligned reads

ff_startcode_find_candidate_c already checked multiple bytes for zeros at
once if HAVE_FAST_UNALIGNED is true; up until now the other case checked
all bytes one by one. This has been modified: A few bytes are checked until
alignment is reached, from which point on several bytes can be checked at
once via aligned reads.
This might cause a slight performance degradation if HAVE_FAST_UNALIGNED is
true, but this is only temporary, as this patch is preparatory for further
patches where benchmarks have shown aligned accesses to be faster. On an
x64 Haswell this led to a performance degradation of ca. 3% (from 411578
decicycles to 424503 decicycles, based upon 10 iterations with 8192 runs
each) when reading a 30.2 Mb/s H.264 stream from a transport stream; for
another file it was 4.9% (from 55476 to 58326 decicycles, based on 10
iterations with 131072 runs each).

Signed-off-by: Andreas Rheinhardt
---
The "further patches where benchmarks have shown aligned accesses to be
faster" of course refers to https://github.com/mkver/FFmpeg/commits/start_3

 libavcodec/startcode.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c
index a55a8fafa6..373572365b 100644
--- a/libavcodec/startcode.c
+++ b/libavcodec/startcode.c
@@ -33,8 +33,13 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
 {
     const uint8_t *start = buf, *end = buf + size;
 
-#if HAVE_FAST_UNALIGNED
-#define READ(bitness) AV_RN ## bitness
+#define INITIALIZATION(mod) do {                                           \
+    for (; buf < end && (uintptr_t)buf % mod; buf++)                       \
+        if (!*buf)                                                         \
+            return buf - start;                                            \
+    } while (0)
+
+#define READ(bitness) AV_RN ## bitness ## A
 #define MAIN_LOOP(bitness, mask1, mask2) do {                              \
         /* we check p < end instead of p + 3 / 7 because it is
          * simpler and there must be AV_INPUT_BUFFER_PADDING_SIZE
@@ -46,10 +51,11 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
     } while (0)
 
 #if HAVE_FAST_64BIT
+    INITIALIZATION(8);
     MAIN_LOOP(64, 0x0101010101010101ULL, 0x8080808080808080ULL);
 #else
+    INITIALIZATION(4);
     MAIN_LOOP(32, 0x01010101U, 0x80808080U);
-#endif
 #endif
     for (; buf < end; buf++)
         if (!*buf)
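
For readers without the FFmpeg tree at hand, the head/body/tail structure described in the commit message can be sketched as a standalone function. This is only an illustration under stated assumptions, not the patched code: the name find_first_zero is hypothetical, memcpy stands in for AV_RN64A, and the word loop stops 8 bytes before the end instead of relying on AV_INPUT_BUFFER_PADDING_SIZE, so it never reads past the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical standalone helper (not FFmpeg API): returns the offset of
 * the first zero byte in buf[0..size), or size if there is none. */
static size_t find_first_zero(const uint8_t *buf, size_t size)
{
    const uint8_t *start = buf, *end = buf + size;

    /* Head: check single bytes until buf is 8-byte aligned.
     * This plays the role of INITIALIZATION(8) in the patch. */
    for (; buf < end && (uintptr_t)buf % 8; buf++)
        if (!*buf)
            return (size_t)(buf - start);

    /* Body: one aligned 8-byte load per iteration. Assumption: memcpy is
     * used here in place of AV_RN64A to stay within standard C. The
     * expression is nonzero exactly when some byte of w is zero. */
    while ((size_t)(end - buf) >= 8) {
        uint64_t w;
        memcpy(&w, buf, sizeof(w));
        if (~w & (w - 0x0101010101010101ULL) & 0x8080808080808080ULL)
            break;
        buf += 8;
    }

    /* Tail: locate the exact zero byte inside the flagged word, or
     * finish the last few bytes that did not fill a whole word. */
    for (; buf < end; buf++)
        if (!*buf)
            break;
    return (size_t)(buf - start);
}
```

Unlike this sketch, the patched function keeps the simpler buf < end loop condition and depends on the padding bytes after the input buffer, as the comment in MAIN_LOOP states.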