From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 4 Nov 2018 05:48:40 +0100
Subject: [FFmpeg-devel] [PATCH 1/3] cbs_mpeg2: Improve performance of writing slices
X-Patchwork-Id: 10913

Instead of using a combination of a bitreader and a bitwriter for copying the slice data, one can byte-align the read position (which makes the bitreader obsolete, so it is removed) to improve performance. In the normal case, in which the writer is byte-aligned as well, one can even use memcpy. This reduced the time needed for writing the slice data from 33618 to 2370 decicycles when tested on a video originating from a DVD (4194394 runs).
Signed-off-by: Andreas Rheinhardt
---
 libavcodec/cbs_mpeg2.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/libavcodec/cbs_mpeg2.c b/libavcodec/cbs_mpeg2.c
index 0df4234b12..7161f1ee80 100644
--- a/libavcodec/cbs_mpeg2.c
+++ b/libavcodec/cbs_mpeg2.c
@@ -264,8 +264,6 @@ static int cbs_mpeg2_write_slice(CodedBitstreamContext *ctx,
                                  PutBitContext *pbc)
 {
     MPEG2RawSlice *slice = unit->content;
-    GetBitContext gbc;
-    size_t bits_left;
     int err;
 
     err = cbs_mpeg2_write_slice_header(ctx, pbc, &slice->header);
@@ -273,21 +271,38 @@ static int cbs_mpeg2_write_slice(CodedBitstreamContext *ctx,
         return err;
 
     if (slice->data) {
+        size_t rest = slice->data_size - (slice->data_bit_start + 7) / 8;
+        uint8_t *pos = slice->data + slice->data_bit_start / 8;
+
+        av_assert0(slice->data_bit_start >= 0 &&
+                   8 * slice->data_size > slice->data_bit_start);
+
         if (slice->data_size * 8 + 8 > put_bits_left(pbc))
             return AVERROR(ENOSPC);
 
-        init_get_bits(&gbc, slice->data, slice->data_size * 8);
-        skip_bits_long(&gbc, slice->data_bit_start);
-
-        while (get_bits_left(&gbc) > 15)
-            put_bits(pbc, 16, get_bits(&gbc, 16));
+        // First copy the remaining bits of the first byte
+        if (slice->data_bit_start % 8)
+            put_bits(pbc, 8 - slice->data_bit_start % 8,
+                     *pos++ & MAX_UINT_BITS(8 - slice->data_bit_start % 8));
+
+        if (put_bits_count(pbc) % 8 == 0) {
+            // If the writer is aligned at this point,
+            // memcpy can be used to improve performance.
+            // This is the normal case.
+            flush_put_bits(pbc);
+            memcpy(put_bits_ptr(pbc), pos, rest);
+            skip_put_bytes(pbc, rest);
+        } else {
+            // If not, we have to copy manually:
+            for (; rest > 3; rest -= 4, pos += 4)
+                put_bits32(pbc, AV_RB32(pos));
 
-        bits_left = get_bits_left(&gbc);
-        put_bits(pbc, bits_left, get_bits(&gbc, bits_left));
+            for (; rest; rest--, pos++)
+                put_bits(pbc, 8, *pos);
 
-        // Align with zeroes.
-        while (put_bits_count(pbc) % 8 != 0)
-            put_bits(pbc, 1, 0);
+            // Align with zeros
+            put_bits(pbc, 8 - put_bits_count(pbc) % 8, 0U);
+        }
     }
 
     return 0;
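To illustrate the copying strategy of this patch outside of FFmpeg, here is a minimal, self-contained sketch. `BitWriter`, `bw_put_bits` and `copy_slice_data` are hypothetical stand-ins, not FFmpeg's `PutBitContext` API; the toy writer assumes a zero-initialised, sufficiently large output buffer and omits the `ENOSPC` check. It follows the same steps as the patch: copy the low bits of a partial first byte so the read position becomes byte-aligned, use `memcpy` if the writer is then aligned too, and otherwise copy 32-bit big-endian chunks and single bytes before padding with zero bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal MSB-first bit writer standing in for FFmpeg's
 * PutBitContext (this is NOT FFmpeg's API). The output buffer is assumed
 * to be zero-initialised and large enough; no space check is done. */
typedef struct BitWriter {
    uint8_t *buf;     /* output buffer */
    size_t   bit_pos; /* number of bits written so far */
} BitWriter;

/* Append the low n bits of value (0 <= n <= 32), most significant bit first. */
static void bw_put_bits(BitWriter *bw, int n, uint32_t value)
{
    for (int i = n - 1; i >= 0; i--) {
        if ((value >> i) & 1)
            bw->buf[bw->bit_pos / 8] |= (uint8_t)(0x80u >> (bw->bit_pos % 8));
        bw->bit_pos++;
    }
}

/* Hypothetical helper: copy data[] starting at bit offset bit_start
 * (bit_start < 8 * data_size) into the writer, leaving it byte-aligned. */
static void copy_slice_data(BitWriter *bw, const uint8_t *data,
                            size_t data_size, size_t bit_start)
{
    const uint8_t *pos = data + bit_start / 8;
    /* Number of whole bytes following the (possibly partial) first byte. */
    size_t rest = data_size - (bit_start + 7) / 8;

    /* Step 1: copy the remaining low bits of a partial first byte, so that
     * the read position becomes byte-aligned. */
    if (bit_start % 8) {
        int n = 8 - (int)(bit_start % 8);
        bw_put_bits(bw, n, *pos++ & ((1u << n) - 1));
    }

    if (bw->bit_pos % 8 == 0) {
        /* Step 2a: the writer is byte-aligned too, so a memcpy suffices. */
        memcpy(bw->buf + bw->bit_pos / 8, pos, rest);
        bw->bit_pos += 8 * rest;
    } else {
        /* Step 2b: misaligned writer; copy 32-bit big-endian chunks, then
         * single bytes (in this toy writer, chunking is not faster). */
        for (; rest > 3; rest -= 4, pos += 4)
            bw_put_bits(bw, 32, ((uint32_t)pos[0] << 24) |
                                ((uint32_t)pos[1] << 16) |
                                ((uint32_t)pos[2] << 8) | pos[3]);
        for (; rest; rest--, pos++)
            bw_put_bits(bw, 8, *pos);
        /* Whole bytes keep bit_pos % 8 non-zero, so this pads with 1..7
         * zero bits. */
        bw_put_bits(bw, 8 - (int)(bw->bit_pos % 8), 0);
    }
}
```

For example, copying the bytes 0xAB 0xCD 0xEF from bit offset 4 into an empty writer copies the nibble 0xB first, leaves the writer misaligned, and yields 0xBC 0xDE 0xF0 after 4 padding bits. Had 4 bits already been written, the writer would be aligned after the partial byte and the `memcpy` path would be taken instead, as in the normal case described in the commit message.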