From patchwork Sun Nov 4 04:48:42 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 10914
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 4 Nov 2018 05:48:42 +0100
Message-Id: <20181104044842.3092-4-andreas.rheinhardt@googlemail.com>
X-Mailer: git-send-email 2.19.0
In-Reply-To: <20181104044842.3092-1-andreas.rheinhardt@googlemail.com>
References: <20181104044842.3092-1-andreas.rheinhardt@googlemail.com>
Subject: [FFmpeg-devel] [PATCH 3/3] cbs_h265: Improve performance of writing slices
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Instead of using a combination of bitreader and -writer for copying data,
one can byte-align the (obsolete and removed) bitreader to improve
performance. Given that the H265 slice segment header always has a byte
length, one can normally use memcpy.
With this patch the number of decicycles used to copy the slicedata
went down from 181395 to 8672 for a 830kb/s sample with 16384 runs.

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/cbs_h2645.c | 70 +++++++++++++++++++++++++++++-------------
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/libavcodec/cbs_h2645.c b/libavcodec/cbs_h2645.c
index d3a41fbdf0..d9ea498faa 100644
--- a/libavcodec/cbs_h2645.c
+++ b/libavcodec/cbs_h2645.c
@@ -1279,39 +1279,65 @@ static int cbs_h265_write_nal_unit(CodedBitstreamContext *ctx,
     case HEVC_NAL_CRA_NUT:
         {
             H265RawSlice *slice = unit->content;
-            GetBitContext gbc;
-            int bits_left, end, zeroes;
 
             err = cbs_h265_write_slice_segment_header(ctx, pbc, &slice->header);
             if (err < 0)
                 return err;
 
             if (slice->data) {
+                size_t rest = slice->data_size - (slice->data_bit_start + 7) / 8;
+                uint8_t *pos = slice->data + slice->data_bit_start / 8;
+
+                av_assert0(slice->data_bit_start >= 0 &&
+                           8 * slice->data_size > slice->data_bit_start);
+
                 if (slice->data_size * 8 + 8 > put_bits_left(pbc))
                     return AVERROR(ENOSPC);
 
-                init_get_bits(&gbc, slice->data, slice->data_size * 8);
-                skip_bits_long(&gbc, slice->data_bit_start);
-
-                // Copy in two-byte blocks, but stop before copying the
-                // rbsp_stop_one_bit in the final byte.
-                while (get_bits_left(&gbc) > 23)
-                    put_bits(pbc, 16, get_bits(&gbc, 16));
-
-                bits_left = get_bits_left(&gbc);
-                end = get_bits(&gbc, bits_left);
-
-                // rbsp_stop_one_bit must be present here.
-                av_assert0(end);
-                zeroes = ff_ctz(end);
-                if (bits_left > zeroes + 1)
-                    put_bits(pbc, bits_left - zeroes - 1,
-                             end >> (zeroes + 1));
-                put_bits(pbc, 1, 1);
-                while (put_bits_count(pbc) % 8 != 0)
-                    put_bits(pbc, 1, 0);
+                if (!rest)
+                    goto rbsp_stop_one_bit;
+
+                // First copy the remaining bits of the first byte
+                // The above check ensures that we do not accidentally
+                // copy beyond the rbsp_stop_one_bit.
+                if (slice->data_bit_start % 8)
+                    put_bits(pbc, 8 - slice->data_bit_start % 8,
+                             *pos++ & MAX_UINT_BITS(8 - slice->data_bit_start % 8));
+
+                if (put_bits_count(pbc) % 8 == 0) {
+                    // If the writer is aligned at this point,
+                    // memcpy can be used to improve performance.
+                    // This is the normal case.
+                    flush_put_bits(pbc);
+                    memcpy(put_bits_ptr(pbc), pos, rest);
+                    skip_put_bytes(pbc, rest);
+                    break;
+                } else {
+                    // If not, we have to copy manually.
+                    // rbsp_stop_one_bit forces us to special-case
+                    // the last byte.
+                    for (; rest > 4; rest -= 4, pos += 4)
+                        put_bits32(pbc, AV_RB32(pos));
+
+                    for (; rest > 1; rest--, pos++)
+                        put_bits(pbc, 8, *pos);
+                }
+
+                rbsp_stop_one_bit: {
+                    int i;
+                    uint8_t temp = rest ? *pos : *pos & MAX_UINT_BITS(8 -
+                                                 slice->data_bit_start % 8);
+                    av_assert0(temp);
+                    i = ff_ctz(*pos);
+                    temp = temp >> i;
+                    i = rest ? (8 - i) : (8 - i - slice->data_bit_start % 8);
+                    put_bits(pbc, i, temp);
+                    if (put_bits_count(pbc) % 8)
+                        put_bits(pbc, 8 - put_bits_count(pbc) % 8, 0U);
+                }
             } else {
                 // No slice data - that was just the header.
+                // (Bitstream may be unaligned!)
             }
         }
         break;
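
For readers following the commit message, the aligned-copy idea can be
sketched outside of FFmpeg roughly as follows. This is a minimal
illustration under stated assumptions, not the code from the patch:
BitWriter, bw_put and copy_payload are hypothetical names, the writer is
a simplified stand-in for PutBitContext that expects a zero-initialised
buffer, and the special handling of rbsp_stop_one_bit in the last byte is
deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal MSB-first bit writer (not FFmpeg's PutBitContext). */
typedef struct BitWriter {
    uint8_t *buf;     /* output buffer, assumed zero-initialised */
    size_t   size;    /* capacity in bytes */
    size_t   bit_pos; /* number of bits written so far */
} BitWriter;

/* Append the n (0..8) least significant bits of value, MSB first. */
static void bw_put(BitWriter *bw, unsigned n, unsigned value)
{
    while (n--) {
        unsigned bit = (value >> n) & 1u;
        bw->buf[bw->bit_pos / 8] |= (uint8_t)(bit << (7 - bw->bit_pos % 8));
        bw->bit_pos++;
    }
}

/*
 * Append all bits of src[0..size) from bit offset bit_start onwards.
 * Returns 0 on success, -1 on a bad offset or too small output buffer.
 */
static int copy_payload(BitWriter *bw, const uint8_t *src,
                        size_t size, size_t bit_start)
{
    const uint8_t *pos;
    unsigned head;
    size_t rest;

    if (bit_start >= size * 8)
        return -1;
    if (bw->bit_pos + (size * 8 - bit_start) > bw->size * 8)
        return -1;

    pos  = src + bit_start / 8;
    head = (unsigned)((8 - bit_start % 8) % 8); /* bits left in a partial first byte */

    /* First write the remaining bits of a partially consumed byte. */
    if (head)
        bw_put(bw, head, *pos++ & ((1u << head) - 1u));
    rest = (size_t)(src + size - pos);

    if (bw->bit_pos % 8 == 0) {
        /* Writer is byte-aligned: the remaining whole bytes go in one memcpy. */
        memcpy(bw->buf + bw->bit_pos / 8, pos, rest);
        bw->bit_pos += rest * 8;
    } else {
        /* Writer is unaligned: fall back to writing byte by byte. */
        for (; rest > 0; rest--, pos++)
            bw_put(bw, 8, *pos);
    }
    return 0;
}
```

In this sketch, a header of 3 bits followed by a payload whose
bit_start is 3 leaves the writer aligned after the first partial byte,
so the rest of the payload takes the memcpy path. A header of 4 bits
followed by a payload with bit_start 0 takes the byte-by-byte path
instead.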