From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 4 Nov 2018 05:48:41 +0100
Message-Id: <20181104044842.3092-3-andreas.rheinhardt@googlemail.com>
In-Reply-To: <20181104044842.3092-1-andreas.rheinhardt@googlemail.com>
Subject: [FFmpeg-devel] [PATCH 2/3] cbs_h264: Improve performance of writing slices

Instead of copying the slice data with a combination of a bitreader and
a bitwriter, copy it byte-wise once the read position is byte-aligned.
This makes the bitreader obsolete, so it is removed, and it improves
performance. If the writer is also byte-aligned at that point, memcpy
can even be used; this is normally the case for CABAC.

For aligned data, this reduced the time to copy the slice data from
776520 to 33889 decicycles (262144 runs, 6.5 Mb/s H.264 video).
For unaligned data, the time went down from 279196 to 97739 decicycles.

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/cbs_h2645.c | 67 +++++++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 21 deletions(-)

diff --git a/libavcodec/cbs_h2645.c b/libavcodec/cbs_h2645.c
index e55bd00183..d3a41fbdf0 100644
--- a/libavcodec/cbs_h2645.c
+++ b/libavcodec/cbs_h2645.c
@@ -1100,37 +1100,62 @@ static int cbs_h264_write_nal_unit(CodedBitstreamContext *ctx,
     case H264_NAL_AUXILIARY_SLICE:
         {
             H264RawSlice *slice = unit->content;
-            GetBitContext gbc;
-            int bits_left, end, zeroes;
 
             err = cbs_h264_write_slice_header(ctx, pbc, &slice->header);
             if (err < 0)
                 return err;
 
             if (slice->data) {
-                if (slice->data_size * 8 + 8 > put_bits_left(pbc))
-                    return AVERROR(ENOSPC);
+                size_t rest = slice->data_size - (slice->data_bit_start + 7) / 8;
+                uint8_t *pos = slice->data + slice->data_bit_start / 8;
 
-                init_get_bits(&gbc, slice->data, slice->data_size * 8);
-                skip_bits_long(&gbc, slice->data_bit_start);
+                av_assert0(slice->data_bit_start >= 0 &&
+                           8 * slice->data_size > slice->data_bit_start);
 
-                // Copy in two-byte blocks, but stop before copying the
-                // rbsp_stop_one_bit in the final byte.
-                while (get_bits_left(&gbc) > 23)
-                    put_bits(pbc, 16, get_bits(&gbc, 16));
+                if (slice->data_size * 8 + 8 > put_bits_left(pbc))
+                    return AVERROR(ENOSPC);
 
-                bits_left = get_bits_left(&gbc);
-                end = get_bits(&gbc, bits_left);
+                if (!rest)
+                    goto rbsp_stop_one_bit;
+
+                // First copy the remaining bits of the first byte
+                // The above check ensures that we do not accidentally
+                // copy beyond the rbsp_stop_one_bit.
+                if (slice->data_bit_start % 8)
+                    put_bits(pbc, 8 - slice->data_bit_start % 8,
+                             *pos++ & MAX_UINT_BITS(8 - slice->data_bit_start % 8));
+
+                if (put_bits_count(pbc) % 8 == 0) {
+                    // If the writer is aligned at this point,
+                    // memcpy can be used to improve performance.
+                    // This happens normally for CABAC.
+ flush_put_bits(pbc); + memcpy(put_bits_ptr(pbc), pos, rest); + skip_put_bytes(pbc, rest); + break; + } else { + // If not, we have to copy manually. + // rbsp_stop_one_bit forces us to special-case + // the last byte. + for (; rest > 4; rest -= 4, pos += 4) + put_bits32(pbc, AV_RB32(pos)); + + for (; rest > 1; rest--, pos++) + put_bits(pbc, 8, *pos); + } - // rbsp_stop_one_bit must be present here. - av_assert0(end); - zeroes = ff_ctz(end); - if (bits_left > zeroes + 1) - put_bits(pbc, bits_left - zeroes - 1, - end >> (zeroes + 1)); - put_bits(pbc, 1, 1); - while (put_bits_count(pbc) % 8 != 0) - put_bits(pbc, 1, 0); + rbsp_stop_one_bit: { + int i; + uint8_t temp = rest ? *pos : *pos & MAX_UINT_BITS(8 - + slice->data_bit_start % 8); + av_assert0(temp); + i = ff_ctz(*pos); + temp = temp >> i; + i = rest ? (8 - i) : (8 - i - slice->data_bit_start % 8); + put_bits(pbc, i, temp); + if (put_bits_count(pbc) % 8) + put_bits(pbc, 8 - put_bits_count(pbc) % 8, 0U); + } } else { // No slice data - that was just the header. // (Bitstream may be unaligned!)