From patchwork Sun Nov 11 22:43:05 2018
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 10991
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Cc: Andreas Rheinhardt
Date: Sun, 11 Nov 2018 23:43:05 +0100
Message-Id: <20181111224305.4480-1-andreas.rheinhardt@googlemail.com>
Subject: [FFmpeg-devel] [PATCH] cbs_h2645: Improve performance of writing slices

Instead of copying the slice data with a combination of a bitreader and a
bitwriter, one can read the source in byte-aligned units (the bitreader is
now obsolete and has been removed), which improves performance. With the
right alignment, memcpy can even be used: this is the case when the writer
is byte-aligned after the remaining bits of the first source byte have been
written. This alignment normally exists for CABAC and hence for H.265 in
general, which always uses CABAC.

For aligned data, this reduced the time to copy the slice data from 776520
to 33889 decicycles (262144 runs on a 6.5 Mb/s H.264 video). For unaligned
data, it went down from 279196 to 97739 decicycles.
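As a minimal, non-authoritative sketch of the copy strategy described above, the following standalone C code replaces PutBitContext with a hypothetical MSB-first BitWriter and two hypothetical helpers, bw_put() and copy_bits(); none of these are FFmpeg API. It copies the remaining bits of a partially consumed first byte, then takes a single memcpy if the writer has become byte-aligned, and otherwise falls back to byte-wise writes. It deliberately ignores the rbsp_stop_one_bit special case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MSB-first bit writer for this sketch only; it is NOT
 * FFmpeg's PutBitContext. buf must be zero-initialised and large enough,
 * because bw_put() only ORs in the 1 bits. */
typedef struct BitWriter {
    uint8_t *buf;
    size_t   bitpos;   /* number of bits written so far */
} BitWriter;

/* Append the n (0..32) low-order bits of v, most significant bit first. */
static void bw_put(BitWriter *bw, int n, uint32_t v)
{
    for (int k = n - 1; k >= 0; k--) {
        if ((v >> k) & 1)
            bw->buf[bw->bitpos / 8] |= (uint8_t)(0x80u >> (bw->bitpos % 8));
        bw->bitpos++;
    }
}

/* Hypothetical helper: copy bits [bit_start, 8 * size) of data to bw.
 * Returns 1 if the memcpy fast path was taken, 0 otherwise. */
static int copy_bits(BitWriter *bw, const uint8_t *data, size_t size,
                     size_t bit_start)
{
    const uint8_t *pos = data + bit_start / 8;
    /* whole bytes left after the (possibly partial) first byte */
    size_t rest = size - (bit_start + 7) / 8;
    int skip = (int)(bit_start % 8);

    assert(8 * size > bit_start);

    /* 1. Copy the remaining 8 - skip bits of a partially consumed byte. */
    if (skip)
        bw_put(bw, 8 - skip, *pos++ & ((1u << (8 - skip)) - 1));

    /* 2. Writer is byte-aligned now: the rest is a plain byte copy. */
    if (bw->bitpos % 8 == 0) {
        memcpy(bw->buf + bw->bitpos / 8, pos, rest);
        bw->bitpos += 8 * rest;
        return 1;
    }

    /* 3. Otherwise every source byte straddles two destination bytes. */
    for (; rest > 0; rest--, pos++)
        bw_put(bw, 8, *pos);
    return 0;
}
```

For example, if three bits have already been written and bit_start is 3, the five remaining bits of the first source byte complete the output byte and everything after it goes through memcpy; with only two bits already written, every following byte has to be shifted. The fast path depends only on the writer position modulo 8. The patch itself additionally writes four bytes at a time with put_bits32(pbc, AV_RB32(pos)) in the unaligned path and treats the last byte specially.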
---
 libavcodec/cbs_h2645.c | 119 ++++++++++++++++++++++++-----------------
 1 file changed, 69 insertions(+), 50 deletions(-)

diff --git a/libavcodec/cbs_h2645.c b/libavcodec/cbs_h2645.c
index e55bd00183..416d3fd32a 100644
--- a/libavcodec/cbs_h2645.c
+++ b/libavcodec/cbs_h2645.c
@@ -1050,6 +1050,64 @@ static int cbs_h265_read_nal_unit(CodedBitstreamContext *ctx,
     return 0;
 }
 
+static int cbs_h2645_write_slice_data(CodedBitstreamContext *ctx,
+                                      PutBitContext *pbc, const uint8_t *data,
+                                      size_t data_size, int data_bit_start)
+{
+    size_t rest  = data_size - (data_bit_start + 7) / 8;
+    const uint8_t *pos = data + data_bit_start / 8;
+
+    av_assert0(data_bit_start >= 0 &&
+               8 * data_size > data_bit_start);
+
+    if (data_size * 8 + 8 > put_bits_left(pbc))
+        return AVERROR(ENOSPC);
+
+    if (!rest)
+        goto rbsp_stop_one_bit;
+
+    // First copy the remaining bits of the first byte
+    // The above check ensures that we do not accidentally
+    // copy beyond the rbsp_stop_one_bit.
+    if (data_bit_start % 8)
+        put_bits(pbc, 8 - data_bit_start % 8,
+                 *pos++ & MAX_UINT_BITS(8 - data_bit_start % 8));
+
+    if (put_bits_count(pbc) % 8 == 0) {
+        // If the writer is aligned at this point,
+        // memcpy can be used to improve performance.
+        // This happens normally for CABAC.
+        flush_put_bits(pbc);
+        memcpy(put_bits_ptr(pbc), pos, rest);
+        skip_put_bytes(pbc, rest);
+    } else {
+        // If not, we have to copy manually.
+        // rbsp_stop_one_bit forces us to special-case
+        // the last byte.
+        uint8_t temp;
+        int i;
+
+        for (; rest > 4; rest -= 4, pos += 4)
+            put_bits32(pbc, AV_RB32(pos));
+
+        for (; rest > 1; rest--, pos++)
+            put_bits(pbc, 8, *pos);
+
+    rbsp_stop_one_bit:
+        temp = rest ? *pos : *pos & MAX_UINT_BITS(8 - data_bit_start % 8);
+
+        av_assert0(temp);
+        i = ff_ctz(*pos);
+        temp = temp >> i;
+        i = rest ? (8 - i) : (8 - i - data_bit_start % 8);
+        put_bits(pbc, i, temp);
+        if (put_bits_count(pbc) % 8)
+            put_bits(pbc, 8 - put_bits_count(pbc) % 8, 0U);
+    }
+
+    return 0;
+}
+
 static int cbs_h264_write_nal_unit(CodedBitstreamContext *ctx,
                                    CodedBitstreamUnit *unit,
                                    PutBitContext *pbc)
@@ -1100,37 +1158,17 @@ static int cbs_h264_write_nal_unit(CodedBitstreamContext *ctx,
     case H264_NAL_AUXILIARY_SLICE:
         {
             H264RawSlice *slice = unit->content;
-            GetBitContext gbc;
-            int bits_left, end, zeroes;
 
             err = cbs_h264_write_slice_header(ctx, pbc, &slice->header);
             if (err < 0)
                 return err;
 
             if (slice->data) {
-                if (slice->data_size * 8 + 8 > put_bits_left(pbc))
-                    return AVERROR(ENOSPC);
-
-                init_get_bits(&gbc, slice->data, slice->data_size * 8);
-                skip_bits_long(&gbc, slice->data_bit_start);
-
-                // Copy in two-byte blocks, but stop before copying the
-                // rbsp_stop_one_bit in the final byte.
-                while (get_bits_left(&gbc) > 23)
-                    put_bits(pbc, 16, get_bits(&gbc, 16));
-
-                bits_left = get_bits_left(&gbc);
-                end = get_bits(&gbc, bits_left);
-
-                // rbsp_stop_one_bit must be present here.
-                av_assert0(end);
-                zeroes = ff_ctz(end);
-                if (bits_left > zeroes + 1)
-                    put_bits(pbc, bits_left - zeroes - 1,
-                             end >> (zeroes + 1));
-                put_bits(pbc, 1, 1);
-                while (put_bits_count(pbc) % 8 != 0)
-                    put_bits(pbc, 1, 0);
+                err = cbs_h2645_write_slice_data(ctx, pbc, slice->data,
+                                                 slice->data_size,
+                                                 slice->data_bit_start);
+                if (err < 0)
+                    return err;
             } else {
                 // No slice data - that was just the header.
                 // (Bitstream may be unaligned!)
@@ -1254,39 +1292,20 @@ static int cbs_h265_write_nal_unit(CodedBitstreamContext *ctx,
     case HEVC_NAL_CRA_NUT:
         {
             H265RawSlice *slice = unit->content;
-            GetBitContext gbc;
-            int bits_left, end, zeroes;
 
             err = cbs_h265_write_slice_segment_header(ctx, pbc, &slice->header);
             if (err < 0)
                 return err;
 
             if (slice->data) {
-                if (slice->data_size * 8 + 8 > put_bits_left(pbc))
-                    return AVERROR(ENOSPC);
-
-                init_get_bits(&gbc, slice->data, slice->data_size * 8);
-                skip_bits_long(&gbc, slice->data_bit_start);
-
-                // Copy in two-byte blocks, but stop before copying the
-                // rbsp_stop_one_bit in the final byte.
-                while (get_bits_left(&gbc) > 23)
-                    put_bits(pbc, 16, get_bits(&gbc, 16));
-
-                bits_left = get_bits_left(&gbc);
-                end = get_bits(&gbc, bits_left);
-
-                // rbsp_stop_one_bit must be present here.
-                av_assert0(end);
-                zeroes = ff_ctz(end);
-                if (bits_left > zeroes + 1)
-                    put_bits(pbc, bits_left - zeroes - 1,
-                             end >> (zeroes + 1));
-                put_bits(pbc, 1, 1);
-                while (put_bits_count(pbc) % 8 != 0)
-                    put_bits(pbc, 1, 0);
+                err = cbs_h2645_write_slice_data(ctx, pbc, slice->data,
+                                                 slice->data_size,
+                                                 slice->data_bit_start);
+                if (err < 0)
+                    return err;
             } else {
                 // No slice data - that was just the header.
+                // (Bitstream may be unaligned!)
             }
         }
         break;
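The last-byte special case in the unaligned path of cbs_h2645_write_slice_data() (the code after the rbsp_stop_one_bit: label) can be illustrated in isolation. This sketch again assumes a hypothetical BitWriter/bw_put() stand-in for PutBitContext plus a hypothetical put_last_byte() helper, none of which are FFmpeg API. It mirrors the patch's arithmetic: keep only the low avail bits of the final byte, strip the trailing zeros with a count-trailing-zeros step (ff_ctz() in the patch, a portable loop here), write the remaining bits including the stop bit, and zero-pad to the next byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit writer standing in for PutBitContext;
 * buf must be zero-initialised and large enough. */
typedef struct BitWriter {
    uint8_t *buf;
    size_t   bitpos;   /* number of bits written so far */
} BitWriter;

/* Append the n (0..32) low-order bits of v, most significant bit first. */
static void bw_put(BitWriter *bw, int n, uint32_t v)
{
    for (int k = n - 1; k >= 0; k--) {
        if ((v >> k) & 1)
            bw->buf[bw->bitpos / 8] |= (uint8_t)(0x80u >> (bw->bitpos % 8));
        bw->bitpos++;
    }
}

/* Hypothetical helper: write the final slice-data byte. Only its low
 * `avail` bits (1..8) belong to the payload: 8 normally, or
 * 8 - data_bit_start % 8 when the first byte is also the last one.
 * Bits are written up to and including the rbsp_stop_one_bit (the lowest
 * set bit); the trailing zeros are not copied but regenerated as padding
 * up to the next byte boundary. */
static void put_last_byte(BitWriter *bw, uint8_t last, int avail)
{
    uint32_t temp = last & ((1u << avail) - 1);
    int tz = 0;

    assert(temp != 0);              /* rbsp_stop_one_bit must be present */
    while (!((temp >> tz) & 1))     /* portable count-trailing-zeros */
        tz++;

    bw_put(bw, avail - tz, temp >> tz);   /* payload bits + stop bit */
    if (bw->bitpos % 8)
        bw_put(bw, 8 - (int)(bw->bitpos % 8), 0);
}
```

For example, with five bits already written and a final byte of 0x80, only the stop bit and two padding zeros are emitted, so the output ends after exactly one byte. Writing the whole byte and padding afterwards would end at bit 16, appending an extra zero byte after the stop bit. In the aligned (memcpy) path no special case is needed, because the final byte then ends exactly on a byte boundary.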