From patchwork Tue Nov 19 01:13:14 2019
X-Patchwork-Submitter: Josh Allmann
X-Patchwork-Id: 16329
From: Josh Allmann
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 18 Nov 2019 17:13:14 -0800
Message-Id: <1574125994-7782-1-git-send-email-joshua.allmann@gmail.com>
Subject: [FFmpeg-devel] [PATCH][DISCUSS] nvenc: Add encoder flush API.

This patch is meant to be an entry point for discussion around an issue
we are having with flushing the nvenc encoder while doing segmented
transcoding. Hopefully there will be a less kludgey workaround than this.

First, some background on where this is coming from. We do segmented
transcoding on Nvidia GPUs using the libav* libraries [0]. The flow is
roughly this:

1. Segment incoming stream
2. Send each segment to a transcoder

We've noticed a significant overhead around setting up new transcode
sessions / contexts for each segment, and this overhead is magnified the
more streams a given machine is processing, regardless of the number of
attached GPUs [1].

Now, the logical solution here would be to reuse the GPU sessions for
segments during a given stream. However, there is a problem around
flushing internal decode / encode buffers.

Because we do segmented transcoding [2], we need to ensure that all
stages in the transcode pipeline are completely flushed in between each
segment. Here is what we do for each stage of decode, filter and encode:

* Decoding : Cache the first packet of each segment. When the IO layer
  EOFs, feed the cached packet with a sentinel pts of -1. (This doesn't
  seem to cause issues with h264_cuvid.) Once a frame is returned from
  the decoder with the sentinel pts set, we know the decoder is flushed
  of legitimate input. For a typical 2-second segment, this has added
  about 6 frames (~10%) of overhead, which is tolerable because decoding
  is typically less expensive than encoding. No changes are required to
  FFmpeg itself, which is nice.

* Filtering : Close the filtergraph (via av_buffersrc_close) and
  re-initialize the filter with each segment. Again, the overhead here
  seems tolerable. We have not seen a straightforward way to drain the
  filtergraph without also closing and re-opening it.

* Encoding : This patch. We add a very special "av_nvenc_flush" API to
  signal end-of-stream in the same way as `avcodec_send_frame(ctx, NULL)`
  but bypassing all the higher-level libavcodec machinery before hitting
  nvenc. This seems to successfully drain pending frames. Afterwards, we
  can continue to send frames for the next segments via
  `avcodec_send_frame` and the internal state will more-or-less
  reinitialize as if nothing had happened.

Now, it is quite likely that this behavior is entirely accidental, and
should not be expected to be stable in the future. While the nvenc
encoder itself does seem to be "resumable" according to the
documentation around the `NV_ENC_PIC_FLAG_EOS` flag (cf. NVIDIA Video
Encoder API Programming Guide), FFmpeg has no such mode. So we've had to
sort of inject one in here.

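To make the decoder-side sentinel flush described above a bit more
concrete, here is a minimal, self-contained model of it. It is only a
sketch: ToyDecoder, toy_init, toy_decode and decode_segment are
hypothetical names that stand in for a decoder holding back a fixed
number of frames, and nothing here calls libavcodec or h264_cuvid. The
delay of 5 frames in the usage note is an assumption picked for
illustration, not a measured property of any real decoder.

```c
#include <assert.h>
#include <stdint.h>

#define SENTINEL_PTS (-1)  /* marker pts carried by the re-fed cached packet */
#define MAX_DELAY    16    /* upper bound on the modelled decoder delay */

/* Hypothetical toy decoder: a FIFO that holds back `delay` frames. */
typedef struct ToyDecoder {
    int64_t queue[MAX_DELAY + 1];
    int     count;
    int     delay;
} ToyDecoder;

static void toy_init(ToyDecoder *d, int delay)
{
    assert(delay >= 0 && delay <= MAX_DELAY);
    d->count = 0;
    d->delay = delay;
}

/* Push one packet. Returns 1 and stores the oldest pts in *out_pts once
 * the decoder holds more than `delay` entries, otherwise returns 0. */
static int toy_decode(ToyDecoder *d, int64_t pkt_pts, int64_t *out_pts)
{
    d->queue[d->count++] = pkt_pts;
    if (d->count <= d->delay)
        return 0;
    *out_pts = d->queue[0];
    for (int i = 1; i < d->count; i++)
        d->queue[i - 1] = d->queue[i];
    d->count--;
    return 1;
}

/* Decode a segment of n packets (pts 0..n-1), then keep feeding sentinel
 * packets until a sentinel frame comes back out. Returns the number of
 * legitimate frames; *extra receives the number of sentinel packets fed.
 * Assumes n >= delay, so stale sentinels left over from the previous
 * segment are pushed out before the flush loop starts. */
static int decode_segment(ToyDecoder *d, int n, int *extra)
{
    int     frames = 0;
    int64_t pts;

    assert(n >= d->delay);
    *extra = 0;

    for (int i = 0; i < n; i++) {
        /* Sentinel frames from the previous segment are discarded. */
        if (toy_decode(d, i, &pts) && pts != SENTINEL_PTS)
            frames++;
    }

    for (;;) {
        (*extra)++;
        if (!toy_decode(d, SENTINEL_PTS, &pts))
            continue;
        if (pts == SENTINEL_PTS)
            break;          /* all legitimate input has been flushed */
        frames++;
    }
    return frames;
}
```

In this model, calling toy_init with a delay of 5 and then
decode_segment with n = 60 twice on the same ToyDecoder returns 60
frames each time and needs 6 sentinel packets per segment, without
re-creating the decoder in between. The segment-length assumption
matters: if a segment were shorter than the decoder delay, a stale
sentinel could end the flush loop too early.
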
The questions here are:

* Are these workarounds reasonable for the problem of Nvidia GPU
  sessions taking a long time to initialize when transcoding under load?

* Is there an alternative to carrying around this patch to flush the
  encoder in between segments?

* If there is no alternative, would you be open to a more formalized
  addition to the avcodec API around "flushable" or "resumable" encoders?

Thanks for your thoughts!

Josh

[0] https://github.com/livepeer/lpms
[1] https://gist.github.com/j0sh/ae9e5a97e794e364a6dfe513fa2591c2
[2] For historical reasons we cannot easily change right now
---
 libavcodec/avcodec.h | 2 ++
 libavcodec/nvenc.c   | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index bcb931f0dd..763a557d82 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -6232,6 +6232,8 @@ const AVCodecDescriptor *avcodec_descriptor_get_by_name(const char *name);
  */
 AVCPBProperties *av_cpb_properties_alloc(size_t *size);
 
+int av_nvenc_flush(AVCodecContext *avctx);
+
 /**
  * @}
  */
diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
index 111048d043..36134fa6a9 100644
--- a/libavcodec/nvenc.c
+++ b/libavcodec/nvenc.c
@@ -2071,6 +2071,11 @@ static void reconfig_encoder(AVCodecContext *avctx, const AVFrame *frame)
     }
 }
 
+int attribute_align_arg av_nvenc_flush(AVCodecContext *avctx)
+{
+    return ff_nvenc_send_frame(avctx, NULL);
+}
+
 int ff_nvenc_send_frame(AVCodecContext *avctx, const AVFrame *frame)
 {
     NVENCSTATUS nv_status;