From patchwork Tue Nov 19 01:13:14 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josh Allmann <joshua.allmann@gmail.com>
X-Patchwork-Id: 16329
From: Josh Allmann <joshua.allmann@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 18 Nov 2019 17:13:14 -0800
Message-Id: <1574125994-7782-1-git-send-email-joshua.allmann@gmail.com>
X-Mailer: git-send-email 2.7.4
Subject: [FFmpeg-devel] [PATCH][DISCUSS] nvenc: Add encoder flush API.

This patch is meant to be an entry point for discussion around an
issue we are having with flushing the nvenc encoder while doing
segmented transcoding. Hopefully there will be a less kludgey
workaround than this.

First, some background on where this is coming from. We do
segmented transcoding on Nvidia GPUs using the libav* libraries [0].
The flow is roughly this:

1. Segment incoming stream
2. Send each segment to a transcoder

We've noticed a significant overhead around setting up new transcode
sessions / contexts for each segment, and this overhead is magnified
the more streams a given machine is processing, regardless of the
number of attached GPUs [1].

Now, the logical solution here would be to reuse the GPU sessions
for segments during a given stream. However, there is a problem
around flushing internal decode / encode buffers. Because we do
segmented transcoding [2], we need to ensure that all stages in the
transcode pipeline are completely flushed in between each segment.

Here is what we do for each stage of decode, filter and encode:

* Decoding : Cache the first packet of each segment. When the
  IO layer EOFs, feed the cached packet with a sentinel pts of -1.
  (This doesn't seem to cause issues with h264_cuvid.) Once a frame
  is returned from the decoder with the sentinel pts set, we know
  the decoder is flushed of legitimate input. For a 2-second
  segment, this has typically added about 6 frames (~10%) of overhead,
  which is tolerable because decoding is usually less expensive than
  encoding. No changes are required to FFmpeg itself, which is nice.

* Filtering : Close the filtergraph (via av_buffersrc_close) and
  re-initialize the filter with each segment. Again, the overhead here
  seems tolerable. We have not seen a straightforward way to drain the
  filtergraph without also closing and re-opening it.

* Encoding : This patch.

  We add a very special "av_nvenc_flush" API to signal end-of-stream
  in the same way as `avcodec_send_frame(ctx, NULL)` but bypassing
  all the higher-level libavcodec machinery before hitting nvenc.
  This seems to successfully drain pending frames. Afterwards,
  we can continue to send frames for the next segments via
  `avcodec_send_frame` and the internal state will more-or-less
  reinitialize as if nothing had happened.

  Now, it is quite likely that this behavior is entirely accidental,
  and should not be expected to be stable in the future.

  While the nvenc encoder itself does seem to be "resumable" according
  to the documentation around the `NV_ENC_FLAGS_EOS` flag (cf.
  NVIDIA Video Encoder API Programming Guide), FFmpeg has no such
  mode. So we've had to sort of inject one in here.
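
To make the decoder-side sentinel trick above concrete, here is a toy
model in plain C. It is only a sketch: ToyDecoder, toy_decode,
toy_drain and the TOY_DELAY latency are hypothetical stand-ins, not
libavcodec or cuvid APIs. The "decoder" is just a FIFO that holds back
a fixed number of frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SENTINEL_PTS (-1) /* marker pts, as in the description above */
#define TOY_DELAY    3    /* hypothetical decoder latency, in frames */

/* Toy stand-in for a hardware decoder: a FIFO that releases a frame
 * only once more than TOY_DELAY packets are queued. Not libavcodec. */
typedef struct ToyDecoder {
    int64_t queue[TOY_DELAY + 1];
    size_t  count;
} ToyDecoder;

/* Push one packet. Returns 1 and writes *out_pts when a frame pops
 * out, 0 while the decoder is still buffering. */
static int toy_decode(ToyDecoder *d, int64_t in_pts, int64_t *out_pts)
{
    assert(d->count <= TOY_DELAY);
    d->queue[d->count++] = in_pts;
    if (d->count <= TOY_DELAY)
        return 0;
    *out_pts = d->queue[0];
    for (size_t i = 1; i < d->count; i++)
        d->queue[i - 1] = d->queue[i];
    d->count--;
    return 1;
}

/* End of segment: feed sentinel packets until a sentinel frame comes
 * back. Real frames are appended to out[] (the caller sizes it).
 * Returns the number of sentinel packets fed, i.e. the overhead. */
static size_t toy_drain(ToyDecoder *d, int64_t *out, size_t *n_out)
{
    size_t fed = 0;
    int64_t pts;

    for (;;) {
        fed++;
        if (!toy_decode(d, SENTINEL_PTS, &pts))
            continue;
        if (pts == SENTINEL_PTS)
            return fed; /* every real frame has been returned */
        out[(*n_out)++] = pts;
    }
}
```

In this model the drain costs TOY_DELAY + 1 extra packets per segment.
Note that sentinel entries are still queued inside the toy decoder
when toy_drain returns, so the caller has to drop any sentinel-pts
frames that surface at the start of the next segment.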

The questions here are:

* Are these workarounds reasonable for the problem of Nvidia GPU
  sessions taking a long time to initialize when transcoding under
  load?

* Is there an alternative to carrying around this patch to flush
  the encoder in between segments?

* If there is no alternative, would you be open to a more formalized
  addition to the avcodec API around "flushable" or "resumable"
  encoders?
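
As a starting point for the last question, here is a toy state
machine showing what "resumable" could mean. Everything in it is
hypothetical (ToyEncoder, the resumable flag, the TOY_* codes); it is
not an existing or proposed libavcodec interface. It is loosely
modelled on an encoder that refuses new input once end-of-stream has
been signalled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EAGAIN    (-1) /* hypothetical: try again later */
#define TOY_EOF       (-2) /* hypothetical: encoder fully drained */
#define TOY_LOOKAHEAD 2    /* frames held back while streaming */
#define TOY_CAPACITY  8

typedef struct ToyEncoder {
    int    pending[TOY_CAPACITY]; /* frames held inside the encoder */
    size_t n_pending;
    int    draining;              /* set by an end-of-stream signal */
    int    resumable;             /* hypothetical opt-in flag */
} ToyEncoder;

/* frame_id < 0 signals end-of-stream, like sending a NULL frame. */
static int toy_send_frame(ToyEncoder *e, int frame_id)
{
    if (e->draining)
        return TOY_EOF; /* no input is accepted while draining */
    if (frame_id < 0) {
        e->draining = 1;
        return 0;
    }
    if (e->n_pending == TOY_CAPACITY)
        return TOY_EAGAIN;
    e->pending[e->n_pending++] = frame_id;
    return 0;
}

static int toy_receive_packet(ToyEncoder *e, int *packet_id)
{
    /* While streaming, keep TOY_LOOKAHEAD frames back; while
     * draining, release everything. */
    size_t hold = e->draining ? 0 : TOY_LOOKAHEAD;

    if (e->n_pending > hold) {
        *packet_id = e->pending[0];
        for (size_t i = 1; i < e->n_pending; i++)
            e->pending[i - 1] = e->pending[i];
        e->n_pending--;
        return 0;
    }
    if (!e->draining)
        return TOY_EAGAIN;
    if (e->resumable)
        e->draining = 0; /* accept the next segment, same session */
    return TOY_EOF;
}
```

With resumable set, the caller sees TOY_EOF exactly once per segment
and can then keep sending frames. Without it, every send after the
drain fails, which is the situation that forces a new session per
segment.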

Thanks for your thoughts!

Josh

[0] https://github.com/livepeer/lpms

[1] https://gist.github.com/j0sh/ae9e5a97e794e364a6dfe513fa2591c2

[2] For historical reasons that we cannot easily change right now
---
 libavcodec/avcodec.h | 2 ++
 libavcodec/nvenc.c   | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index bcb931f0dd..763a557d82 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -6232,6 +6232,8 @@ const AVCodecDescriptor *avcodec_descriptor_get_by_name(const char *name);
  */
 AVCPBProperties *av_cpb_properties_alloc(size_t *size);
 
+int av_nvenc_flush(AVCodecContext *avctx);
+
 /**
  * @}
  */
diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
index 111048d043..36134fa6a9 100644
--- a/libavcodec/nvenc.c
+++ b/libavcodec/nvenc.c
@@ -2071,6 +2071,11 @@ static void reconfig_encoder(AVCodecContext *avctx, const AVFrame *frame)
     }
 }
 
+int attribute_align_arg av_nvenc_flush(AVCodecContext *avctx)
+{
+    return ff_nvenc_send_frame(avctx, NULL);
+}
+
 int ff_nvenc_send_frame(AVCodecContext *avctx, const AVFrame *frame)
 {
     NVENCSTATUS nv_status;