[FFmpeg-devel,0/2] Implement SMPTE 2038 output support over Decklink SDI

Message-ID: <1682111554-31597-1-git-send-email-dheitmueller@ltnglobal.com>
From: Devin Heitmueller <devin.heitmueller@ltnglobal.com>
To: ffmpeg-devel@ffmpeg.org
Cc: Devin Heitmueller <dheitmueller@ltnglobal.com>
Date: Fri, 21 Apr 2023 17:12:32 -0400
Subject: [FFmpeg-devel] [PATCH 0/2] Implement SMPTE 2038 output support over Decklink SDI

Devin Heitmueller April 21, 2023, 9:12 p.m. UTC

This patch series implements output of SMPTE 2038 VANC over SDI, building
on the prior patch series which added it in the TS domain.  Note that
we moved the AVPacketQueue to be common code within libavdevice so it
can be shared by both the decklink input and output.

Comments/feedback are welcome.

Devin

Devin Heitmueller (2):
  decklink: Move AVPacketQueue into decklink_common
  decklink_enc: add support for SMPTE 2038 VANC packet output

 libavdevice/decklink_common.cpp | 131 ++++++++++++++++++++++++++++++++++++++++
 libavdevice/decklink_common.h   |  11 ++++
 libavdevice/decklink_dec.cpp    | 114 ----------------------------------
 libavdevice/decklink_enc.cpp    | 104 +++++++++++++++++++++++++++++++
 libavdevice/decklink_enc_c.c    |   1 +
 5 files changed, 247 insertions(+), 114 deletions(-)

Marton Balint April 23, 2023, 6:42 p.m. UTC | #1

On Fri, 21 Apr 2023, Devin Heitmueller wrote:

> This patch series implements output of SMPTE 2038 VANC over SDI, building
> on the prior patch series which added it in the TS domain.  Note that
> we moved the AVPacketQueue to be common code within libavdevice so it
> can be shared by both the decklink input and output.
>
> Comments/feedback are welcome.

In general, queueing packets in specific components should be avoided if 
possible. Muxed packets are normally ordered by DTS and stream id, generic 
code ensures that. If you want something other than that, then I think 
the preferred way of doing it is by providing a custom interleave 
function. (e.g. to ensure you get data packets before video even if data 
stream has a higher stream ID.)

If you are only using the queue to store multiple data packets for a 
single frame then one way to avoid it is to parse them as soon as they 
arrive via the KLV library. If you insist on queueing them (maybe because 
not every packet will be parsed by the KLV lib), then I'd rather see you 
use avpriv_packet_list_*() functions, and not a custom decklink
implementation.

Regards,
Marton

Devin Heitmueller April 24, 2023, 2:11 p.m. UTC | #2

Hello Marton,

Thanks for reviewing.  Comments inline:

On Sun, Apr 23, 2023 at 2:43 PM Marton Balint <cus@passwd.hu> wrote:
> In general, queueing packets in specific components should be avoided if
> possible. Muxed packets are normally ordered by DTS and stream id, generic
> code ensures that. If you want something other than that, then I think
> the preferred way of doing it is by providing a custom interleave
> function. (e.g. to ensure you get data packets before video even if data
> stream has a higher stream ID.)

To be clear, using a queue was not my first choice.  It's the result of
trying different approaches, and I'm open to constructive suggestions
on alternatives.

While what you are saying is correct "in general", there are some
really important reasons why it doesn't work in this case.  Permit me
to explain...

By default, the behavior of the mux interleaver is to wait until there
is at least one packet available for each stream before writing to the
output module (in this case decklink).  However data formats such as
SMPTE ST2038 are considered to be "sparse" as there isn't necessarily
a continuous stream of packets like with video and audio (there may be
many seconds between packets, or no packets at all).  As a result you
can't wait for a packet to be available on all streams since on some
streams it will simply wait continuously until hitting the
max_interleave_delta, at which point it will burst out everything in
the queue.  This would cause stalls and/or stuttering playback on the
decklink output.
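
To make the stall concrete, here is a toy model of the waiting rule
described above.  It is only a sketch and not the libavformat code:
ToyPacket, ToyInterleaver and max_delta are hypothetical names, and the
release rule (every stream has a packet buffered, or the buffered DTS
span exceeds the limit) is a simplification of what mux.c does.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <limits>
#include <vector>

// Toy model only: this is NOT the libavformat interleaver.
struct ToyPacket {
    int stream;   // index of the stream the packet belongs to
    int64_t dts;  // decode timestamp, in arbitrary units
};

class ToyInterleaver {
public:
    ToyInterleaver(int nb_streams, int64_t max_delta)
        : queues_(nb_streams), max_delta_(max_delta) {}

    // Buffer one packet, then return whatever the model releases.
    std::vector<ToyPacket> push(const ToyPacket &pkt) {
        queues_[pkt.stream].push_back(pkt);
        std::vector<ToyPacket> released;
        while (ready())
            released.push_back(pop_lowest());
        return released;
    }

private:
    // Release when every stream has a packet buffered, or when the
    // buffered DTS span is larger than max_delta_.
    bool ready() const {
        bool all = true, any = false;
        int64_t lo = std::numeric_limits<int64_t>::max();
        int64_t hi = std::numeric_limits<int64_t>::min();
        for (const auto &q : queues_) {
            if (q.empty()) { all = false; continue; }
            any = true;
            if (q.front().dts < lo) lo = q.front().dts;
            if (q.back().dts > hi) hi = q.back().dts;
        }
        return any && (all || hi - lo > max_delta_);
    }

    // Pop the buffered packet with the lowest DTS; ties go to the
    // lower stream index.  Only called when ready() returned true.
    ToyPacket pop_lowest() {
        int best = -1;
        for (size_t i = 0; i < queues_.size(); i++) {
            if (queues_[i].empty()) continue;
            if (best < 0 || queues_[i].front().dts < queues_[best].front().dts)
                best = (int)i;
        }
        ToyPacket pkt = queues_[best].front();
        queues_[best].pop_front();
        return pkt;
    }

    std::vector<std::deque<ToyPacket>> queues_;
    int64_t max_delta_;
};
```

With two streams where stream 1 is the sparse data stream, pushing only
video packets releases nothing until the buffered span passes
max_delta, so in this model every video packet reaches the output late
by roughly that amount.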

To accommodate these sparse streams we added code to mux.c to not wait
for 2038 packets.  A side-effect of that though is that packets will
be sent through as soon as they hit the mux, which in most cases will
be significantly ahead of the video (potentially hundreds of
milliseconds).  This can easily be seen experimentally by adding an
av_log() line to ff_decklink_write_packet(), which will show in many
cases the PTS values of the data frames being sent 20+ frames before
the corresponding video.

The queue is there because the data packets and video frames arrive in
separate calls to write_packet(), and they need to be combined to
ensure they are inserted into the same video frame.  Stashing the data
packets seemed like a reasonable approach, and a queue seemed like a
good choice as a data structure since there can be multiple data
packets for a video frame and we might receive data packets for
multiple video frames before the corresponding video frames arrived.

The notion you mentioned that the data packets might arrive after the
video frames is a valid concern hypothetically.  In practice it hasn't
been an issue, as the data packets tend to arrive long before the
video.  It was not a motivation for using a queue.  If a data packet
did arrive after the video (due to the DTS and stream ID ordering you
mentioned), the implementation would insert it on the next video frame
and it would effectively be one frame late.  I was willing to accept
this edge case given it doesn't actually happen in practice.
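
A minimal sketch of the stashing idea, for illustration only.  The
names (DataPacket, VancStash, on_data, on_video) are hypothetical and
the matching rule "attach every stashed packet whose PTS is at or
before the video PTS" is an assumption; the actual rule used in the
patch is not spelled out in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an AVPacket carrying SMPTE 2038 data.
struct DataPacket {
    int64_t pts;          // presentation timestamp of the data packet
    std::string payload;  // opaque VANC payload
};

class VancStash {
public:
    // Called when write_packet() receives a data packet: just hold it.
    void on_data(DataPacket pkt) { pending_.push_back(std::move(pkt)); }

    // Called when write_packet() receives a video frame: hand back all
    // stashed packets that are due, in arrival order.  Assumed rule:
    // a packet is due when its PTS is at or before the video PTS.
    std::vector<DataPacket> on_video(int64_t video_pts) {
        std::vector<DataPacket> due;
        // Look at the head of the queue before deciding to remove it.
        while (!pending_.empty() && pending_.front().pts <= video_pts) {
            due.push_back(std::move(pending_.front()));
            pending_.pop_front();
        }
        return due;
    }

    size_t pending() const { return pending_.size(); }

private:
    std::deque<DataPacket> pending_;
};
```

In this sketch a data packet that shows up after its video frame has
already gone out is simply picked up by the next on_video() call, which
matches the "one frame late" behavior described above.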

> If you are only using the queue to store multiple data packets for a
> single frame then one way to avoid it is to parse them as soon as they
> arrive via the KLV library. If you insist on queueing them (maybe because
> not every packet will be parsed by the KLV lib), then I'd rather see you
> use avpriv_packet_list_*() functions, and not a custom decklink
> implementation.

Passing them off to libklvanc doesn't actually change the queueing
problem.  The libklvanc library doesn't actually output the VANC
packets but rather just converts them into the byte sequences that
then need to be embedded into the video frames.  I guess I could queue
the output of libklvanc rather than the original AVPackets, but that
doesn't actually solve any of the problems described above, and
actually makes things more complicated since the AVPackets contain all
the timing data and the VANC byte blobs would need to queue not just
the data but also the output timing, VANC line number and horizontal
position within the VANC region.

Regarding the use of avpriv_packet_list() as opposed to
avpacket_queue_*, I used the avpacket_queue functions for consistency
with the decklink capture module where it is used today.  Also,
avpacket_queue is threadsafe while avpriv_packet_list.*() is not.
While the threadsafeness is not critical for the VANC case, I have
subsequent patches for audio where it is important, and I figured it
would be more consistent to use the same queue mechanism within decklink
for all three (capture, audio output, and vanc output).

That said, I wouldn't specifically object to converting to the
avpriv_packet_list functions since thread-safeness isn't really a
requirement for this particular case.  It's probably worth noting
though that I extended the avpacket_queue method to allow me to peek
at the first packet in the queue (which avpriv_packet_list doesn't
support today).  Hence converting to avpriv_packet_list would require
an equivalent addition to be accepted upstream.

Devin

--
Devin Heitmueller, Senior Software Engineer
LTN Global Communications
o: +1 (301) 363-1001
w: https://ltnglobal.com  e: devin.heitmueller@ltnglobal.com

Marton Balint April 25, 2023, 9:58 p.m. UTC | #3

On Mon, 24 Apr 2023, Devin Heitmueller wrote:

> Hello Marton,
>
> Thanks for reviewing.  Comments inline:
>
> On Sun, Apr 23, 2023 at 2:43 PM Marton Balint <cus@passwd.hu> wrote:

[...]

Thanks for the detailed explanations. I guess then keeping the queue is 
well justified here.

> Regarding the use of avpriv_packet_list() as opposed to
> avpacket_queue_*, I used the avpacket_queue functions for consistency
> with the decklink capture module where it is used today.  Also,
> avpacket_queue is threadsafe while avpriv_packet_list.*() is not.
> While the threadsafeness is not critical for the VANC case, I have
> subsequent patches for audio where it is important, and I figured it
> would be more consistent to use the same queue mechanism within decklink
> for all three (capture, audio output, and vanc output).

Can you explain how thread safety will be relevant for audio? The 
muxer should get packets in a thread safe way, so I don't quite see how 
suddenly it will be needed.

>
> That said, I wouldn't specifically object to converting to the
> avpriv_packet_list functions since thread-safeness isn't really a
> requirement for this particular case.  It's probably worth noting
> though that I extended the avpacket_queue method to allow me to peek
> at the first packet in the queue (which avpriv_packet_list doesn't
> support today).  Hence converting to avpriv_packet_list would require
> an equivalent addition to be accepted upstream.

You can access the internals of the PacketList struct, so you can just add 
the needed function to your own code, you don't necessarily have to make it 
public. On the other hand, the avpriv_packet_list does not have the 
concept of queue size or queue count, so it is not only thread safety 
that will be missing.

Two things bother me with the decklink queue:

1) It duplicates the functionality of avpriv_packet_list_put and 
avpriv_packet_list_get, but it seems to me it should not be difficult 
to actually use these get/put functions in the decklink queue as well, 
because it is already using the same packet list struct internally.
Can you maybe give it a try?

2) Namespacing of the struct / functions is wrong. Struct is called 
AVPacketQueue, it should be something like DecklinkPacketQueue in order 
to make it clear that it is not a public struct. The function names are 
prefixed with avpacket, which is also wrong. It should be simply 
packet_queue_xxx, av* would imply a public function. And if you 
factorize it to a non-static function, then it should be 
ff_decklink_packet_queue_xxx.

With these two things fixed, things would look a lot better :)

Regards,
Marton

Marton Balint April 26, 2023, 7:35 a.m. UTC | #4

On Mon, 24 Apr 2023, Devin Heitmueller wrote:

> Hello Marton,
>
> Thanks for reviewing.  Comments inline:
>
> On Sun, Apr 23, 2023 at 2:43 PM Marton Balint <cus@passwd.hu> wrote:
>> In general, queueing packets in specific components should be avoided if
>> possible. Muxed packets are normally ordered by DTS and stream id, generic
>> code ensures that. If you want something other than that, then I think
>> the preferred way of doing it is by providing a custom interleave
>> function. (e.g. to ensure you get data packets before video even if data
>> stream has a higher stream ID.)
>
> To be clear, using a queue was not my first choice.  It's the result of
> trying different approaches, and I'm open to constructive suggestions
> on alternatives.
>
> While what you are saying is correct "in general", there are some
> really important reasons why it doesn't work in this case.  Permit me
> to explain...
>
> By default, the behavior of the mux interleaver is to wait until there
> is at least one packet available for each stream before writing to the
> output module (in this case decklink).  However data formats such as
> SMPTE ST2038 are considered to be "sparse" as there isn't necessarily
> a continuous stream of packets like with video and audio (there may be
> many seconds between packets, or no packets at all).  As a result you
> can't wait for a packet to be available on all streams since on some
> streams it will simply wait continuously until hitting the
> max_interleave_delta, at which point it will burst out everything in
> the queue.  This would cause stalls and/or stuttering playback on the
> decklink output.
>
> To accommodate these sparse streams we added code to mux.c to not wait
> for 2038 packets.  A side-effect of that though is that packets will
> be sent through as soon as they hit the mux, which in most cases will
> be significantly ahead of the video (potentially hundreds of
> milliseconds).  This can easily be seen experimentally by adding an
> av_log() line to ff_decklink_write_packet(), which will show in many
> cases the PTS values of the data frames being sent 20+ frames before
> the corresponding video.

Okay, I realized there is one thing here I don't understand. What if we 
interleave data packets the same way as others, but we don't wait for them 
in order to start flushing packet queues?

So I wonder, if you removed the AV_CODEC_ID_SMPTE_2038 exception
from init_muxer when calculating si->nb_interleaved_streams but keep the 
exception in ff_interleave_packet_per_dts, and set 
max_interleave_delta to 1, would that work?

Regards,
Marton

Devin Heitmueller April 26, 2023, 2:30 p.m. UTC | #5

Hello Marton,

On Wed, Apr 26, 2023 at 3:36 AM Marton Balint <cus@passwd.hu> wrote:
> Okay, I realized there is one thing here I don't understand. What if we
> interleave data packets the same way as others, but we don't wait for them
> in order to start flushing packet queues?
>
> So I wonder, if you removed the AV_CODEC_ID_SMPTE_2038 exception
> from init_muxer when calculating si->nb_interleaved_streams but keep the
> exception in ff_interleave_packet_per_dts, and set
> max_interleave_delta to 1, would that work?

I was actually wondering the same thing after our email exchange
yesterday.  I haven't tried it yet, but I suspect it might very well
result in the 2038 packets not being very far ahead of the video.  We
still need an intermediate data structure to hold onto the 2038
packets (and there could be multiple) before the corresponding video
frame arrives, and a queue is still a reasonable data structure to
store those packets within the decklink module.

Your suggestion might be a good one, and it might change the behavior
such that the packets in general would be more often held in the mux
queue rather than the decklink queue.  But I don't think it changes
anything about the fundamental design, and it doesn't eliminate the
need for stashing the data packets until the corresponding video is to
be sent out.

Devin

Devin Heitmueller April 26, 2023, 2:45 p.m. UTC | #6

Hi Marton,

Sorry, I'm now recognizing I should have answered this email prior to
the later one.  Comments inline:

On Tue, Apr 25, 2023 at 5:59 PM Marton Balint <cus@passwd.hu> wrote:
> > Regarding the use of avpriv_packet_list() as opposed to
> > avpacket_queue_*, I used the avpacket_queue functions for consistency
> > with the decklink capture module where it is used today.  Also,
> > avpacket_queue is threadsafe while avpriv_packet_list.*() is not.
> > While the threadsafeness is not critical for the VANC case, I have
> > subsequent patches for audio where it is important, and I figured it
> > would be more consistent to use the same queue mechanism within decklink
> > for all three (capture, audio output, and vanc output).
>
> Can you explain how thread safety will be relevant for audio? The
> muxer should get packets in a thread safe way, so I don't quite see how
> suddenly it will be needed.

I have a subsequent patch which supports multiple audio output streams
(which may be a mix of PCM and compressed audio).  Those streams need
to be interleaved together before submitting them to the hardware.  I
made a fundamental change to the design such that I employ an
intermediate FIFO which contains the interleaved audio, and the
submission to the hardware is done in the audio callback as we get
close to the scheduling deadline (which runs on a separate thread and
thus the queue needs to be thread-safe).
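
As a rough illustration of the interleaving step only: the sketch below
assumes "interleaved together" means placing the channels of each
stream side by side in one multi-channel buffer, which is my reading
and not something the patch in question confirms.  AudioStreamChunk
and interleave_streams are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-stream chunk: samples are stored frame by frame,
// with `channels` samples per frame.
struct AudioStreamChunk {
    int channels;
    std::vector<int32_t> samples;
};

// Build one buffer of `frames` frames in which each frame holds the
// channels of stream 0, then stream 1, and so on.  Samples a stream
// does not provide are left as silence (zero).
std::vector<int32_t> interleave_streams(
    const std::vector<AudioStreamChunk> &streams, size_t frames)
{
    size_t total_ch = 0;
    for (const auto &s : streams)
        total_ch += (size_t)s.channels;

    std::vector<int32_t> out(frames * total_ch, 0);
    size_t ch_offset = 0;
    for (const auto &s : streams) {
        for (size_t f = 0; f < frames; f++) {
            for (int c = 0; c < s.channels; c++) {
                size_t src = f * (size_t)s.channels + (size_t)c;
                if (src < s.samples.size())
                    out[f * total_ch + ch_offset + (size_t)c] = s.samples[src];
            }
        }
        ch_offset += (size_t)s.channels;
    }
    return out;
}
```

A buffer built this way is what would then sit in the intermediate
FIFO until the callback thread submits it to the hardware.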

I am quite confident that considerable discussion will be needed to
explain why I arrived at this design decision, as even I will
acknowledge that it seems ugly at first inspection.  The design has
actually evolved three or four times over the last five years as I had
to address a variety of edge cases found in real-world usage and
working in low-latency environments.

> > That said, I wouldn't specifically object to converting to the
> > avpriv_packet_list functions since thread-safeness isn't really a
> > requirement for this particular case.  It's probably worth noting
> > though that I extended the avpacket_queue method to allow me to peek
> > at the first packet in the queue (which avpriv_packet_list doesn't
> > support today).  Hence converting to avpriv_packet_list would require
> > an equivalent addition to be accepted upstream.
>
> You can access the internals of the PacketList struct, so you can just add
> the needed function to your own code, you don't necessarily have to make it
> public. On the other hand, the avpriv_packet_list does not have the
> concept of queue size or queue count, so it is not only thread safety
> that will be missing.

Ok.

> Two things bother me with the decklink queue:
>
> 1) It duplicates the functionality of avpriv_packet_list_put and
> avpriv_packet_list_get, but it seems to me it should not be difficult
> to actually use these get/put functions in the decklink queue as well,
> because it is already using the same packet list struct internally.
> Can you maybe give it a try?

Sure, I can take a look.  I am definitely in favor of using common
functions, and it wasn't until I looked more closely at the code that I
recognized why the author wrote yet another FIFO implementation rather
than using one of the standard public ones.

If we can end up with the decklink queue being a simple wrapper around
avpriv_packet_list() but with an added mutex, then I think that would
be ideal.
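
The shape of such a wrapper might look roughly like the following.
This is a sketch under assumptions, not the decklink code: a std::deque
stands in for the FFmpeg packet list, SketchPacket stands in for
AVPacket, and LockedPacketQueue and its method names are made up for
illustration.  It adds the three things discussed in this thread on top
of a plain list: a mutex, size/count tracking, and a peek at the head.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for AVPacket.
struct SketchPacket {
    int64_t pts;
    std::vector<uint8_t> data;
};

class LockedPacketQueue {
public:
    explicit LockedPacketQueue(size_t max_bytes) : max_bytes_(max_bytes) {}

    // Append a packet; refuse it if the byte limit would be exceeded.
    bool put(SketchPacket pkt) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (bytes_ + pkt.data.size() > max_bytes_)
            return false;
        bytes_ += pkt.data.size();
        list_.push_back(std::move(pkt));
        return true;
    }

    // Remove the oldest packet; returns false if the queue is empty.
    bool get(SketchPacket *out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (list_.empty())
            return false;
        *out = std::move(list_.front());
        list_.pop_front();
        bytes_ -= out->data.size();
        return true;
    }

    // Report the PTS of the oldest packet without removing it.
    bool peek_pts(int64_t *pts) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (list_.empty())
            return false;
        *pts = list_.front().pts;
        return true;
    }

    size_t count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return list_.size();
    }

    size_t bytes() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return bytes_;
    }

private:
    mutable std::mutex mutex_;      // guards every member below
    std::deque<SketchPacket> list_; // stand-in for the packet list
    size_t bytes_ = 0;
    size_t max_bytes_;
};
```

Keeping the lock inside each method means the caller on the muxer
thread and the caller on the callback thread never touch the list
directly, which is the property the audio patches would rely on.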

> 2) Namespacing of the struct / functions is wrong. Struct is called
> AVPacketQueue, it should be something like DecklinkPacketQueue in order
> to make it clear that it is not a public struct. The function names are
> prefixed with avpacket, which is also wrong. It should be simply
> packet_queue_xxx, av* would imply a public function. And if you
> factorize it to a non-static function, then it should be
> ff_decklink_packet_queue_xxx.

I never really liked the naming either, and agree that it implies the
functionality is public rather than private to decklink.  I can submit
a patch renaming the functions.

Devin
