Message ID | 1682111554-31597-1-git-send-email-dheitmueller@ltnglobal.com |
---|---|
Series | Implement SMPTE 2038 output support over Decklink SDI |
On Fri, 21 Apr 2023, Devin Heitmueller wrote:

> This patch series implements output of SMPTE 2038 VANC over SDI, building
> on the prior patch series which added it in the TS domain. Note that
> we moved the AVPacketQueue to be common code within libavdevice so it
> can be shared by both the decklink input and output.
>
> Comments/feedback are welcome.

In general, queueing packets in specific components should be avoided if
possible. Muxed packets are normally ordered by DTS and stream id; generic
code ensures that. If you want something other than that, then I think
the preferred way of doing it is by providing a custom interleave
function (e.g. to ensure you get data packets before video even if the
data stream has a higher stream ID).

If you are only using the queue to store multiple data packets for a
single frame then one way to avoid it is to parse them as soon as they
arrive via the KLV library. If you insist on queueing them (maybe because
not every packet will be parsed by the KLV lib), then I'd rather see you
use avpriv_packet_list_*() functions, and not a custom decklink
implementation.

Regards,
Marton

Hello Marton,

Thanks for reviewing. Comments inline:

On Sun, Apr 23, 2023 at 2:43 PM Marton Balint <cus@passwd.hu> wrote:
> In general, queueing packets in specific components should be avoided if
> possible. Muxed packets are normally ordered by DTS and stream id; generic
> code ensures that. If you want something other than that, then I think
> the preferred way of doing it is by providing a custom interleave
> function (e.g. to ensure you get data packets before video even if the
> data stream has a higher stream ID).

To be clear, using a queue was not my first choice. It's the result of
trying different approaches, and I'm open to constructive suggestions
on alternatives.

While what you are saying is correct "in general", there are some
really important reasons why it doesn't work in this case. Permit me
to explain...

By default, the behavior of the mux interleaver is to wait until there
is at least one packet available for each stream before writing to the
output module (in this case decklink). However, data formats such as
SMPTE ST 2038 are considered to be "sparse" as there isn't necessarily
a continuous stream of packets like with video and audio (there may be
many seconds between packets, or no packets at all). As a result you
can't wait for a packet to be available on all streams, since for a
sparse stream the interleaver will simply keep waiting until hitting
max_interleave_delta, at which point it will burst out everything in
the queue. This would cause stalls and/or stuttering playback on the
decklink output.

To accommodate these sparse streams we added code to mux.c to not wait
for 2038 packets. A side effect of that, though, is that packets will
be sent through as soon as they hit the mux, which in most cases will
be significantly ahead of the video (potentially hundreds of
milliseconds).
This can easily be seen experimentally by adding an av_log() line to
ff_decklink_write_packet(), which will show in many cases the PTS values
of the data frames being sent 20+ frames before the corresponding video.

The queue is there because the data packets and video frames arrive in
separate calls to write_packet(), and they need to be combined to
ensure they are inserted into the same video frame. Stashing the data
packets seemed like a reasonable approach, and a queue seemed like a
good choice of data structure since there can be multiple data packets
for a video frame and we might receive data packets for multiple video
frames before the corresponding video frames arrive.

The concern you mentioned, that the data packets might arrive after the
video frames, is valid hypothetically. In practice it hasn't been an
issue, as the data packets tend to arrive long before the video. It was
not a motivation for using a queue. If a data packet did arrive after
the video (due to the DTS and stream ID ordering you mentioned), the
implementation would insert it on the next video frame and it would
effectively be one frame late. I was willing to accept this edge case
given it doesn't actually happen in practice.

> If you are only using the queue to store multiple data packets for a
> single frame then one way to avoid it is to parse them as soon as they
> arrive via the KLV library. If you insist on queueing them (maybe because
> not every packet will be parsed by the KLV lib), then I'd rather see you
> use avpriv_packet_list_*() functions, and not a custom decklink
> implementation.

Passing them off to libklvanc doesn't actually change the queueing
problem. The libklvanc library doesn't output the VANC packets itself
but rather just converts them into the byte sequences that then need to
be embedded into the video frames.
I guess I could queue the output of libklvanc rather than the original
AVPackets, but that doesn't solve any of the problems described above,
and it makes things more complicated: the AVPackets contain all the
timing data, whereas for the VANC byte blobs we would need to queue not
just the data but also the output timing, VANC line number and
horizontal position within the VANC region.

Regarding the use of avpriv_packet_list_*() as opposed to
avpacket_queue_*, I used the avpacket_queue functions for consistency
with the decklink capture module where it is used today. Also,
avpacket_queue is thread-safe while avpriv_packet_list_*() is not.
While the thread safety is not critical for the VANC case, I have
subsequent patches for audio where it is important, and I figured it
would be more consistent to use the same queue mechanism within decklink
for all three (capture, audio output, and VANC output).

That said, I wouldn't specifically object to converting to the
avpriv_packet_list functions since thread safety isn't really a
requirement for this particular case. It's probably worth noting
though that I extended the avpacket_queue method to allow me to peek
at the first packet in the queue (which avpriv_packet_list doesn't
support today). Hence converting to avpriv_packet_list would require
an equivalent addition to be accepted upstream.

Devin

-- 
Devin Heitmueller, Senior Software Engineer
LTN Global Communications
o: +1 (301) 363-1001
w: https://ltnglobal.com
e: devin.heitmueller@ltnglobal.com

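As a rough illustration of the stashing approach described in the message above, the following sketch models a FIFO of data packets that is drained when the matching video frame is about to be output. Everything in it is hypothetical: DataPacket, DataQueue, queue_put(), queue_peek(), queue_drop() and attach_vanc_for_frame() are invented names, not code from the patch series, and the rule "attach every queued packet whose PTS is at or before the video PTS" is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an AVPacket carrying one ST 2038 payload. */
typedef struct DataPacket {
    int64_t pts;               /* timestamp, in video frame units here */
    struct DataPacket *next;
} DataPacket;

/* Hypothetical FIFO; this is not the decklink queue from the patch. */
typedef struct DataQueue {
    DataPacket *head, *tail;
    int count;
} DataQueue;

/* Append a data packet that arrived ahead of its video frame. */
static int queue_put(DataQueue *q, int64_t pts)
{
    DataPacket *p = malloc(sizeof(*p));
    if (!p)
        return -1;
    p->pts  = pts;
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->count++;
    return 0;
}

/* Look at the oldest packet without removing it; NULL if the queue is empty. */
static const DataPacket *queue_peek(const DataQueue *q)
{
    return q->head;
}

/* Remove and free the oldest packet; the queue must not be empty. */
static void queue_drop(DataQueue *q)
{
    DataPacket *p = q->head;
    assert(p != NULL);
    q->head = p->next;
    if (!q->head)
        q->tail = NULL;
    q->count--;
    free(p);
}

/* Called when the video frame with timestamp video_pts is about to be sent.
 * Consumes every queued packet with pts <= video_pts (assumed matching rule)
 * and returns how many were attached. A packet that arrives after its own
 * frame is picked up by the next frame, i.e. one frame late. */
static int attach_vanc_for_frame(DataQueue *q, int64_t video_pts)
{
    int attached = 0;
    const DataPacket *p;

    while ((p = queue_peek(q)) && p->pts <= video_pts) {
        /* A real implementation would convert the payload to VANC here. */
        queue_drop(q);
        attached++;
    }
    return attached;
}
```

The peek step is what lets the caller leave packets for later frames in place, which is why the message above mentions needing a peek operation on the queue.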
On Mon, 24 Apr 2023, Devin Heitmueller wrote:

> Hello Marton,
>
> Thanks for reviewing. Comments inline:
>
> On Sun, Apr 23, 2023 at 2:43 PM Marton Balint <cus@passwd.hu> wrote:

[...]

Thanks for the detailed explanations. I guess then keeping the queue is
well justified here.

> Regarding the use of avpriv_packet_list_*() as opposed to
> avpacket_queue_*, I used the avpacket_queue functions for consistency
> with the decklink capture module where it is used today. Also,
> avpacket_queue is thread-safe while avpriv_packet_list_*() is not.
> While the thread safety is not critical for the VANC case, I have
> subsequent patches for audio where it is important, and I figured it
> would be more consistent to use the same queue mechanism within decklink
> for all three (capture, audio output, and VANC output).

Can you explain how thread safety will be relevant for audio? The muxer
should get packets in a thread-safe way, so I don't quite see why it
would suddenly be needed.

> That said, I wouldn't specifically object to converting to the
> avpriv_packet_list functions since thread safety isn't really a
> requirement for this particular case. It's probably worth noting
> though that I extended the avpacket_queue method to allow me to peek
> at the first packet in the queue (which avpriv_packet_list doesn't
> support today). Hence converting to avpriv_packet_list would require
> an equivalent addition to be accepted upstream.

You can access the internals of the PacketList struct, so you can just
add the needed function to your own code; you don't necessarily have to
make it public. On the other hand, avpriv_packet_list does not have the
concept of queue size or queue count, so it is not only thread safety
that will be missing.
Two things bother me with the decklink queue:

1) It duplicates the functionality of avpriv_packet_list_put and
avpriv_packet_list_get, but it seems to me it should not be difficult
to actually use these get/put functions in the decklink queue as well,
because it is already using the same packet list struct internally.
Can you maybe give it a try?

2) The namespacing of the struct and functions is wrong. The struct is
called AVPacketQueue; it should be something like DecklinkPacketQueue in
order to make it clear that it is not a public struct. The function
names are prefixed with avpacket, which is also wrong. They should be
simply packet_queue_xxx, since av* would imply a public function. And if
you factorize it to a non-static function, then it should be
ff_decklink_packet_queue_xxx.

With these two things fixed, things would look a lot better :)

Regards,
Marton

On Mon, 24 Apr 2023, Devin Heitmueller wrote:

> Hello Marton,
>
> Thanks for reviewing. Comments inline:
>
> On Sun, Apr 23, 2023 at 2:43 PM Marton Balint <cus@passwd.hu> wrote:
>> In general, queueing packets in specific components should be avoided if
>> possible. Muxed packets are normally ordered by DTS and stream id; generic
>> code ensures that. If you want something other than that, then I think
>> the preferred way of doing it is by providing a custom interleave
>> function (e.g. to ensure you get data packets before video even if the
>> data stream has a higher stream ID).
>
> To be clear, using a queue was not my first choice. It's the result of
> trying different approaches, and I'm open to constructive suggestions
> on alternatives.
>
> While what you are saying is correct "in general", there are some
> really important reasons why it doesn't work in this case. Permit me
> to explain...
>
> By default, the behavior of the mux interleaver is to wait until there
> is at least one packet available for each stream before writing to the
> output module (in this case decklink). However, data formats such as
> SMPTE ST 2038 are considered to be "sparse" as there isn't necessarily
> a continuous stream of packets like with video and audio (there may be
> many seconds between packets, or no packets at all). As a result you
> can't wait for a packet to be available on all streams, since for a
> sparse stream the interleaver will simply keep waiting until hitting
> max_interleave_delta, at which point it will burst out everything in
> the queue. This would cause stalls and/or stuttering playback on the
> decklink output.
>
> To accommodate these sparse streams we added code to mux.c to not wait
> for 2038 packets. A side effect of that, though, is that packets will
> be sent through as soon as they hit the mux, which in most cases will
> be significantly ahead of the video (potentially hundreds of
> milliseconds).
> This can easily be seen experimentally by adding an
> av_log() line to ff_decklink_write_packet(), which will show in many
> cases the PTS values of the data frames being sent 20+ frames before
> the corresponding video.

Okay, I realized there is one thing here I don't understand. What if we
interleave data packets the same way as others, but we don't wait for
them in order to start flushing packet queues?

So I wonder, if you removed the AV_CODEC_ID_SMPTE_2038 exception from
init_muxer when calculating si->nb_interleaved_streams but kept the
exception in ff_interleave_packet_per_dts, and set max_interleave_delta
to 1, would that work?

Regards,
Marton

Hello Marton,

On Wed, Apr 26, 2023 at 3:36 AM Marton Balint <cus@passwd.hu> wrote:
> Okay, I realized there is one thing here I don't understand. What if we
> interleave data packets the same way as others, but we don't wait for them
> in order to start flushing packet queues?
>
> So I wonder, if you removed the AV_CODEC_ID_SMPTE_2038 exception
> from init_muxer when calculating si->nb_interleaved_streams but kept the
> exception in ff_interleave_packet_per_dts, and set
> max_interleave_delta to 1, would that work?

I was actually wondering the same thing after our email exchange
yesterday. I haven't tried it yet, but I suspect it might very well
result in the 2038 packets not being very far ahead of the video.

We still need an intermediate data structure to hold onto the 2038
packets (and there could be multiple) before the corresponding video
frame arrives, and a queue is still a reasonable data structure to
store those packets within the decklink module.

Your suggestion might be a good one, and it might change the behavior
such that the packets would more often be held in the mux queue rather
than the decklink queue. But I don't think it changes anything about the
fundamental design, and it doesn't eliminate the need for stashing the
data packets until the corresponding video is to be sent out.

Devin

Hi Marton,

Sorry, I now realize I should have answered this email before the later
one. Comments inline:

On Tue, Apr 25, 2023 at 5:59 PM Marton Balint <cus@passwd.hu> wrote:
> > Regarding the use of avpriv_packet_list_*() as opposed to
> > avpacket_queue_*, I used the avpacket_queue functions for consistency
> > with the decklink capture module where it is used today. Also,
> > avpacket_queue is thread-safe while avpriv_packet_list_*() is not.
> > While the thread safety is not critical for the VANC case, I have
> > subsequent patches for audio where it is important, and I figured it
> > would be more consistent to use the same queue mechanism within decklink
> > for all three (capture, audio output, and VANC output).
>
> Can you explain how thread safety will be relevant for audio? The
> muxer should get packets in a thread-safe way, so I don't quite see why
> it would suddenly be needed.

I have a subsequent patch which supports multiple audio output streams
(which may be a mix of PCM and compressed audio). Those streams need
to be interleaved together before submitting them to the hardware. I
made a fundamental change to the design such that I employ an
intermediate FIFO which contains the interleaved audio, and the
submission to the hardware is done in the audio callback as we get
close to the scheduling deadline. That callback runs on a separate
thread, and thus the queue needs to be thread-safe.

I am quite confident that considerable discussion will be needed to
explain why I arrived at this design decision, as even I will
acknowledge that it seems ugly at first inspection. The design has
actually evolved three or four times over the last five years as I had
to address a variety of edge cases found in real-world usage and in
low-latency environments.

> > That said, I wouldn't specifically object to converting to the
> > avpriv_packet_list functions since thread safety isn't really a
> > requirement for this particular case.
> > It's probably worth noting
> > though that I extended the avpacket_queue method to allow me to peek
> > at the first packet in the queue (which avpriv_packet_list doesn't
> > support today). Hence converting to avpriv_packet_list would require
> > an equivalent addition to be accepted upstream.
>
> You can access the internals of the PacketList struct, so you can just add
> the needed function to your own code; you don't necessarily have to make it
> public. On the other hand, avpriv_packet_list does not have the
> concept of queue size or queue count, so it is not only thread safety
> that will be missing.

Ok.

> Two things bother me with the decklink queue:
>
> 1) It duplicates the functionality of avpriv_packet_list_put and
> avpriv_packet_list_get, but it seems to me it should not be difficult
> to actually use these get/put functions in the decklink queue as well,
> because it is already using the same packet list struct internally.
> Can you maybe give it a try?

Sure, I can take a look. I am definitely in favor of using common
functions, and it wasn't until I looked more closely at the code that I
recognized why the author wrote yet another FIFO implementation rather
than using one of the standard public ones. If we can end up with the
decklink queue being a simple wrapper around avpriv_packet_list_*() but
with an added mutex, then I think that would be ideal.

> 2) The namespacing of the struct and functions is wrong. The struct is
> called AVPacketQueue; it should be something like DecklinkPacketQueue in
> order to make it clear that it is not a public struct. The function names
> are prefixed with avpacket, which is also wrong. They should be simply
> packet_queue_xxx, since av* would imply a public function. And if you
> factorize it to a non-static function, then it should be
> ff_decklink_packet_queue_xxx.

I never really liked the naming either, and agree that it implies the
functionality is public rather than private to decklink. I can submit
a patch renaming the functions.

Devin
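
The "simple wrapper with an added mutex" idea from this message could look roughly like the sketch below. It is illustrative only and makes several assumptions: PlainList, plain_list_put() and plain_list_get() are hypothetical stand-ins for the unlocked packet list (they are not the avpriv_packet_list_*() functions and carry only a timestamp, not an AVPacket), and the packet_queue_xxx names and DecklinkPacketQueue follow the naming suggested in the review rather than any committed code. The wrapper adds the mutex, a packet count and a peek operation on top of the plain list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical unlocked list standing in for the shared packet list. */
typedef struct Node { int64_t pts; struct Node *next; } Node;
typedef struct PlainList { Node *head, *tail; } PlainList;

static int plain_list_put(PlainList *l, int64_t pts)
{
    Node *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->pts  = pts;
    n->next = NULL;
    if (l->tail)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
    return 0;
}

static int plain_list_get(PlainList *l, int64_t *pts)
{
    Node *n = l->head;
    if (!n)
        return -1;
    *pts    = n->pts;
    l->head = n->next;
    if (!l->head)
        l->tail = NULL;
    free(n);
    return 0;
}

/* Wrapper private to the device code: list + count + mutex. */
typedef struct DecklinkPacketQueue {
    PlainList       list;
    int             nb_packets;
    pthread_mutex_t mutex;
} DecklinkPacketQueue;

static int packet_queue_init(DecklinkPacketQueue *q)
{
    assert(q != NULL);
    q->list.head  = q->list.tail = NULL;
    q->nb_packets = 0;
    return pthread_mutex_init(&q->mutex, NULL);
}

static int packet_queue_put(DecklinkPacketQueue *q, int64_t pts)
{
    pthread_mutex_lock(&q->mutex);
    int ret = plain_list_put(&q->list, pts);
    if (ret == 0)
        q->nb_packets++;
    pthread_mutex_unlock(&q->mutex);
    return ret;
}

static int packet_queue_get(DecklinkPacketQueue *q, int64_t *pts)
{
    pthread_mutex_lock(&q->mutex);
    int ret = plain_list_get(&q->list, pts);
    if (ret == 0)
        q->nb_packets--;
    pthread_mutex_unlock(&q->mutex);
    return ret;
}

/* Read the oldest entry without removing it; -1 if the queue is empty. */
static int packet_queue_peek(DecklinkPacketQueue *q, int64_t *pts)
{
    int ret = -1;
    pthread_mutex_lock(&q->mutex);
    if (q->list.head) {
        *pts = q->list.head->pts;
        ret  = 0;
    }
    pthread_mutex_unlock(&q->mutex);
    return ret;
}

/* Free any remaining entries and destroy the mutex. */
static void packet_queue_end(DecklinkPacketQueue *q)
{
    int64_t unused;
    while (packet_queue_get(q, &unused) == 0)
        ;
    pthread_mutex_destroy(&q->mutex);
}
```

Keeping the locking in the wrapper means the underlying list code stays shared and lock-free, while a producer (the muxer thread) and a consumer (such as an audio callback thread) can both call the packet_queue_xxx functions safely.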