[FFmpeg-devel] libavformat/mpegtsenc: new interleaved mux mode [v3]

Message ID Hld3RVCojMUWQ2IXngnngo5wsLPOg7i1rVgrJRcCVnH8B-9j1Drk5rMxDWaHommrh7g8YLdcNZf7lLb8syKHXLCUITZ6zwg_6Dt-4iRP0aE=@protonmail.com
State New

Commit Message

Andreas Håkon Aug. 19, 2019, 7:16 p.m. UTC
Hi,

This is the third version of my patch for an "interleaved MPEG-TS muxer".
This new version includes all recommendations and rebases the fix of the
incorrect PCR with multiple programs (fixed in collaboration with Marton Balint).

Supersedes: https://patchwork.ffmpeg.org/patch/13745/

How to check it:

(Note: all the tests use the file
https://samples.ffmpeg.org/HDTV/bshi01.tp
)

- By default the current behavior is selected. You can verify that this
patch doesn’t alter the original behavior with this simple test:

$ ffmpeg-original -i bshi01.tp \
  -c:v copy -c:a copy -c:d copy \
  -f mpegts -muxrate 22M bshi01-stock.ts

$ ffmpeg-patched -i bshi01.tp \
  -c:v copy -c:a copy -c:d copy \
  -f mpegts -muxrate 22M -mpegts_interleave_mux 0 bshi01-new.ts

$ cmp -b bshi01-stock.ts bshi01-new.ts

So both files are identical. The patch therefore doesn’t introduce any
changes in the implementation of the sequential mode.

- To check the new interleaved mode you can perform this other test:

$ ffmpeg-patched -y -loglevel verbose -i bshi01.tp \
  -map "i:0x100" -c:0 copy \
  -map "i:0x110" -c:a:0 mp2 -ac:0 2 -ar:0 48000 -ab:0 384k \
  -map "i:0x130" -c:2 copy \
  -map "i:0x110" -c:3 copy \
  -map "i:0x100" -c:4 copy \
  -program title=Prog1:st=0:st=1:st=2 \
  -program title=Prog2:st=3:st=4 \
  -f mpegts -muxrate 44M -mpegts_interleave_mux 1 bshi01-mode1.ts

$ ffmpeg-patched -y -loglevel verbose -i bshi01.tp \
  -map "i:0x100" -c:0 copy \
  -map "i:0x110" -c:a:0 mp2 -ac:0 2 -ar:0 48000 -ab:0 384k \
  -map "i:0x130" -c:2 copy \
  -map "i:0x110" -c:3 copy \
  -map "i:0x100" -c:4 copy \
  -program title=Prog1:st=0:st=1:st=2 \
  -program title=Prog2:st=3:st=4 \
  -f mpegts -muxrate 44M -mpegts_interleave_mux 0 bshi01-mode0.ts

And you can observe:

a) The sizes of the files “bshi01-mode0.ts” and “bshi01-mode1.ts” are
almost the same. If you inspect the content, you can verify that the
difference comes solely from (1) a small increase in the number of
NULL packets in mode 1 and (2) a few new packets in the first video
stream that carry only a PCR and no payload.

b) If you demux both output files to elementary streams, then you can
check that the content is identical. Using the Linux package “tstools”
you can do this check:

$ ts2es -pid 256 bshi01-mode0.ts bshi01-mode0-256.m2v
$ ts2es -pid 260 bshi01-mode0.ts bshi01-mode0-260.m2v
$ ts2es -pid 257 bshi01-mode0.ts bshi01-mode0-257.mp2
$ ts2es -pid 259 bshi01-mode0.ts bshi01-mode0-259.aac

$ ts2es -pid 256 bshi01-mode1.ts bshi01-mode1-256.m2v
$ ts2es -pid 260 bshi01-mode1.ts bshi01-mode1-260.m2v
$ ts2es -pid 257 bshi01-mode1.ts bshi01-mode1-257.mp2
$ ts2es -pid 259 bshi01-mode1.ts bshi01-mode1-259.aac

c) If you look at the internal content of the files you can verify that
the original “bshi01.tp” file has all PIDs interleaved, but this isn’t true
for the file “bshi01-mode0.ts”. However, the file “bshi01-mode1.ts”
has an internal structure similar to that of the original file.
You can view the content using the well-known tool
“DVB Inspector” with the “Grid View” option.

These tests confirm the correctness of the implementation of this
new multiplexing mode.

Regards.
A.H.

---
From 4636f83ca24e71fb807d48d3713bda6d3254938a Mon Sep 17 00:00:00 2001
From: Andreas Hakon <andreas.hakon@protonmail.com>
Date: Mon, 19 Aug 2019 20:57:32 +0200
Subject: [PATCH] libavformat/mpegtsenc: interleaved mux mode [v3]

This patch implements a new optional "interleaved mux mode" in the MPEGTS muxer.
The strategy of the current mux (selected by default) is based on
writing full PES packets sequentially. This mode can be problematic when used
with DTV broadcasts, as some large video PES packets can delay the writing of
other elementary streams.
The new optional parameter "-mpegts_interleave_mux 1" enables another strategy.
Instead of writing all PES packets sequentially, the first TS packet of each PES
packet is written when the PES packet is started. But the rest of the PES data
will be written later, and interleaved between all the mux streams.
This new (optional) behavior has clear advantages when multiplexing multiple
programs with several video streams. And although this does not turn the
current implementation into a professional muxer, it brings the result closer
to what professional equipment does.

Example of use:

ffmpeg -i INPUT.ts \
  -map "i:0x100" -c:v:0 copy \
  -map "i:0x110" -c:a:0 copy \
  -map "i:0x100" -c:v:1 copy \
  -map "i:0x110" -c:a:1 copy \
  -program title=Prog1:st=0:st=1 \
  -program title=Prog2:st=2:st=3 \
  -f mpegts -muxrate 44M -mpegts_interleave_mux 1 OUTPUT.ts

Signed-off-by: Andreas Hakon <andreas.hakon@protonmail.com>
---
 libavformat/mpegtsenc.c |  188 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 152 insertions(+), 36 deletions(-)

Comments

Andreas Håkon Aug. 26, 2019, 1:05 p.m. UTC | #1
Hi,

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, 19 August 2019 21:16, Andreas Håkon <andreas.hakon@protonmail.com> wrote:

> Hi,
>
> This is the third version of my patch for an "interleaved MPEG-TS muxer".
> This new version includes all recommendations and rebases the fix of the
> incorrect PCR with multiple programs (fixed in collaboration with Marton Balint).
>
> Supersedes: https://patchwork.ffmpeg.org/patch/13745/
>
> How to check it:
>
> (Note: I use for all the tests the file
> https://samples.ffmpeg.org/HDTV/bshi01.tp
> )
>
> - To check the new interleaved mode you can perform this other test:
>
> $ ffmpeg-patched -y -loglevel verbose -i bshi01.tp \
>   -map "i:0x100" -c:0 copy \
>   -map "i:0x110" -c:a:0 mp2 -ac:0 2 -ar:0 48000 -ab:0 384k \
>   -map "i:0x130" -c:2 copy \
>   -map "i:0x110" -c:3 copy \
>   -map "i:0x100" -c:4 copy \
>   -program title=Prog1:st=0:st=1:st=2 \
>   -program title=Prog2:st=3:st=4 \
>   -f mpegts -muxrate 44M -mpegts_interleave_mux 1 bshi01-mode1.ts
>
> $ ffmpeg-patched -y -loglevel verbose -i bshi01.tp \
>   -map "i:0x100" -c:0 copy \
>   -map "i:0x110" -c:a:0 mp2 -ac:0 2 -ar:0 48000 -ab:0 384k \
>   -map "i:0x130" -c:2 copy \
>   -map "i:0x110" -c:3 copy \
>   -map "i:0x100" -c:4 copy \
>   -program title=Prog1:st=0:st=1:st=2 \
>   -program title=Prog2:st=3:st=4 \
>   -f mpegts -muxrate 44M -mpegts_interleave_mux 0 bshi01-mode0.ts
>
> ---

To understand what this patch does, see these screenshots of the test files
"bshi01-mode0.ts" and "bshi01-mode1.ts":

- Current muxing: https://trac.ffmpeg.org/attachment/ticket/8096/MODE-0.PNG
- New interleaved muxing mode: https://trac.ffmpeg.org/attachment/ticket/8096/MODE-1.PNG

See also this example of a professional muxer: https://trac.ffmpeg.org/attachment/ticket/8096/MPTS.PNG

I hope this patch will be reviewed soon.
Regards.
A.H.

---
Marton Balint Aug. 27, 2019, 10:46 p.m. UTC | #2
On Mon, 26 Aug 2019, Andreas Håkon wrote:

> Hi,
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, 19 August 2019 21:16, Andreas Håkon <andreas.hakon@protonmail.com> wrote:
>
>> Hi,
>>
>> This is the third version of my patch for an "interleaved MPEG-TS muxer".
>> This new version includes all recommendations and rebases the fix of the
>> incorrect PCR with multiple programs (fixed in collaboration with Marton Balint).
>>
>> Supersedes: https://patchwork.ffmpeg.org/patch/13745/
>>
>> How to check it:
>>
>> (Note: I use for all the tests the file
>> https://samples.ffmpeg.org/HDTV/bshi01.tp
>> )
>>
>> - To check the new interleaved mode you can perform this other test:
>>
>> $ ffmpeg-patched -y -loglevel verbose -i bshi01.tp \
>>   -map "i:0x100" -c:0 copy \
>>   -map "i:0x110" -c:a:0 mp2 -ac:0 2 -ar:0 48000 -ab:0 384k \
>>   -map "i:0x130" -c:2 copy \
>>   -map "i:0x110" -c:3 copy \
>>   -map "i:0x100" -c:4 copy \
>>   -program title=Prog1:st=0:st=1:st=2 \
>>   -program title=Prog2:st=3:st=4 \
>>   -f mpegts -muxrate 44M -mpegts_interleave_mux 1 bshi01-mode1.ts
>>
>> $ ffmpeg-patched -y -loglevel verbose -i bshi01.tp \
>>   -map "i:0x100" -c:0 copy \
>>   -map "i:0x110" -c:a:0 mp2 -ac:0 2 -ar:0 48000 -ab:0 384k \
>>   -map "i:0x130" -c:2 copy \
>>   -map "i:0x110" -c:3 copy \
>>   -map "i:0x100" -c:4 copy \
>>   -program title=Prog1:st=0:st=1:st=2 \
>>   -program title=Prog2:st=3:st=4 \
>>   -f mpegts -muxrate 44M -mpegts_interleave_mux 0 bshi01-mode0.ts
>>
>> ---
>
> To understand what this patch is doing, see these screenshots about the test files
> "bshi01-mode0.ts" and "bshi01-mode1.ts":
>
> - Current muxing: https://trac.ffmpeg.org/attachment/ticket/8096/MODE-0.PNG
> - New interleaved muxing mode: https://trac.ffmpeg.org/attachment/ticket/8096/MODE-1.PNG
>
> See also this example of a professional muxer: https://trac.ffmpeg.org/attachment/ticket/8096/MPTS.PNG

Thanks for your changes, it was much easier for me to see what is going
on. There is still room for simplification, you can probably factor the
change which assigns ->stream_id to the context out into a separate patch.
And you can probably lose most of the PES flags you introduced because the
state of the PES packet can be decided based on payload_top and
payload_size. This also helps you to reduce the number of functions added.
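
One possible reading of this suggestion, as a minimal sketch only: the
struct and helper names below are hypothetical and are not part of the patch
or of MpegTSWriteStream. It assumes that payload_top counts the bytes of the
current PES packet already written, and it ignores the DVB subtitle end
marker that PES_FLAG_NEEDS_END tracks.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of the per-stream state (an assumption for
 * illustration, not the real MpegTSWriteStream). */
typedef struct PesState {
    int payload_size; /* bytes queued for the current PES packet */
    int payload_top;  /* bytes of that packet already written to TS packets */
} PesState;

/* Nothing written yet: the next TS packet must carry the PES header. */
static bool pes_needs_header(const PesState *p)
{
    return p->payload_size > 0 && p->payload_top == 0;
}

/* The header has been written, but payload bytes remain to be drained. */
static bool pes_in_progress(const PesState *p)
{
    return p->payload_top > 0 && p->payload_top < p->payload_size;
}

/* Account for 'len' bytes just written. Resets both counters and returns
 * true once the whole PES packet has been written. */
static bool pes_advance(PesState *p, int len)
{
    p->payload_top += len;
    if (p->payload_top >= p->payload_size) {
        p->payload_size = 0;
        p->payload_top  = 0;
        return true;
    }
    return false;
}
```

Whether these two counters are sufficient in every code path of the patch
is exactly the open question here; the sketch only shows the idea.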

The biggest issue however is that your interleaving algorithm works by 
draining pending PES packets in a round robin fashion TS packet by TS 
packet. So if you have streams A, B, and stream B has twice the bitrate of 
stream A, you will get something like this: ABABABBBB when in fact you 
should be getting something like ABBABBABB. So I am not sure if we should 
add an "interleaving" mode if it only works properly for streams with 
roughly the same bitrate. It certainly does not fix ticket #912.
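
To make the two orderings concrete, here is a small stand-alone sketch. It
is not code from the patch: the function names and MAX_STREAMS are
hypothetical, and it assumes stream A has 3 pending TS packets while stream
B has 6. The first function drains pending packets round robin, the second
always picks the stream that has sent the smallest fraction of its packets.

```c
#include <stdint.h>

#define MAX_STREAMS 8 /* hypothetical limit for this sketch */

/* Round-robin drain: one TS packet per pending stream on each pass.
 * The caller must ensure n <= MAX_STREAMS and that 'out' has room for
 * the sum of all totals plus the terminating NUL. */
static void interleave_round_robin(const int *total, int n, char *out)
{
    int sent[MAX_STREAMS] = {0};
    int pos = 0, pending = 1;

    while (pending) {
        pending = 0;
        for (int i = 0; i < n; i++) {
            if (sent[i] < total[i]) {
                out[pos++] = (char)('A' + i);
                sent[i]++;
                pending = 1;
            }
        }
    }
    out[pos] = '\0';
}

/* Proportional drain: emit from the stream with the smallest
 * sent/total ratio; ties go to the lowest stream index. */
static void interleave_proportional(const int *total, int n, char *out)
{
    int sent[MAX_STREAMS] = {0};
    int pos = 0;

    for (;;) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (sent[i] >= total[i])
                continue; /* stream fully drained */
            /* compare sent[i]/total[i] < sent[best]/total[best]
             * by cross-multiplication to stay in integers */
            if (best < 0 ||
                (int64_t)sent[i] * total[best] < (int64_t)sent[best] * total[i])
                best = i;
        }
        if (best < 0)
            break;
        out[pos++] = (char)('A' + best);
        sent[best]++;
    }
    out[pos] = '\0';
}
```

With totals {3, 6}, the round-robin version yields ABABABBBB and the
proportional version yields ABBABBABB.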

Even if we don't want to implement a "proper" muxer, there should be a way
to improve the interleaving algorithm to have a better chance of
outputting something that is spec compliant. I might give it a try.

Regards,
Marton

---
Andreas Håkon Aug. 28, 2019, 7:39 a.m. UTC | #3
Hi Marton,


Thank you very much for finally answering regarding this patch!


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, 28 August 2019 0:46, Marton Balint <cus@passwd.hu> wrote:

> On Mon, 26 Aug 2019, Andreas Håkon wrote:
>
> > To understand what this patch is doing, see these screenshots about the test files
> > "bshi01-mode0.ts" and "bshi01-mode1.ts":
> >
> > -   Current muxing: https://trac.ffmpeg.org/attachment/ticket/8096/MODE-0.PNG
> > -   New interleaved muxing mode: https://trac.ffmpeg.org/attachment/ticket/8096/MODE-1.PNG
> >
> > See also this example of a professional muxer: https://trac.ffmpeg.org/attachment/ticket/8096/MPTS.PNG
>
> Thanks for your changes, it was much easier for me to see what is going
> on. There is still room for simplification, you can probably factor the
> change which assigns ->stream_id to the context out into a separate patch.
> And you can probably lose most of the PES flags you introduced because the
> state of the PES packet can be decided based on payload_top and
> payload_size. This also helps you to reduce the number of functions added.

OK. I'll prepare another simple patch to move the ->stream_id to the context.

Regarding the PES flags... I can't discard any!

- #define PES_FLAG_READY     : Ready to write the PES packet.
- #define PES_FLAG_START     : Write the header PES.
- #define PES_FLAG_PARTIAL   : Write only one TS packet.
- #define PES_FLAG_NEEDS_END : The PES packet is a DVB subtitle and needs the end marker.

Note that these flags are needed because the function mpegts_write_pes() is
used in a stateless way (the PES packet state is stored inside the flags).
So I can't use payload_top and payload_size to deduce the state of the PES
packet.


> The biggest issue however is that your interleaving algorithm works by
> draining pending PES packets in a round robin fashion TS packet by TS
> packet. So if you have streams A, B, and stream B has twice the bitrate of
> stream A, you will get something like this: ABABABBBB when in fact you
> should be getting something like ABBABBABB. So I am not sure if we should
> add an "interleaving" mode if it only works properly for streams with
> roughly the same bitrate. It certainly does not fix ticket #912.

Yes and no.

It's true that the interleaving mode introduced by this patch can be improved.
However, it has no trouble with streams of different bitrates.

And your example is wrong:

- ABBABBABB: That's ONE option.
- ABABABBBB: That's NOT the current output with my interleaving algorithm.
- AABABBBBB: That IS the current output with my interleaving algorithm.
- AAABBBBBB: And that's the current sequential mode.

What's the difference? The STARTING point of a PES packet. That's because
the relevant trigger for deciding whether to insert a packet from a stream
is the PTS/DTS timestamps, and that decision is made at the first packet
(when the PES header is written). After that point, you don't need to worry
about the exact position of the rest of the packets. You only need to ensure
that the previous PES packet is completely written before the start of the
next PES packet of the same stream. That is what the decoder buffer is for,
and the MPEG-TS muxer doesn't need to care about it, as that responsibility
lies with the encoder. In fact, the worst case is the current sequential
mode, as you write the entire PES packet as quickly as possible, so the
buffer fills up quickly. If you interleave, you reduce the filling speed.
That causes no problem at all, as the end of the PES will arrive before the
start of the next PES. So, no buffer underflow is possible.

> It certainly does not fix ticket #912.

My goal is not to fix that ticket, but to implement an interleaved mode.

The main problem with the sequential mode (the only option at this time)
appears when you use multiple programs. With multiple video streams, plus other
audio and data streams, this interleaved implementation is a big breakthrough.
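
As a rough illustration of how long one large PES packet occupies the mux
in sequential mode, here is a back-of-the-envelope sketch. The helper names
are hypothetical, and it assumes a constant muxrate, 188-byte TS packets
with 184 payload bytes each, and no space taken by the PES header or by
adaptation fields.

```c
#include <stdint.h>

#define TS_PACKET_BYTES  188 /* size of one TS packet */
#define TS_PAYLOAD_BYTES 184 /* payload bytes per TS packet (assumed full) */

/* Number of TS packets needed to carry 'pes_bytes' of PES data. */
static int64_t ts_packets_for_pes(int64_t pes_bytes)
{
    return (pes_bytes + TS_PAYLOAD_BYTES - 1) / TS_PAYLOAD_BYTES;
}

/* Time in microseconds during which the PES packet occupies the output
 * when all its TS packets are written back to back at 'muxrate_bps'. */
static int64_t pes_blocking_time_us(int64_t pes_bytes, int64_t muxrate_bps)
{
    return ts_packets_for_pes(pes_bytes) * TS_PACKET_BYTES * 8 * 1000000
           / muxrate_bps;
}
```

For example, a 184000-byte video PES packet needs 1000 TS packets; at
-muxrate 44M that is about 34 ms during which no other stream is written in
sequential mode.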


> Even if we don't want to implement a "proper" muxer, there should be a way
> to improve the interleaving algorithm to have a better chance of
> outputting something that is spec compliant. I might give it a try.

I prefer to first introduce the "mpegts_interleave_mux" option in the
mpegtsenc muxer, and afterwards improve the algorithm. The current sequential
strategy is very inefficient.

Are you willing to help me to achieve that goal?

Regards.
A.H.

---
Patch

diff --git a/libavformat/mpegtsenc.c b/libavformat/mpegtsenc.c
index d4dd4ab..dc1b68f 100644
--- a/libavformat/mpegtsenc.c
+++ b/libavformat/mpegtsenc.c
@@ -87,6 +87,7 @@  typedef struct MpegTSWrite {
     int64_t next_pcr;
     int mux_rate; ///< set to 1 when VBR
     int pes_payload_size;
+    int pes_payload_max;
 
     int transport_stream_id;
     int original_network_id;
@@ -96,6 +97,7 @@  typedef struct MpegTSWrite {
     int pmt_start_pid;
     int start_pid;
     int m2ts_mode;
+    int parallel_mux;
 
     int pcr_period;
 #define MPEGTS_FLAG_REEMIT_PAT_PMT  0x01
@@ -117,6 +119,7 @@  typedef struct MpegTSWrite {
 /* a PES packet header is generated every DEFAULT_PES_HEADER_FREQ packets */
 #define DEFAULT_PES_HEADER_FREQ  16
 #define DEFAULT_PES_PAYLOAD_SIZE ((DEFAULT_PES_HEADER_FREQ - 1) * 184 + 170)
+#define MAX_PES_PAYLOAD 2 * 200 * 1024  // From mpegts.c
 
 /* The section length is 12 bits. The first 2 are set to 0, the remaining
  * 10 bits should not exceed 1021. */
@@ -229,11 +232,18 @@  typedef struct MpegTSWriteStream {
     int cc;
     int discontinuity;
     int payload_size;
+    int payload_top;
+    int stream_id;
     int first_pts_check; ///< first pts check needed
     int prev_payload_key;
     int64_t payload_pts;
     int64_t payload_dts;
     int payload_flags;
+    int pes_flags;
+#define PES_FLAG_READY      0x01
+#define PES_FLAG_START      0x02
+#define PES_FLAG_PARTIAL    0x04
+#define PES_FLAG_NEEDS_END  0x08
     uint8_t *payload;
     AVFormatContext *amux;
 
@@ -905,7 +915,8 @@  static int mpegts_init(AVFormatContext *s)
 
         avpriv_set_pts_info(st, 33, 1, 90000);
 
-        ts_st->payload = av_mallocz(ts->pes_payload_size);
+        ts->pes_payload_max = FFMAX(MAX_PES_PAYLOAD, ts->pes_payload_size);
+        ts_st->payload = av_mallocz(ts->parallel_mux ? ts->pes_payload_max : ts->pes_payload_size);
         if (!ts_st->payload) {
             ret = AVERROR(ENOMEM);
             goto fail;
@@ -940,6 +951,9 @@  static int mpegts_init(AVFormatContext *s)
         pids[i]                = ts_st->pid;
         ts_st->payload_pts     = AV_NOPTS_VALUE;
         ts_st->payload_dts     = AV_NOPTS_VALUE;
+        ts_st->payload_top     = 0;
+        ts_st->stream_id       = -1;
+        ts_st->pes_flags       = 0;
         ts_st->first_pts_check = 1;
         ts_st->cc              = 15;
         ts_st->discontinuity   = ts->flags & MPEGTS_FLAG_DISCONT;
@@ -1167,15 +1181,17 @@  static uint8_t *get_ts_payload_start(uint8_t *pkt)
 /* Add a PES header to the front of the payload, and segment into an integer
  * number of TS packets. The final TS packet is padded using an oversized
  * adaptation header to exactly fill the last TS packet.
- * NOTE: 'payload' contains a complete PES payload. */
-static void mpegts_write_pes(AVFormatContext *s, AVStream *st,
+ * NOTE: 'payload' contains a complete PES payload, or a partial chunk when
+ *       the writing of the PES packet has already begun. */
+static int mpegts_write_pes(AVFormatContext *s, AVStream *st,
                              const uint8_t *payload, int payload_size,
-                             int64_t pts, int64_t dts, int key, int stream_id)
+                             int64_t pts, int64_t dts, int key)
 {
     MpegTSWriteStream *ts_st = st->priv_data;
     MpegTSWrite *ts = s->priv_data;
     uint8_t buf[TS_PACKET_SIZE];
     uint8_t *q;
+    int ret_size = 0;
     int val, is_start, len, header_len, write_pcr, is_dvb_subtitle, is_dvb_teletext, flags;
     int afc_len, stuffing_len;
     int64_t delay = av_rescale(s->max_delay, 90000, AV_TIME_BASE);
@@ -1186,7 +1202,7 @@  static void mpegts_write_pes(AVFormatContext *s, AVStream *st,
         force_pat = 1;
     }
 
-    is_start = 1;
+    is_start = !!(ts_st->pes_flags & PES_FLAG_START);
     while (payload_size > 0) {
         int64_t pcr = -1; /* avoid warning */
 
@@ -1297,9 +1313,9 @@  static void mpegts_write_pes(AVFormatContext *s, AVStream *st,
                        st->codecpar->codec_id == AV_CODEC_ID_TIMED_ID3) {
                 *q++ = 0xbd;
             } else if (st->codecpar->codec_type == AVMEDIA_TYPE_DATA) {
-                *q++ = stream_id != -1 ? stream_id : 0xfc;
+                *q++ = ts_st->stream_id != -1 ? ts_st->stream_id : 0xfc;
 
-                if (stream_id == 0xbd) /* asynchronous KLV */
+                if (ts_st->stream_id == 0xbd) /* asynchronous KLV */
                     pts = dts = AV_NOPTS_VALUE;
             } else {
                 *q++ = 0xbd;
@@ -1400,12 +1416,15 @@  static void mpegts_write_pes(AVFormatContext *s, AVStream *st,
                  * subtitle_stream_id: for DVB subtitle stream shall be identified by the value 0x00 */
                 *q++ = 0x20;
                 *q++ = 0x00;
+                /* It's required to add a final mark */
+                ts_st->pes_flags |= PES_FLAG_NEEDS_END;
             }
             if (is_dvb_teletext) {
                 memset(q, 0xff, pes_header_stuffing_bytes);
                 q += pes_header_stuffing_bytes;
             }
             is_start = 0;
+            ts_st->pes_flags &= ~PES_FLAG_START;
         }
         /* header size */
         header_len = q - buf;
@@ -1436,7 +1455,7 @@  static void mpegts_write_pes(AVFormatContext *s, AVStream *st,
             }
         }
 
-        if (is_dvb_subtitle && payload_size == len) {
+        if ((ts_st->pes_flags & PES_FLAG_NEEDS_END) && payload_size == len) {
             memcpy(buf + TS_PACKET_SIZE - len, payload, len - 1);
             buf[TS_PACKET_SIZE - 1] = 0xff; /* end_of_PES_data_field_marker: an 8-bit field with fixed contents 0xff for DVB subtitle */
         } else {
@@ -1447,8 +1466,12 @@  static void mpegts_write_pes(AVFormatContext *s, AVStream *st,
         payload_size -= len;
         mpegts_prefix_m2ts_header(s);
         avio_write(s->pb, buf, TS_PACKET_SIZE);
+        ret_size += len;
+        if (ts_st->pes_flags & PES_FLAG_PARTIAL)
+            break;
     }
     ts_st->prev_payload_key = key;
+    return ret_size;
 }
 
 int ff_check_h264_startcode(AVFormatContext *s, const AVStream *st, const AVPacket *pkt)
@@ -1535,6 +1558,83 @@  static int opus_get_packet_samples(AVFormatContext *s, AVPacket *pkt)
     return duration;
 }
 
+static inline void mpegts_write_full_pes_stream(AVFormatContext *s, AVStream *st, const uint8_t *payload)
+{
+    MpegTSWriteStream *ts_st = st->priv_data;
+    ts_st->pes_flags |= PES_FLAG_START;
+    ts_st->pes_flags &= ~PES_FLAG_PARTIAL;
+    mpegts_write_pes(s, st, payload,
+                    ts_st->payload_size,
+                    ts_st->payload_pts, ts_st->payload_dts,
+                    ts_st->payload_flags & AV_PKT_FLAG_KEY);
+    ts_st->payload_size = 0;
+    ts_st->payload_top  = 0;
+    ts_st->pes_flags    = 0;
+}
+
+static inline void mpegts_flush_pes_stream(AVFormatContext *s, AVStream *st, const uint8_t *payload)
+{
+    MpegTSWriteStream *ts_st = st->priv_data;
+    ts_st->pes_flags &= ~PES_FLAG_PARTIAL;
+    mpegts_write_pes(s, st, payload + ts_st->payload_top,
+                    ts_st->payload_size - ts_st->payload_top,
+                    ts_st->payload_pts, ts_st->payload_dts,
+                    ts_st->payload_flags & AV_PKT_FLAG_KEY);
+    ts_st->payload_size = 0;
+    ts_st->payload_top  = 0;
+    ts_st->pes_flags    = 0;
+}
+
+static inline void mpegts_write_partial_pes_stream(AVFormatContext *s, AVStream *st, const uint8_t *payload)
+{
+    MpegTSWriteStream *ts_st = st->priv_data;
+    MpegTSWrite *ts = s->priv_data;
+    if (ts_st->payload_size > ts->pes_payload_max) {
+        av_log(s, AV_LOG_WARNING, "PES packet oversized (%d), full sequential writing required\n", ts_st->payload_size);    
+        ts_st->pes_flags &= ~PES_FLAG_PARTIAL;
+    } else
+        ts_st->pes_flags |= PES_FLAG_PARTIAL;
+    ts_st->payload_top += mpegts_write_pes(s, st, payload + ts_st->payload_top,
+                                        ts_st->payload_size - ts_st->payload_top,
+                                        ts_st->payload_pts, ts_st->payload_dts,
+                                        ts_st->payload_flags & AV_PKT_FLAG_KEY);
+    if (ts_st->payload_size && ts_st->payload_size == ts_st->payload_top) {
+        ts_st->payload_size = 0;
+        ts_st->payload_top  = 0;
+        ts_st->pes_flags    = 0;
+    }
+}
+
+static inline void mpegts_start_partial_pes_stream(AVFormatContext *s, AVStream *st, const uint8_t *payload)
+{
+    MpegTSWriteStream *ts_st = st->priv_data;
+    ts_st->payload_top = 0;
+    ts_st->pes_flags |= PES_FLAG_START | PES_FLAG_READY;
+    mpegts_write_partial_pes_stream(s, st, payload);
+}
+
+static void write_side_pes_streams(AVFormatContext *s, int64_t dts, int64_t delay, int parallel)
+{
+    int i;
+    for(i=0; i<s->nb_streams; i++) {
+        AVStream *st2 = s->streams[i];
+        MpegTSWriteStream *ts_st2 = st2->priv_data;
+        if (   ts_st2->payload_size
+            && (ts_st2->payload_dts == AV_NOPTS_VALUE
+                || dts - ts_st2->payload_dts > delay/2
+                || (ts_st2->pes_flags & PES_FLAG_READY))) {
+            if (parallel) {
+                if (!(ts_st2->pes_flags & PES_FLAG_READY))
+                    mpegts_start_partial_pes_stream(s, st2, ts_st2->payload);
+                else
+                    mpegts_write_partial_pes_stream(s, st2, ts_st2->payload);
+            } else {
+                mpegts_write_full_pes_stream(s, st2, ts_st2->payload);
+            }
+        }
+    }
+}
+
 static int mpegts_write_packet_internal(AVFormatContext *s, AVPacket *pkt)
 {
     AVStream *st = s->streams[pkt->stream_index];
@@ -1548,13 +1648,12 @@  static int mpegts_write_packet_internal(AVFormatContext *s, AVPacket *pkt)
     int opus_samples = 0;
     int side_data_size;
     uint8_t *side_data = NULL;
-    int stream_id = -1;
 
     side_data = av_packet_get_side_data(pkt,
                                         AV_PKT_DATA_MPEGTS_STREAM_ID,
                                         &side_data_size);
     if (side_data)
-        stream_id = side_data[0];
+        ts_st->stream_id = side_data[0];
 
     if (ts->flags & MPEGTS_FLAG_REEMIT_PAT_PMT) {
         ts->pat_packet_count = ts->pat_packet_period - 1;
@@ -1749,52 +1848,66 @@  static int mpegts_write_packet_internal(AVFormatContext *s, AVPacket *pkt)
     }
 
     if (pkt->dts != AV_NOPTS_VALUE) {
-        int i;
-        for(i=0; i<s->nb_streams; i++) {
-            AVStream *st2 = s->streams[i];
-            MpegTSWriteStream *ts_st2 = st2->priv_data;
-            if (   ts_st2->payload_size
-               && (ts_st2->payload_dts == AV_NOPTS_VALUE || dts - ts_st2->payload_dts > delay/2)) {
-                mpegts_write_pes(s, st2, ts_st2->payload, ts_st2->payload_size,
-                                 ts_st2->payload_pts, ts_st2->payload_dts,
-                                 ts_st2->payload_flags & AV_PKT_FLAG_KEY, stream_id);
-                ts_st2->payload_size = 0;
-            }
-        }
+        if (!ts->parallel_mux) {
+            /* In sequential mode first flush PES packets of other streams */
+            write_side_pes_streams(s, dts, delay, ts->parallel_mux);
+        } else while (ts_st->pes_flags & PES_FLAG_READY)
+            /* In parallel mode first write one PES chunk of every pending streams */
+            write_side_pes_streams(s, dts, delay, ts->parallel_mux);
     }
 
+    /* Complete the PES packet if it's time to do it
+     * (incoming data will go into another PES packet) */
     if (ts_st->payload_size && (ts_st->payload_size + size > ts->pes_payload_size ||
         (dts != AV_NOPTS_VALUE && ts_st->payload_dts != AV_NOPTS_VALUE &&
          av_compare_ts(dts - ts_st->payload_dts, st->time_base,
                        s->max_delay, AV_TIME_BASE_Q) >= 0) ||
         ts_st->opus_queued_samples + opus_samples >= 5760 /* 120ms */)) {
-        mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_size,
-                         ts_st->payload_pts, ts_st->payload_dts,
-                         ts_st->payload_flags & AV_PKT_FLAG_KEY, stream_id);
-        ts_st->payload_size = 0;
-        ts_st->opus_queued_samples = 0;
+        if (!ts->parallel_mux || ts_st->opus_queued_samples) {
+            mpegts_write_full_pes_stream(s, st, ts_st->payload);
+            ts_st->opus_queued_samples = 0;
+        } else {
+            mpegts_start_partial_pes_stream(s, st, ts_st->payload);
+            /* and write the entire pes packet interleaved until end */
+            while (ts_st->pes_flags & PES_FLAG_READY)
+                write_side_pes_streams(s, dts, delay, ts->parallel_mux);
+        }
     }
 
+    /* Complete the PES packet too with the incoming data when necessary */
     if (st->codecpar->codec_type != AVMEDIA_TYPE_AUDIO || size > ts->pes_payload_size) {
         av_assert0(!ts_st->payload_size);
-        // for video and subtitle, write a single pes packet
-        mpegts_write_pes(s, st, buf, size, pts, dts,
-                         pkt->flags & AV_PKT_FLAG_KEY, stream_id);
-        ts_st->opus_queued_samples = 0;
-        av_free(data);
-        return 0;
+        /* for video and subtitle, a single pes packet is required */
+        ts_st->payload_size  = size;
+        ts_st->payload_pts   = pts;
+        ts_st->payload_dts   = dts;
+        ts_st->payload_flags = pkt->flags;
+        if (!ts->parallel_mux || ts_st->opus_queued_samples) {
+            mpegts_write_full_pes_stream(s, st, buf);
+            ts_st->opus_queued_samples = 0;
+            goto free;
+        } else {
+            /* start writing the first part of this pes packet */
+            mpegts_start_partial_pes_stream(s, st, buf);
+            if (ts_st->payload_size == 0)
+                goto free;
+            ts_st->payload_size = 0;  // note: this value will be set later
+        }
     }
 
+    /* Start a new PES packet */
     if (!ts_st->payload_size) {
         ts_st->payload_pts   = pts;
         ts_st->payload_dts   = dts;
         ts_st->payload_flags = pkt->flags;
     }
 
+    /* Enqueue new data in the current PES packet */
     memcpy(ts_st->payload + ts_st->payload_size, buf, size);
     ts_st->payload_size += size;
     ts_st->opus_queued_samples += opus_samples;
 
+free:
     av_free(data);
 
     return 0;
@@ -1809,9 +1922,9 @@  static void mpegts_write_flush(AVFormatContext *s)
         AVStream *st = s->streams[i];
         MpegTSWriteStream *ts_st = st->priv_data;
         if (ts_st->payload_size > 0) {
-            mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_size,
-                             ts_st->payload_pts, ts_st->payload_dts,
-                             ts_st->payload_flags & AV_PKT_FLAG_KEY, -1);
+            if (!ts_st->pes_flags)
+                ts_st->pes_flags |= PES_FLAG_START;
+            mpegts_flush_pes_stream(s, st, ts_st->payload);
             ts_st->payload_size = 0;
             ts_st->opus_queued_samples = 0;
         }
@@ -1929,6 +2042,9 @@  static const AVOption options[] = {
     { "mpegts_m2ts_mode", "Enable m2ts mode.",
       offsetof(MpegTSWrite, m2ts_mode), AV_OPT_TYPE_BOOL,
       { .i64 = -1 }, -1, 1, AV_OPT_FLAG_ENCODING_PARAM },
+    { "mpegts_interleave_mux", "Enable interleaved (non-sequential) muxing mode.",
+      offsetof(MpegTSWrite, parallel_mux), AV_OPT_TYPE_BOOL,
+      { .i64 = 0 }, 0, 1, AV_OPT_FLAG_ENCODING_PARAM },
     { "muxrate", NULL,
       offsetof(MpegTSWrite, mux_rate), AV_OPT_TYPE_INT,
       { .i64 = 1 }, 0, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM },