Message ID | 20210216140050.164009-2-philip-dylan.gleonec@savoirfairelinux.com |
---|---|
State | New |
Headers | show |
Series | libopusdec: Enable FEC/PLC | expand |
Context | Check | Description |
---|---|---|
andriy/x86_make | success | Make finished |
andriy/x86_make_fate | success | Make fate finished |
andriy/PPC64_make | success | Make finished |
andriy/PPC64_make_fate | success | Make fate finished |
Am Di., 16. Feb. 2021 um 15:02 Uhr schrieb Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com>: > > Adds FEC/PLC support to libopus. The lost packets are detected as a > discontinuity in the audio stream. When a discontinuity is used, this > patch tries to decode the FEC data. If FEC data is present in the > packet, it is decoded, otherwise audio is re-created through PLC. > > This patch is based on Steinar H. Gunderson contribution, and corrects > the pts computation: all pts are expressed in samples instead of time. > This patch also adds an option "decode_fec" which enables or disables > FEC decoding. This option is disabled by default to keep consistent > behaviour with former versions. Could you elaborate? I would have expected that the normal use case is not have a lossy input and that the new feature is always useful if data was lost. Or is there no way to distinguish between a stream with unusual timestamps and a stream with packet loss? Carl Eugen
Feb 16, 2021, 20:07 by ceffmpeg@gmail.com: > Am Di., 16. Feb. 2021 um 15:02 Uhr schrieb Philip-Dylan Gleonec > <philip-dylan.gleonec@savoirfairelinux.com>: > >> >> Adds FEC/PLC support to libopus. The lost packets are detected as a >> discontinuity in the audio stream. When a discontinuity is used, this >> patch tries to decode the FEC data. If FEC data is present in the >> packet, it is decoded, otherwise audio is re-created through PLC. >> >> This patch is based on Steinar H. Gunderson contribution, and corrects >> the pts computation: all pts are expressed in samples instead of time. >> This patch also adds an option "decode_fec" which enables or disables >> FEC decoding. This option is disabled by default to keep consistent >> behaviour with former versions. >> > > Could you elaborate? > I would have expected that the normal use case is not have a > lossy input and that the new feature is always useful if data > was lost. > > Or is there no way to distinguish between a stream with > unusual timestamps and a stream with packet loss? > Streams can switch between being FEC encoded and not on a per-frame basis. Opus streams can have odd timestamps since the frame size is somewhat variable. So no way to check purely on timestamps. So it being an init-time option is fine. I'd like to have the original patch author review this, could someone CC him?
> Could you elaborate? > I would have expected that the normal use case is not have a > lossy input and that the new feature is always useful if data > was lost. The use-case for FEC is typically RTP stream where audio is compressed with opus. In that case, depending on the network conditions, packets can be lost. From the decoder side, this can be observed as a packet with a pts that is incremented more than we would expect. With this patch, if no discontinuity is detected, the packet is decoded as usual. If a discontinuity is detected, its duration is checked to deduct how many samples should be reconstructed. One limitation of opus FEC is that packets can be reconstructed with a granularity of 2.5ms (120 samples), so libopus can not jsut reconstruct an arbitrary number of samples. This patch manages this by "rounding" the number of lost samples to the closest number of samples that libopsu can reconstruct. > Or is there no way to distinguish between a stream with > unusual timestamps and a stream with packet loss? From my understanding, the only information at the decoder side are the metadata associated with the packet to decode, notably the pts, dts and packet duration. It does not have information about how the packet has been received (RTP, SDP, other...). Packet loss can typically be detected at higher layers (RTCP packets for RTP) but not at the decoder layer. From observation it seemed that a pts different from the last pts + packet duration would indicate that some samples were lost. The decoder than tries to restore as much sample as it can. Note that packet duration seem to change units depending on how the file is encapsulated: the same file had a packet->duration of 20 (ms) in a .mkv container but 960 (samples) in a .opus container. Since the decoder is not aware of which container is used, it has to "guess" which unit is used. As FEC is ony used with SILK frames, which have a 10ms (480 samples) minimal duration, and an opus packet has a max duration of 120ms, a threshold can find how the pts is expressed. An alternative could be to simply provide a new function to decode FEC data and let the higher layers manage if FEC should be decoded, but this implies more modification to the stack to make FEC work, while here only an AVOption has to be set.
>> Am Di., 16. Feb. 2021 um 15:02 Uhr schrieb Philip-Dylan Gleonec >> <philip-dylan.gleonec@savoirfairelinux.com>: >> >>> >>> Adds FEC/PLC support to libopus. The lost packets are detected as a >>> discontinuity in the audio stream. When a discontinuity is used, this >>> patch tries to decode the FEC data. If FEC data is present in the >>> packet, it is decoded, otherwise audio is re-created through PLC. >>> >>> This patch is based on Steinar H. Gunderson contribution, and corrects >>> the pts computation: all pts are expressed in samples instead of time. >>> This patch also adds an option "decode_fec" which enables or disables >>> FEC decoding. This option is disabled by default to keep consistent >>> behaviour with former versions. >>> >> >> Could you elaborate? >> I would have expected that the normal use case is not have a >> lossy input and that the new feature is always useful if data >> was lost. >> >> Or is there no way to distinguish between a stream with >> unusual timestamps and a stream with packet loss? >> > >Streams can switch between being FEC encoded and not on >a per-frame basis. Opus streams can have odd timestamps >since the frame size is somewhat variable. So no way to >check purely on timestamps. Is there a possibility of the packet duration varying ? I'm not sure this patch can manage this case, though it could by comparing the pts to the last pts + the new packet duration. >I'd like to have the original patch author review this, could >someone CC him? I've added him to the CC list.
> I've added him to the CC list
Now done.
Philip-Dylan Gleonec: >> Could you elaborate? >> I would have expected that the normal use case is not have a >> lossy input and that the new feature is always useful if data >> was lost. > > The use-case for FEC is typically RTP stream where audio is compressed > with opus. In that case, depending on the network conditions, packets > can be lost. From the decoder side, this can be observed as a packet > with a pts that is incremented more than we would expect. > > With this patch, if no discontinuity is detected, the packet is decoded > as usual. If a discontinuity is detected, its duration is checked to > deduct how many samples should be reconstructed. One limitation of opus > FEC is that packets can be reconstructed with a granularity of 2.5ms > (120 samples), so libopus can not jsut reconstruct an arbitrary number > of samples. This patch manages this by "rounding" the number of lost > samples to the closest number of samples that libopsu can reconstruct. > >> Or is there no way to distinguish between a stream with >> unusual timestamps and a stream with packet loss? > > From my understanding, the only information at the decoder side are the > metadata associated with the packet to decode, notably the pts, dts and > packet duration. It does not have information about how the packet has > been received (RTP, SDP, other...). Packet loss can typically be > detected at higher layers (RTCP packets for RTP) but not at the decoder > layer. > > From observation it seemed that a pts different from the last pts + > packet duration would indicate that some samples were lost. The decoder > than tries to restore as much sample as it can. > > Note that packet duration seem to change units depending on how the file > is encapsulated: the same file had a packet->duration of 20 (ms) in a > .mkv container but 960 (samples) in a .opus container. Since the decoder > is not aware of which container is used, it has to "guess" which unit is > used. As FEC is ony used with SILK frames, which have a 10ms (480 > samples) minimal duration, and an opus packet has a max duration of > 120ms, a threshold can find how the pts is expressed. There is AVCodecContext.pkt_timebase. > > An alternative could be to simply provide a new function to decode FEC > data and let the higher layers manage if FEC should be decoded, but this > implies more modification to the stack to make FEC work, while here only > an AVOption has to be set.
Andreas Rheinhardt: >>> Could you elaborate? >>> I would have expected that the normal use case is not have a >>> lossy input and that the new feature is always useful if data >>> was lost. >> >> The use-case for FEC is typically RTP stream where audio is compressed >> with opus. In that case, depending on the network conditions, packets >> can be lost. From the decoder side, this can be observed as a packet >> with a pts that is incremented more than we would expect. >> >> With this patch, if no discontinuity is detected, the packet is decoded >> as usual. If a discontinuity is detected, its duration is checked to >> deduct how many samples should be reconstructed. One limitation of opus >> FEC is that packets can be reconstructed with a granularity of 2.5ms >> (120 samples), so libopus can not jsut reconstruct an arbitrary number >> of samples. This patch manages this by "rounding" the number of lost >> samples to the closest number of samples that libopsu can reconstruct. >> >>> Or is there no way to distinguish between a stream with >>> unusual timestamps and a stream with packet loss? >> >> From my understanding, the only information at the decoder side are the >> metadata associated with the packet to decode, notably the pts, dts and >> packet duration. It does not have information about how the packet has >> been received (RTP, SDP, other...). Packet loss can typically be >> detected at higher layers (RTCP packets for RTP) but not at the decoder >> layer. >> >> From observation it seemed that a pts different from the last pts + >> packet duration would indicate that some samples were lost. The decoder >> than tries to restore as much sample as it can. >> >> Note that packet duration seem to change units depending on how the file >> is encapsulated: the same file had a packet->duration of 20 (ms) in a >> .mkv container but 960 (samples) in a .opus container. Since the decoder >> is not aware of which container is used, it has to "guess" which unit is >> used. As FEC is ony used with SILK frames, which have a 10ms (480 >> samples) minimal duration, and an opus packet has a max duration of >> 120ms, a threshold can find how the pts is expressed. > >There is AVCodecContext.pkt_timebase. Thanks for the feedback, I had missed that property. I've attached a reworked version of the patch which makes use of it. It is indeed more clean and precise. I've left a fallback when computing the expected_next_pts in case it has not been set after creating the AVCodecContext and left at the {0, 1} value.
Here is the reworked patch properly attached.
Sorry about the duplicate mail, I just noticed I had a mishap with my
mail client and the previous patch was scrubbed.
Signed-off-by: Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com>
Co-developed-by: Steinar H. Gunderson <steinar+ffmpeg@gunderson.no>
---
libavcodec/libopusdec.c | 107 +++++++++++++++++++++++++++++++++++-----
1 file changed, 96 insertions(+), 11 deletions(-)
diff --git a/libavcodec/libopusdec.c b/libavcodec/libopusdec.c
index 082a431c6c..3de784dfbd 100644
--- a/libavcodec/libopusdec.c
+++ b/libavcodec/libopusdec.c
@@ -43,10 +43,15 @@ struct libopus_context {
#ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST
int apply_phase_inv;
#endif
+ int decode_fec;
+ int64_t expected_next_pts;
};
#define OPUS_HEAD_SIZE 19
+// Sample rate is constant as libopus always output at 48kHz
+const AVRational opus_timebase = { 1, 48000 };
+
static av_cold int libopus_decode_init(AVCodecContext *avc)
{
struct libopus_context *opus = avc->priv_data;
@@ -134,6 +139,8 @@ static av_cold int libopus_decode_init(AVCodecContext *avc)
/* Decoder delay (in samples) at 48kHz */
avc->delay = avc->internal->skip_samples = opus->pre_skip;
+ opus->expected_next_pts = AV_NOPTS_VALUE;
+
return 0;
}
@@ -155,25 +162,102 @@ static int libopus_decode(AVCodecContext *avc, void *data,
{
struct libopus_context *opus = avc->priv_data;
AVFrame *frame = data;
- int ret, nb_samples;
+ uint8_t *outptr;
+ int ret, nb_samples = 0, nb_lost_samples = 0, nb_samples_left;
+
+ // If FEC is enabled, calculate number of lost samples
+ if (opus->decode_fec &&
+ opus->expected_next_pts != AV_NOPTS_VALUE &&
+ pkt->pts != AV_NOPTS_VALUE &&
+ pkt->pts != opus->expected_next_pts) {
+ // Cap at recovering 120 ms of lost audio.
+ nb_lost_samples = pkt->pts - opus->expected_next_pts;
+ nb_lost_samples = FFMIN(nb_lost_samples, MAX_FRAME_SIZE);
+ // pts is expressed in ms for some containers (e.g. mkv)
+ // FEC only works for SILK frames (> 10ms)
+ // Detect if nb_lost_samples is in ms, and convert in samples if it is
+ if (nb_lost_samples > 0) {
+ if (avc->pkt_timebase.den != 48000) {
+ nb_lost_samples = av_rescale_q(nb_lost_samples, avc->pkt_timebase, opus_timebase);
+ }
+ // For FEC/PLC, frame_size has to be to have a multiple of 2.5 ms
+ if (nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den)) {
+ nb_lost_samples -= nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den);
+ }
+ }
+ }
- frame->nb_samples = MAX_FRAME_SIZE;
+ frame->nb_samples = MAX_FRAME_SIZE + nb_lost_samples;
if ((ret = ff_get_buffer(avc, frame, 0)) < 0)
return ret;
+ outptr = frame->data[0];
+ nb_samples_left = frame->nb_samples;
+
+ if (opus->decode_fec && nb_lost_samples > 0) {
+ // Try to recover the lost samples with FEC data from this one.
+ // If there's no FEC data, the decoder will do loss concealment instead.
+ if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
+ ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
+ (opus_int16 *)outptr,
+ nb_lost_samples, 1);
+ else
+ ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
+ (float *)outptr,
+ nb_lost_samples, 1);
+
+ if (ret < 0) {
+ if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
+ av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
+ opus_strerror(ret));
+ return ff_opus_error_to_averror(ret);
+ }
+
+ av_log(avc, AV_LOG_WARNING, "Recovered %d samples with FEC/PLC\n",
+ ret);
+
+ outptr += ret * avc->channels * av_get_bytes_per_sample(avc->sample_fmt);
+ nb_samples_left -= ret;
+ nb_samples += ret;
+ if (pkt->pts != AV_NOPTS_VALUE) {
+ frame->pts = pkt->pts - ret;
+ }
+ }
+
+ // Decode the actual, non-lost data.
if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
- nb_samples = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
- (opus_int16 *)frame->data[0],
- frame->nb_samples, 0);
+ ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
+ (opus_int16 *)outptr,
+ nb_samples_left, 0);
else
- nb_samples = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
- (float *)frame->data[0],
- frame->nb_samples, 0);
+ ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
+ (float *)outptr,
+ nb_samples_left, 0);
- if (nb_samples < 0) {
+ if (ret < 0) {
+ if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
- opus_strerror(nb_samples));
- return ff_opus_error_to_averror(nb_samples);
+ opus_strerror(ret));
+ return ff_opus_error_to_averror(ret);
+ }
+ nb_samples += ret;
+
+ av_log(avc, AV_LOG_WARNING, "Decoded %d samples normally\n", ret);
+
+ if (opus->decode_fec)
+ {
+ // Calculate the next expected pts
+ if (pkt->pts == AV_NOPTS_VALUE) {
+ opus->expected_next_pts = AV_NOPTS_VALUE;
+ } else {
+ if (pkt->duration) {
+ opus->expected_next_pts = pkt->pts + pkt->duration;
+ } else if (avc->pkt_timebase.num) {
+ opus->expected_next_pts = pkt->pts + av_rescale_q(ret, opus_timebase, avc->pkt_timebase);
+ } else {
+ opus->expected_next_pts = pkt->pts + ret;
+ }
+ }
}
#ifndef OPUS_SET_GAIN
@@ -214,6 +298,7 @@ static const AVOption libopusdec_options[] = {
#ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST
{ "apply_phase_inv", "Apply intensity stereo phase inversion", OFFSET(apply_phase_inv), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, FLAGS },
#endif
+ { "decode_fec", "Decode FEC data or use PLC", OFFSET(decode_fec), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, FLAGS },
{ NULL },
};
Le 18/02/2021 à 17:38, Philip-Dylan Gleonec a écrit : > Here is the reworked patch properly attached. > Sorry about the duplicate mail, I just noticed I had a mishap with my > mail client and the previous patch was scrubbed. Hello, Is someone interested in picking this up, or is there some further corrections/improvements I should make to this patch ? Regards, Philip-Dylan Gleonec
Mar 3, 2021, 16:51 by philip-dylan.gleonec@savoirfairelinux.com: > Le 18/02/2021 à 17:38, Philip-Dylan Gleonec a écrit : > >> Here is the reworked patch properly attached. >> Sorry about the duplicate mail, I just noticed I had a mishap with my >> mail client and the previous patch was scrubbed. >> > > > Hello, > > Is someone interested in picking this up, or is there some further corrections/improvements I should make to this patch ? > Patch looks good. I'll push it alongside the libopusdec patch once you've resent it.
On Thu, Feb 18, 2021 at 11:39 AM Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com> wrote: > > Here is the reworked patch properly attached. > Sorry about the duplicate mail, I just noticed I had a mishap with my > mail client and the previous patch was scrubbed. > > Signed-off-by: Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com> > Co-developed-by: Steinar H. Gunderson <steinar+ffmpeg@gunderson.no> > --- > libavcodec/libopusdec.c | 107 +++++++++++++++++++++++++++++++++++----- > 1 file changed, 96 insertions(+), 11 deletions(-) > > diff --git a/libavcodec/libopusdec.c b/libavcodec/libopusdec.c > index 082a431c6c..3de784dfbd 100644 > --- a/libavcodec/libopusdec.c > +++ b/libavcodec/libopusdec.c > @@ -43,10 +43,15 @@ struct libopus_context { > #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST > int apply_phase_inv; > #endif > + int decode_fec; > + int64_t expected_next_pts; > }; > > #define OPUS_HEAD_SIZE 19 > > +// Sample rate is constant as libopus always output at 48kHz > +const AVRational opus_timebase = { 1, 48000 }; > + > static av_cold int libopus_decode_init(AVCodecContext *avc) > { > struct libopus_context *opus = avc->priv_data; > @@ -134,6 +139,8 @@ static av_cold int libopus_decode_init(AVCodecContext *avc) > /* Decoder delay (in samples) at 48kHz */ > avc->delay = avc->internal->skip_samples = opus->pre_skip; > > + opus->expected_next_pts = AV_NOPTS_VALUE; > + > return 0; > } > > @@ -155,25 +162,102 @@ static int libopus_decode(AVCodecContext *avc, void *data, > { > struct libopus_context *opus = avc->priv_data; > AVFrame *frame = data; > - int ret, nb_samples; > + uint8_t *outptr; > + int ret, nb_samples = 0, nb_lost_samples = 0, nb_samples_left; > + > + // If FEC is enabled, calculate number of lost samples > + if (opus->decode_fec && > + opus->expected_next_pts != AV_NOPTS_VALUE && > + pkt->pts != AV_NOPTS_VALUE && > + pkt->pts != opus->expected_next_pts) { > + // Cap at recovering 120 ms of lost audio. > + nb_lost_samples = pkt->pts - opus->expected_next_pts; > + nb_lost_samples = FFMIN(nb_lost_samples, MAX_FRAME_SIZE); > + // pts is expressed in ms for some containers (e.g. mkv) > + // FEC only works for SILK frames (> 10ms) > + // Detect if nb_lost_samples is in ms, and convert in samples if it is > + if (nb_lost_samples > 0) { > + if (avc->pkt_timebase.den != 48000) { > + nb_lost_samples = av_rescale_q(nb_lost_samples, avc->pkt_timebase, opus_timebase); > + } > + // For FEC/PLC, frame_size has to be to have a multiple of 2.5 ms > + if (nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den)) { > + nb_lost_samples -= nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den); > + } > + } > + } > > - frame->nb_samples = MAX_FRAME_SIZE; > + frame->nb_samples = MAX_FRAME_SIZE + nb_lost_samples; > if ((ret = ff_get_buffer(avc, frame, 0)) < 0) > return ret; > > + outptr = frame->data[0]; > + nb_samples_left = frame->nb_samples; > + > + if (opus->decode_fec && nb_lost_samples > 0) { > + // Try to recover the lost samples with FEC data from this one. > + // If there's no FEC data, the decoder will do loss concealment instead. > + if (avc->sample_fmt == AV_SAMPLE_FMT_S16) > + ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size, > + (opus_int16 *)outptr, > + nb_lost_samples, 1); > + else > + ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size, > + (float *)outptr, > + nb_lost_samples, 1); > + > + if (ret < 0) { > + if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration; > + av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n", > + opus_strerror(ret)); > + return ff_opus_error_to_averror(ret); > + } > + > + av_log(avc, AV_LOG_WARNING, "Recovered %d samples with FEC/PLC\n", > + ret); > + > + outptr += ret * avc->channels * av_get_bytes_per_sample(avc->sample_fmt); > + nb_samples_left -= ret; > + nb_samples += ret; > + if (pkt->pts != AV_NOPTS_VALUE) { > + frame->pts = pkt->pts - ret; > + } > + } > + > + // Decode the actual, non-lost data. > if (avc->sample_fmt == AV_SAMPLE_FMT_S16) > - nb_samples = opus_multistream_decode(opus->dec, pkt->data, pkt->size, > - (opus_int16 *)frame->data[0], > - frame->nb_samples, 0); > + ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size, > + (opus_int16 *)outptr, > + nb_samples_left, 0); > else > - nb_samples = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size, > - (float *)frame->data[0], > - frame->nb_samples, 0); > + ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size, > + (float *)outptr, > + nb_samples_left, 0); > > - if (nb_samples < 0) { > + if (ret < 0) { > + if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration; > av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n", > - opus_strerror(nb_samples)); > - return ff_opus_error_to_averror(nb_samples); > + opus_strerror(ret)); > + return ff_opus_error_to_averror(ret); > + } > + nb_samples += ret; > + > + av_log(avc, AV_LOG_WARNING, "Decoded %d samples normally\n", ret); I would either drop this or only log it at AV_LOG_DEBUG level as it's pretty verbose during normal operation (especially given that it's the "normal" case). Otherwise this seems great, nice work. Best, Tristan
On Thu, Feb 18, 2021 at 11:39 AM Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com> wrote: > > Here is the reworked patch properly attached. > Sorry about the duplicate mail, I just noticed I had a mishap with my > mail client and the previous patch was scrubbed. > > Signed-off-by: Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com> > Co-developed-by: Steinar H. Gunderson <steinar+ffmpeg@gunderson.no> > --- > libavcodec/libopusdec.c | 107 +++++++++++++++++++++++++++++++++++----- > 1 file changed, 96 insertions(+), 11 deletions(-) > > diff --git a/libavcodec/libopusdec.c b/libavcodec/libopusdec.c > index 082a431c6c..3de784dfbd 100644 > --- a/libavcodec/libopusdec.c > +++ b/libavcodec/libopusdec.c > @@ -43,10 +43,15 @@ struct libopus_context { > #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST > int apply_phase_inv; > #endif > + int decode_fec; > + int64_t expected_next_pts; > }; > > #define OPUS_HEAD_SIZE 19 > > +// Sample rate is constant as libopus always output at 48kHz > +const AVRational opus_timebase = { 1, 48000 }; > + > static av_cold int libopus_decode_init(AVCodecContext *avc) > { > struct libopus_context *opus = avc->priv_data; > @@ -134,6 +139,8 @@ static av_cold int libopus_decode_init(AVCodecContext *avc) > /* Decoder delay (in samples) at 48kHz */ > avc->delay = avc->internal->skip_samples = opus->pre_skip; > > + opus->expected_next_pts = AV_NOPTS_VALUE; > + > return 0; > } > > @@ -155,25 +162,102 @@ static int libopus_decode(AVCodecContext *avc, void *data, > { > struct libopus_context *opus = avc->priv_data; > AVFrame *frame = data; > - int ret, nb_samples; > + uint8_t *outptr; > + int ret, nb_samples = 0, nb_lost_samples = 0, nb_samples_left; > + > + // If FEC is enabled, calculate number of lost samples > + if (opus->decode_fec && > + opus->expected_next_pts != AV_NOPTS_VALUE && > + pkt->pts != AV_NOPTS_VALUE && > + pkt->pts != opus->expected_next_pts) { > + // Cap at recovering 120 ms of lost audio. > + nb_lost_samples = pkt->pts - opus->expected_next_pts; > + nb_lost_samples = FFMIN(nb_lost_samples, MAX_FRAME_SIZE); > + // pts is expressed in ms for some containers (e.g. mkv) > + // FEC only works for SILK frames (> 10ms) > + // Detect if nb_lost_samples is in ms, and convert in samples if it is > + if (nb_lost_samples > 0) { > + if (avc->pkt_timebase.den != 48000) { > + nb_lost_samples = av_rescale_q(nb_lost_samples, avc->pkt_timebase, opus_timebase); > + } > + // For FEC/PLC, frame_size has to be to have a multiple of 2.5 ms > + if (nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den)) { > + nb_lost_samples -= nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den); > + } > + } > + } > > - frame->nb_samples = MAX_FRAME_SIZE; > + frame->nb_samples = MAX_FRAME_SIZE + nb_lost_samples; > if ((ret = ff_get_buffer(avc, frame, 0)) < 0) > return ret; > > + outptr = frame->data[0]; > + nb_samples_left = frame->nb_samples; > + > + if (opus->decode_fec && nb_lost_samples > 0) { > + // Try to recover the lost samples with FEC data from this one. > + // If there's no FEC data, the decoder will do loss concealment instead. > + if (avc->sample_fmt == AV_SAMPLE_FMT_S16) > + ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size, > + (opus_int16 *)outptr, > + nb_lost_samples, 1); > + else > + ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size, > + (float *)outptr, > + nb_lost_samples, 1); > + > + if (ret < 0) { > + if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration; > + av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n", > + opus_strerror(ret)); > + return ff_opus_error_to_averror(ret); > + } > + > + av_log(avc, AV_LOG_WARNING, "Recovered %d samples with FEC/PLC\n", > + ret); > + > + outptr += ret * avc->channels * av_get_bytes_per_sample(avc->sample_fmt); > + nb_samples_left -= ret; > + nb_samples += ret; > + if (pkt->pts != AV_NOPTS_VALUE) { > + frame->pts = pkt->pts - ret; > + } > + } > + > + // Decode the actual, non-lost data. > if (avc->sample_fmt == AV_SAMPLE_FMT_S16) > - nb_samples = opus_multistream_decode(opus->dec, pkt->data, pkt->size, > - (opus_int16 *)frame->data[0], > - frame->nb_samples, 0); > + ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size, > + (opus_int16 *)outptr, > + nb_samples_left, 0); > else > - nb_samples = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size, > - (float *)frame->data[0], > - frame->nb_samples, 0); > + ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size, > + (float *)outptr, > + nb_samples_left, 0); > > - if (nb_samples < 0) { > + if (ret < 0) { > + if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration; > av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n", > - opus_strerror(nb_samples)); > - return ff_opus_error_to_averror(nb_samples); > + opus_strerror(ret)); > + return ff_opus_error_to_averror(ret); > + } > + nb_samples += ret; > + > + av_log(avc, AV_LOG_WARNING, "Decoded %d samples normally\n", ret); > + I would either drop this or only log it at AV_LOG_DEBUG level as it's pretty verbose (especially given that it's the "normal" case). Otherwise this seems great, nice work. Best, Tristan
Hello, Please find attached a rebased patchset for the FEC implementation of libopus. Following the received feedbacks, some improvements have been done compared to the first version: - remove a log when a packet is decoded without FEC - add a check to only set libopus encoder packet loss estimate if it has not changed since previous encoding Both patches have passed FATE testing successfully. Moreover, the patch on libopus decoder is used in production in GNU Jami. Thanks for your time.
diff --git a/libavcodec/libopusdec.c b/libavcodec/libopusdec.c index 082a431c6c..504043353f 100644 --- a/libavcodec/libopusdec.c +++ b/libavcodec/libopusdec.c @@ -43,10 +43,15 @@ struct libopus_context { #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST int apply_phase_inv; #endif + int decode_fec; + int64_t expected_next_pts; }; #define OPUS_HEAD_SIZE 19 +// Sample rate is constant as libopus always output at 48kHz +#define OPUS_SAMPLERATE 48000 + static av_cold int libopus_decode_init(AVCodecContext *avc) { struct libopus_context *opus = avc->priv_data; @@ -134,6 +139,8 @@ static av_cold int libopus_decode_init(AVCodecContext *avc) /* Decoder delay (in samples) at 48kHz */ avc->delay = avc->internal->skip_samples = opus->pre_skip; + opus->expected_next_pts = AV_NOPTS_VALUE; + return 0; } @@ -155,25 +162,100 @@ static int libopus_decode(AVCodecContext *avc, void *data, { struct libopus_context *opus = avc->priv_data; AVFrame *frame = data; - int ret, nb_samples; + uint8_t *outptr; + int ret, nb_samples = 0, nb_lost_samples = 0, nb_samples_left; + + // If FEC is enabled, calculate number of lost samples + if (opus->decode_fec && + opus->expected_next_pts != AV_NOPTS_VALUE && + pkt->pts != AV_NOPTS_VALUE && + pkt->pts != opus->expected_next_pts) { + // Cap at recovering 120 ms of lost audio. + nb_lost_samples = pkt->pts - opus->expected_next_pts; + nb_lost_samples = FFMIN(nb_lost_samples, MAX_FRAME_SIZE); + // pts is expressed in ms for some containers (e.g. mkv) + // FEC only works for SILK frames (> 10ms) + // Detect if nb_lost_samples is in ms, and convert in samples if it is + if (nb_lost_samples > 0) { + if (pkt->duration > 0 && pkt->duration < OPUS_SAMPLERATE * 10 / 1000) { + nb_lost_samples = nb_lost_samples * OPUS_SAMPLERATE / 1000; + } + // For FEC/PLC, frame_size has to be to have a multiple of 2.5 ms + if (nb_lost_samples % (int)(2.5 / 1000 * OPUS_SAMPLERATE)) { + nb_lost_samples -= nb_lost_samples % (int)(2.5 / 1000 * OPUS_SAMPLERATE); + } + } + } - frame->nb_samples = MAX_FRAME_SIZE; + frame->nb_samples = MAX_FRAME_SIZE + nb_lost_samples; if ((ret = ff_get_buffer(avc, frame, 0)) < 0) return ret; + outptr = frame->data[0]; + nb_samples_left = frame->nb_samples; + + if (opus->decode_fec && nb_lost_samples > 0) { + // Try to recover the lost samples with FEC data from this one. + // If there's no FEC data, the decoder will do loss concealment instead. + if (avc->sample_fmt == AV_SAMPLE_FMT_S16) + ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size, + (opus_int16 *)outptr, + nb_lost_samples, 1); + else + ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size, + (float *)outptr, + nb_lost_samples, 1); + + if (ret < 0) { + if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration; + av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n", + opus_strerror(ret)); + return ff_opus_error_to_averror(ret); + } + + av_log(avc, AV_LOG_WARNING, "Recovered %d samples with FEC/PLC\n", + ret); + + outptr += ret * avc->channels * av_get_bytes_per_sample(avc->sample_fmt); + nb_samples_left -= ret; + nb_samples += ret; + if (pkt->pts != AV_NOPTS_VALUE) { + frame->pts = pkt->pts - ret; + } + } + + // Decode the actual, non-lost data. if (avc->sample_fmt == AV_SAMPLE_FMT_S16) - nb_samples = opus_multistream_decode(opus->dec, pkt->data, pkt->size, - (opus_int16 *)frame->data[0], - frame->nb_samples, 0); + ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size, + (opus_int16 *)outptr, + nb_samples_left, 0); else - nb_samples = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size, - (float *)frame->data[0], - frame->nb_samples, 0); + ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size, + (float *)outptr, + nb_samples_left, 0); - if (nb_samples < 0) { + if (ret < 0) { + if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration; av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n", - opus_strerror(nb_samples)); - return ff_opus_error_to_averror(nb_samples); + opus_strerror(ret)); + return ff_opus_error_to_averror(ret); + } + nb_samples += ret; + + av_log(avc, AV_LOG_WARNING, "Decoded %d samples normally\n", ret); + + if (opus->decode_fec) + { + // Calculate the next expected pts + if (pkt->pts == AV_NOPTS_VALUE) { + opus->expected_next_pts = AV_NOPTS_VALUE; + } else { + if (pkt->duration) { + opus->expected_next_pts = pkt->pts + pkt->duration; + } else { + opus->expected_next_pts = pkt->pts + ret; + } + } } #ifndef OPUS_SET_GAIN @@ -214,6 +296,7 @@ static const AVOption libopusdec_options[] = { #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST { "apply_phase_inv", "Apply intensity stereo phase inversion", OFFSET(apply_phase_inv), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, FLAGS }, #endif + { "decode_fec", "Decode FEC data or use PLC", OFFSET(decode_fec), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, FLAGS }, { NULL }, };
Adds FEC/PLC support to libopus. The lost packets are detected as a discontinuity in the audio stream. When a discontinuity is used, this patch tries to decode the FEC data. If FEC data is present in the packet, it is decoded, otherwise audio is re-created through PLC. This patch is based on Steinar H. Gunderson contribution, and corrects the pts computation: all pts are expressed in samples instead of time. This patch also adds an option "decode_fec" which enables or disables FEC decoding. This option is disabled by default to keep consistent behaviour with former versions. A number of checks are made to ensure compatibility with different containers. Indeed, video containers seem to have a pts expressed in ms while it is expressed in samples for audio containers. It also manages the cases where pkt->duration is 0, in some RTP streams. This patch ignores data it can not reconstruct, i.e. packets received twice and packets with a length that is not a multiple of 2.5ms. Signed-off-by: Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com> Co-developed-by: Steinar H. Gunderson <steinar+ffmpeg@gunderson.no> --- libavcodec/libopusdec.c | 105 +++++++++++++++++++++++++++++++++++----- 1 file changed, 94 insertions(+), 11 deletions(-)