diff mbox series

[FFmpeg-devel,1/1] avcodec/libopusdec: Enable FEC/PLC

Message ID 20210216140050.164009-2-philip-dylan.gleonec@savoirfairelinux.com
State New
Headers show
Series libopusdec: Enable FEC/PLC | expand

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

Philip-Dylan Gleonec Feb. 16, 2021, 2 p.m. UTC
Adds FEC/PLC support to libopus. The lost packets are detected as a
discontinuity in the audio stream. When a discontinuity is used, this
patch tries to decode the FEC data. If FEC data is present in the
packet, it is decoded, otherwise audio is re-created through PLC.

This patch is based on Steinar H. Gunderson contribution, and corrects
the pts computation: all pts are expressed in samples instead of time.
This patch also adds an option "decode_fec" which enables or disables
FEC decoding. This option is disabled by default to keep consistent
behaviour with former versions.

A number of checks are made to ensure compatibility with different
containers. Indeed, video containers seem to have a pts expressed in ms
while it is expressed in samples for audio containers. It also manages
the cases where pkt->duration is 0, in some RTP streams. This patch
ignores data it can not reconstruct, i.e. packets received twice and
packets with a length that is not a multiple of 2.5ms.

Signed-off-by: Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com>
Co-developed-by: Steinar H. Gunderson <steinar+ffmpeg@gunderson.no>
---
 libavcodec/libopusdec.c | 105 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 94 insertions(+), 11 deletions(-)

Comments

Carl Eugen Hoyos Feb. 16, 2021, 7:07 p.m. UTC | #1
Am Di., 16. Feb. 2021 um 15:02 Uhr schrieb Philip-Dylan Gleonec
<philip-dylan.gleonec@savoirfairelinux.com>:
>
> Adds FEC/PLC support to libopus. The lost packets are detected as a
> discontinuity in the audio stream. When a discontinuity is used, this
> patch tries to decode the FEC data. If FEC data is present in the
> packet, it is decoded, otherwise audio is re-created through PLC.
>
> This patch is based on Steinar H. Gunderson contribution, and corrects
> the pts computation: all pts are expressed in samples instead of time.
> This patch also adds an option "decode_fec" which enables or disables
> FEC decoding. This option is disabled by default to keep consistent
> behaviour with former versions.

Could you elaborate?
I would have expected that the normal use case is not have a
lossy input and that the new feature is always useful if data
was lost.

Or is there no way to distinguish between a stream with
unusual timestamps and a stream with packet loss?

Carl Eugen
Lynne Feb. 16, 2021, 7:15 p.m. UTC | #2
Feb 16, 2021, 20:07 by ceffmpeg@gmail.com:

> Am Di., 16. Feb. 2021 um 15:02 Uhr schrieb Philip-Dylan Gleonec
> <philip-dylan.gleonec@savoirfairelinux.com>:
>
>>
>> Adds FEC/PLC support to libopus. The lost packets are detected as a
>> discontinuity in the audio stream. When a discontinuity is used, this
>> patch tries to decode the FEC data. If FEC data is present in the
>> packet, it is decoded, otherwise audio is re-created through PLC.
>>
>> This patch is based on Steinar H. Gunderson contribution, and corrects
>> the pts computation: all pts are expressed in samples instead of time.
>> This patch also adds an option "decode_fec" which enables or disables
>> FEC decoding. This option is disabled by default to keep consistent
>> behaviour with former versions.
>>
>
> Could you elaborate?
> I would have expected that the normal use case is not have a
> lossy input and that the new feature is always useful if data
> was lost.
>
> Or is there no way to distinguish between a stream with
> unusual timestamps and a stream with packet loss?
>

Streams can switch between being FEC encoded and not on
a per-frame basis. Opus streams can have odd timestamps
since the frame size is somewhat variable. So no way to
check purely on timestamps.

So it being an init-time option is fine.

I'd like to have the original patch author review this, could
someone CC him?
Philip-Dylan Gleonec Feb. 17, 2021, 11:11 a.m. UTC | #3
> Could you elaborate?
> I would have expected that the normal use case is not have a
> lossy input and that the new feature is always useful if data
> was lost.

The use-case for FEC is typically RTP stream where audio is compressed
with opus. In that case, depending on the network conditions, packets
can be lost. From the decoder side, this can be observed as a packet
with a pts that is incremented more than we would expect.

With this patch, if no discontinuity is detected, the packet is decoded
as usual. If a discontinuity is detected, its duration is checked to
deduct how many samples should be reconstructed. One limitation of opus
FEC is that packets can be reconstructed with a granularity of 2.5ms
(120 samples), so libopus can not jsut reconstruct an arbitrary number
of samples. This patch manages this by "rounding" the number of lost
samples to the closest number of samples that libopsu can reconstruct.

> Or is there no way to distinguish between a stream with
> unusual timestamps and a stream with packet loss?

From my understanding, the only information at the decoder side are the
metadata associated with the packet to decode, notably the pts, dts and
packet duration. It does not have information about how the packet has
been received (RTP, SDP, other...). Packet loss can typically be
detected at higher layers (RTCP packets for RTP) but not at the decoder
layer.

From observation it seemed that a pts different from the last pts +
packet duration would indicate that some samples were lost. The decoder
than tries to restore as much sample as it can.

Note that packet duration seem to change units depending on how the file
is encapsulated: the same file had a packet->duration of 20 (ms) in a
.mkv container but 960 (samples) in a .opus container. Since the decoder
is not aware of which container is used, it has to "guess" which unit is
used. As FEC is ony used with SILK frames, which have a 10ms (480
samples) minimal duration, and an opus packet has a max duration of
120ms, a threshold can find how the pts is expressed.

An alternative could be to simply provide a new function to decode FEC
data and let the higher layers manage if FEC should be decoded, but this
implies more modification to the stack to make FEC work, while here only
an AVOption has to be set.
Philip-Dylan Gleonec Feb. 17, 2021, 11:12 a.m. UTC | #4
>> Am Di., 16. Feb. 2021 um 15:02 Uhr schrieb Philip-Dylan Gleonec
>> <philip-dylan.gleonec@savoirfairelinux.com>:
>>
>>>
>>> Adds FEC/PLC support to libopus. The lost packets are detected as a
>>> discontinuity in the audio stream. When a discontinuity is used, this
>>> patch tries to decode the FEC data. If FEC data is present in the
>>> packet, it is decoded, otherwise audio is re-created through PLC.
>>>
>>> This patch is based on Steinar H. Gunderson contribution, and corrects
>>> the pts computation: all pts are expressed in samples instead of time.
>>> This patch also adds an option "decode_fec" which enables or disables
>>> FEC decoding. This option is disabled by default to keep consistent
>>> behaviour with former versions.
>>>
>>
>> Could you elaborate?
>> I would have expected that the normal use case is not have a
>> lossy input and that the new feature is always useful if data
>> was lost.
>>
>> Or is there no way to distinguish between a stream with
>> unusual timestamps and a stream with packet loss?
>>
>
>Streams can switch between being FEC encoded and not on
>a per-frame basis. Opus streams can have odd timestamps
>since the frame size is somewhat variable. So no way to
>check purely on timestamps.

Is there a possibility of the packet duration varying ? I'm not sure
this patch can manage this case, though it could by comparing the pts
to the last pts + the new packet duration.

>I'd like to have the original patch author review this, could
>someone CC him?

I've added him to the CC list.
Philip-Dylan Gleonec Feb. 17, 2021, 11:13 a.m. UTC | #5
> I've added him to the CC list

Now done.
Andreas Rheinhardt Feb. 17, 2021, 11:14 a.m. UTC | #6
Philip-Dylan Gleonec:
>> Could you elaborate?
>> I would have expected that the normal use case is not have a
>> lossy input and that the new feature is always useful if data
>> was lost.
> 
> The use-case for FEC is typically RTP stream where audio is compressed
> with opus. In that case, depending on the network conditions, packets
> can be lost. From the decoder side, this can be observed as a packet
> with a pts that is incremented more than we would expect.
> 
> With this patch, if no discontinuity is detected, the packet is decoded
> as usual. If a discontinuity is detected, its duration is checked to
> deduct how many samples should be reconstructed. One limitation of opus
> FEC is that packets can be reconstructed with a granularity of 2.5ms
> (120 samples), so libopus can not jsut reconstruct an arbitrary number
> of samples. This patch manages this by "rounding" the number of lost
> samples to the closest number of samples that libopsu can reconstruct.
> 
>> Or is there no way to distinguish between a stream with
>> unusual timestamps and a stream with packet loss?
> 
> From my understanding, the only information at the decoder side are the
> metadata associated with the packet to decode, notably the pts, dts and
> packet duration. It does not have information about how the packet has
> been received (RTP, SDP, other...). Packet loss can typically be
> detected at higher layers (RTCP packets for RTP) but not at the decoder
> layer.
> 
> From observation it seemed that a pts different from the last pts +
> packet duration would indicate that some samples were lost. The decoder
> than tries to restore as much sample as it can.
> 
> Note that packet duration seem to change units depending on how the file
> is encapsulated: the same file had a packet->duration of 20 (ms) in a
> .mkv container but 960 (samples) in a .opus container. Since the decoder
> is not aware of which container is used, it has to "guess" which unit is
> used. As FEC is ony used with SILK frames, which have a 10ms (480
> samples) minimal duration, and an opus packet has a max duration of
> 120ms, a threshold can find how the pts is expressed.

There is AVCodecContext.pkt_timebase.

> 
> An alternative could be to simply provide a new function to decode FEC
> data and let the higher layers manage if FEC should be decoded, but this
> implies more modification to the stack to make FEC work, while here only
> an AVOption has to be set.
Philip-Dylan Gleonec Feb. 18, 2021, 1:51 p.m. UTC | #7
Andreas Rheinhardt:

 >>> Could you elaborate?
 >>> I would have expected that the normal use case is not have a
 >>> lossy input and that the new feature is always useful if data
 >>> was lost.
 >>
 >> The use-case for FEC is typically RTP stream where audio is compressed
 >> with opus. In that case, depending on the network conditions, packets
 >> can be lost. From the decoder side, this can be observed as a packet
 >> with a pts that is incremented more than we would expect.
 >>
 >> With this patch, if no discontinuity is detected, the packet is decoded
 >> as usual. If a discontinuity is detected, its duration is checked to
 >> deduct how many samples should be reconstructed. One limitation of opus
 >> FEC is that packets can be reconstructed with a granularity of 2.5ms
 >> (120 samples), so libopus can not jsut reconstruct an arbitrary number
 >> of samples. This patch manages this by "rounding" the number of lost
 >> samples to the closest number of samples that libopsu can reconstruct.
 >>
 >>> Or is there no way to distinguish between a stream with
 >>> unusual timestamps and a stream with packet loss?
 >>
 >> From my understanding, the only information at the decoder side are the
 >> metadata associated with the packet to decode, notably the pts, dts and
 >> packet duration. It does not have information about how the packet has
 >> been received (RTP, SDP, other...). Packet loss can typically be
 >> detected at higher layers (RTCP packets for RTP) but not at the decoder
 >> layer.
 >>
 >> From observation it seemed that a pts different from the last pts +
 >> packet duration would indicate that some samples were lost. The decoder
 >> than tries to restore as much sample as it can.
 >>
 >> Note that packet duration seem to change units depending on how the file
 >> is encapsulated: the same file had a packet->duration of 20 (ms) in a
 >> .mkv container but 960 (samples) in a .opus container. Since the decoder
 >> is not aware of which container is used, it has to "guess" which unit is
 >> used. As FEC is ony used with SILK frames, which have a 10ms (480
 >> samples) minimal duration, and an opus packet has a max duration of
 >> 120ms, a threshold can find how the pts is expressed.
 >
 >There is AVCodecContext.pkt_timebase.

Thanks for the feedback, I had missed that property.
I've attached a reworked version of the patch which makes use of it. It
is indeed more clean and precise. I've left a fallback when computing
the expected_next_pts in case it has not been set after creating the
AVCodecContext and left at the {0, 1} value.
Philip-Dylan Gleonec Feb. 18, 2021, 4:38 p.m. UTC | #8
Here is the reworked patch properly attached.
Sorry about the duplicate mail, I just noticed I had a mishap with my
mail client and the previous patch was scrubbed.

Signed-off-by: Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com>
Co-developed-by: Steinar H. Gunderson <steinar+ffmpeg@gunderson.no>
---
 libavcodec/libopusdec.c | 107 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 96 insertions(+), 11 deletions(-)

diff --git a/libavcodec/libopusdec.c b/libavcodec/libopusdec.c
index 082a431c6c..3de784dfbd 100644
--- a/libavcodec/libopusdec.c
+++ b/libavcodec/libopusdec.c
@@ -43,10 +43,15 @@ struct libopus_context {
 #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST
     int apply_phase_inv;
 #endif
+    int decode_fec;
+    int64_t expected_next_pts;
 };
 
 #define OPUS_HEAD_SIZE 19
 
+// Sample rate is constant as libopus always output at 48kHz
+const AVRational opus_timebase = { 1, 48000 };
+
 static av_cold int libopus_decode_init(AVCodecContext *avc)
 {
     struct libopus_context *opus = avc->priv_data;
@@ -134,6 +139,8 @@ static av_cold int libopus_decode_init(AVCodecContext *avc)
     /* Decoder delay (in samples) at 48kHz */
     avc->delay = avc->internal->skip_samples = opus->pre_skip;
 
+    opus->expected_next_pts = AV_NOPTS_VALUE;
+
     return 0;
 }
 
@@ -155,25 +162,102 @@ static int libopus_decode(AVCodecContext *avc, void *data,
 {
     struct libopus_context *opus = avc->priv_data;
     AVFrame *frame               = data;
-    int ret, nb_samples;
+    uint8_t *outptr;
+    int ret, nb_samples = 0, nb_lost_samples = 0, nb_samples_left;
+
+    // If FEC is enabled, calculate number of lost samples
+    if (opus->decode_fec &&
+        opus->expected_next_pts != AV_NOPTS_VALUE &&
+        pkt->pts != AV_NOPTS_VALUE &&
+        pkt->pts != opus->expected_next_pts) {
+        // Cap at recovering 120 ms of lost audio.
+        nb_lost_samples = pkt->pts - opus->expected_next_pts;
+        nb_lost_samples = FFMIN(nb_lost_samples, MAX_FRAME_SIZE);
+        // pts is expressed in ms for some containers (e.g. mkv)
+        // FEC only works for SILK frames (> 10ms)
+        // Detect if nb_lost_samples is in ms, and convert in samples if it is
+        if (nb_lost_samples > 0) {
+            if (avc->pkt_timebase.den != 48000) {
+                nb_lost_samples = av_rescale_q(nb_lost_samples, avc->pkt_timebase, opus_timebase);
+            }
+            // For FEC/PLC, frame_size has to be to have a multiple of 2.5 ms
+            if (nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den)) {
+                nb_lost_samples -= nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den);
+            }
+        }
+    }
 
-    frame->nb_samples = MAX_FRAME_SIZE;
+    frame->nb_samples = MAX_FRAME_SIZE + nb_lost_samples;
     if ((ret = ff_get_buffer(avc, frame, 0)) < 0)
         return ret;
 
+    outptr = frame->data[0];
+    nb_samples_left = frame->nb_samples;
+
+    if (opus->decode_fec && nb_lost_samples > 0) {
+        // Try to recover the lost samples with FEC data from this one.
+        // If there's no FEC data, the decoder will do loss concealment instead.
+        if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
+            ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
+                                                  (opus_int16 *)outptr,
+                                                  nb_lost_samples, 1);
+        else
+            ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
+                                                       (float *)outptr,
+                                                       nb_lost_samples, 1);
+
+        if (ret < 0) {
+            if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
+            av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
+                   opus_strerror(ret));
+            return ff_opus_error_to_averror(ret);
+        }
+
+        av_log(avc, AV_LOG_WARNING, "Recovered %d samples with FEC/PLC\n",
+                   ret);
+
+        outptr += ret * avc->channels * av_get_bytes_per_sample(avc->sample_fmt);
+        nb_samples_left -= ret;
+        nb_samples += ret;
+        if (pkt->pts != AV_NOPTS_VALUE) {
+            frame->pts = pkt->pts - ret;
+        }
+    }
+
+    // Decode the actual, non-lost data.
     if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
-        nb_samples = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
-                                             (opus_int16 *)frame->data[0],
-                                             frame->nb_samples, 0);
+        ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
+                                      (opus_int16 *)outptr,
+                                      nb_samples_left, 0);
     else
-        nb_samples = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
-                                                   (float *)frame->data[0],
-                                                   frame->nb_samples, 0);
+        ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
+                                            (float *)outptr,
+                                            nb_samples_left, 0);
 
-    if (nb_samples < 0) {
+    if (ret < 0) {
+        if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
         av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
-               opus_strerror(nb_samples));
-        return ff_opus_error_to_averror(nb_samples);
+               opus_strerror(ret));
+        return ff_opus_error_to_averror(ret);
+    }
+    nb_samples += ret;
+
+    av_log(avc, AV_LOG_WARNING, "Decoded %d samples normally\n", ret);
+
+    if (opus->decode_fec)
+    {
+        // Calculate the next expected pts
+        if (pkt->pts == AV_NOPTS_VALUE) {
+            opus->expected_next_pts = AV_NOPTS_VALUE;
+        } else {
+            if (pkt->duration) {
+                opus->expected_next_pts = pkt->pts + pkt->duration;
+            } else if (avc->pkt_timebase.num) {
+                opus->expected_next_pts = pkt->pts + av_rescale_q(ret, opus_timebase, avc->pkt_timebase);
+            } else {
+                opus->expected_next_pts = pkt->pts + ret;
+            }
+        }
     }
 
 #ifndef OPUS_SET_GAIN
@@ -214,6 +298,7 @@ static const AVOption libopusdec_options[] = {
 #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST
     { "apply_phase_inv", "Apply intensity stereo phase inversion", OFFSET(apply_phase_inv), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, FLAGS },
 #endif
+    { "decode_fec", "Decode FEC data or use PLC", OFFSET(decode_fec), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, FLAGS },
     { NULL },
 };
Philip-Dylan Gleonec March 3, 2021, 3:51 p.m. UTC | #9
Le 18/02/2021 à 17:38, Philip-Dylan Gleonec a écrit :
> Here is the reworked patch properly attached.
> Sorry about the duplicate mail, I just noticed I had a mishap with my
> mail client and the previous patch was scrubbed.


Hello,

Is someone interested in picking this up, or is there some further 
corrections/improvements I should make to this patch ?

Regards,
Philip-Dylan Gleonec
Lynne April 20, 2021, 6:29 p.m. UTC | #10
Mar 3, 2021, 16:51 by philip-dylan.gleonec@savoirfairelinux.com:

> Le 18/02/2021 à 17:38, Philip-Dylan Gleonec a écrit :
>
>> Here is the reworked patch properly attached.
>> Sorry about the duplicate mail, I just noticed I had a mishap with my
>> mail client and the previous patch was scrubbed.
>>
>
>
> Hello,
>
> Is someone interested in picking this up, or is there some further corrections/improvements I should make to this patch ?
>

Patch looks good.
I'll push it alongside the libopusdec patch once you've resent it.
Tristan Matthews May 18, 2021, 11:51 p.m. UTC | #11
On Thu, Feb 18, 2021 at 11:39 AM Philip-Dylan Gleonec
<philip-dylan.gleonec@savoirfairelinux.com> wrote:
>
> Here is the reworked patch properly attached.
> Sorry about the duplicate mail, I just noticed I had a mishap with my
> mail client and the previous patch was scrubbed.
>
> Signed-off-by: Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com>
> Co-developed-by: Steinar H. Gunderson <steinar+ffmpeg@gunderson.no>
> ---
>  libavcodec/libopusdec.c | 107 +++++++++++++++++++++++++++++++++++-----
>  1 file changed, 96 insertions(+), 11 deletions(-)
>
> diff --git a/libavcodec/libopusdec.c b/libavcodec/libopusdec.c
> index 082a431c6c..3de784dfbd 100644
> --- a/libavcodec/libopusdec.c
> +++ b/libavcodec/libopusdec.c
> @@ -43,10 +43,15 @@ struct libopus_context {
>  #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST
>      int apply_phase_inv;
>  #endif
> +    int decode_fec;
> +    int64_t expected_next_pts;
>  };
>
>  #define OPUS_HEAD_SIZE 19
>
> +// Sample rate is constant as libopus always output at 48kHz
> +const AVRational opus_timebase = { 1, 48000 };
> +
>  static av_cold int libopus_decode_init(AVCodecContext *avc)
>  {
>      struct libopus_context *opus = avc->priv_data;
> @@ -134,6 +139,8 @@ static av_cold int libopus_decode_init(AVCodecContext *avc)
>      /* Decoder delay (in samples) at 48kHz */
>      avc->delay = avc->internal->skip_samples = opus->pre_skip;
>
> +    opus->expected_next_pts = AV_NOPTS_VALUE;
> +
>      return 0;
>  }
>
> @@ -155,25 +162,102 @@ static int libopus_decode(AVCodecContext *avc, void *data,
>  {
>      struct libopus_context *opus = avc->priv_data;
>      AVFrame *frame               = data;
> -    int ret, nb_samples;
> +    uint8_t *outptr;
> +    int ret, nb_samples = 0, nb_lost_samples = 0, nb_samples_left;
> +
> +    // If FEC is enabled, calculate number of lost samples
> +    if (opus->decode_fec &&
> +        opus->expected_next_pts != AV_NOPTS_VALUE &&
> +        pkt->pts != AV_NOPTS_VALUE &&
> +        pkt->pts != opus->expected_next_pts) {
> +        // Cap at recovering 120 ms of lost audio.
> +        nb_lost_samples = pkt->pts - opus->expected_next_pts;
> +        nb_lost_samples = FFMIN(nb_lost_samples, MAX_FRAME_SIZE);
> +        // pts is expressed in ms for some containers (e.g. mkv)
> +        // FEC only works for SILK frames (> 10ms)
> +        // Detect if nb_lost_samples is in ms, and convert in samples if it is
> +        if (nb_lost_samples > 0) {
> +            if (avc->pkt_timebase.den != 48000) {
> +                nb_lost_samples = av_rescale_q(nb_lost_samples, avc->pkt_timebase, opus_timebase);
> +            }
> +            // For FEC/PLC, frame_size has to be to have a multiple of 2.5 ms
> +            if (nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den)) {
> +                nb_lost_samples -= nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den);
> +            }
> +        }
> +    }
>
> -    frame->nb_samples = MAX_FRAME_SIZE;
> +    frame->nb_samples = MAX_FRAME_SIZE + nb_lost_samples;
>      if ((ret = ff_get_buffer(avc, frame, 0)) < 0)
>          return ret;
>
> +    outptr = frame->data[0];
> +    nb_samples_left = frame->nb_samples;
> +
> +    if (opus->decode_fec && nb_lost_samples > 0) {
> +        // Try to recover the lost samples with FEC data from this one.
> +        // If there's no FEC data, the decoder will do loss concealment instead.
> +        if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
> +            ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
> +                                                  (opus_int16 *)outptr,
> +                                                  nb_lost_samples, 1);
> +        else
> +            ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
> +                                                       (float *)outptr,
> +                                                       nb_lost_samples, 1);
> +
> +        if (ret < 0) {
> +            if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
> +            av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
> +                   opus_strerror(ret));
> +            return ff_opus_error_to_averror(ret);
> +        }
> +
> +        av_log(avc, AV_LOG_WARNING, "Recovered %d samples with FEC/PLC\n",
> +                   ret);
> +
> +        outptr += ret * avc->channels * av_get_bytes_per_sample(avc->sample_fmt);
> +        nb_samples_left -= ret;
> +        nb_samples += ret;
> +        if (pkt->pts != AV_NOPTS_VALUE) {
> +            frame->pts = pkt->pts - ret;
> +        }
> +    }
> +
> +    // Decode the actual, non-lost data.
>      if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
> -        nb_samples = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
> -                                             (opus_int16 *)frame->data[0],
> -                                             frame->nb_samples, 0);
> +        ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
> +                                      (opus_int16 *)outptr,
> +                                      nb_samples_left, 0);
>      else
> -        nb_samples = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
> -                                                   (float *)frame->data[0],
> -                                                   frame->nb_samples, 0);
> +        ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
> +                                            (float *)outptr,
> +                                            nb_samples_left, 0);
>
> -    if (nb_samples < 0) {
> +    if (ret < 0) {
> +        if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
>          av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
> -               opus_strerror(nb_samples));
> -        return ff_opus_error_to_averror(nb_samples);
> +               opus_strerror(ret));
> +        return ff_opus_error_to_averror(ret);
> +    }
> +    nb_samples += ret;
> +
> +    av_log(avc, AV_LOG_WARNING, "Decoded %d samples normally\n", ret);

I would either drop this or only log it at AV_LOG_DEBUG level as it's
pretty verbose during normal operation (especially given that it's the
"normal" case).

Otherwise this seems great, nice work.

Best,
Tristan
Tristan Matthews May 19, 2021, 6:36 p.m. UTC | #12
On Thu, Feb 18, 2021 at 11:39 AM Philip-Dylan Gleonec
<philip-dylan.gleonec@savoirfairelinux.com> wrote:
>
> Here is the reworked patch properly attached.
> Sorry about the duplicate mail, I just noticed I had a mishap with my
> mail client and the previous patch was scrubbed.
>
> Signed-off-by: Philip-Dylan Gleonec <philip-dylan.gleonec@savoirfairelinux.com>
> Co-developed-by: Steinar H. Gunderson <steinar+ffmpeg@gunderson.no>
> ---
>  libavcodec/libopusdec.c | 107 +++++++++++++++++++++++++++++++++++-----
>  1 file changed, 96 insertions(+), 11 deletions(-)
>
> diff --git a/libavcodec/libopusdec.c b/libavcodec/libopusdec.c
> index 082a431c6c..3de784dfbd 100644
> --- a/libavcodec/libopusdec.c
> +++ b/libavcodec/libopusdec.c
> @@ -43,10 +43,15 @@ struct libopus_context {
>  #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST
>      int apply_phase_inv;
>  #endif
> +    int decode_fec;
> +    int64_t expected_next_pts;
>  };
>
>  #define OPUS_HEAD_SIZE 19
>
> +// Sample rate is constant as libopus always output at 48kHz
> +const AVRational opus_timebase = { 1, 48000 };
> +
>  static av_cold int libopus_decode_init(AVCodecContext *avc)
>  {
>      struct libopus_context *opus = avc->priv_data;
> @@ -134,6 +139,8 @@ static av_cold int libopus_decode_init(AVCodecContext *avc)
>      /* Decoder delay (in samples) at 48kHz */
>      avc->delay = avc->internal->skip_samples = opus->pre_skip;
>
> +    opus->expected_next_pts = AV_NOPTS_VALUE;
> +
>      return 0;
>  }
>
> @@ -155,25 +162,102 @@ static int libopus_decode(AVCodecContext *avc, void *data,
>  {
>      struct libopus_context *opus = avc->priv_data;
>      AVFrame *frame               = data;
> -    int ret, nb_samples;
> +    uint8_t *outptr;
> +    int ret, nb_samples = 0, nb_lost_samples = 0, nb_samples_left;
> +
> +    // If FEC is enabled, calculate number of lost samples
> +    if (opus->decode_fec &&
> +        opus->expected_next_pts != AV_NOPTS_VALUE &&
> +        pkt->pts != AV_NOPTS_VALUE &&
> +        pkt->pts != opus->expected_next_pts) {
> +        // Cap at recovering 120 ms of lost audio.
> +        nb_lost_samples = pkt->pts - opus->expected_next_pts;
> +        nb_lost_samples = FFMIN(nb_lost_samples, MAX_FRAME_SIZE);
> +        // pts is expressed in ms for some containers (e.g. mkv)
> +        // FEC only works for SILK frames (> 10ms)
> +        // Detect if nb_lost_samples is in ms, and convert in samples if it is
> +        if (nb_lost_samples > 0) {
> +            if (avc->pkt_timebase.den != 48000) {
> +                nb_lost_samples = av_rescale_q(nb_lost_samples, avc->pkt_timebase, opus_timebase);
> +            }
> +            // For FEC/PLC, frame_size has to be to have a multiple of 2.5 ms
> +            if (nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den)) {
> +                nb_lost_samples -= nb_lost_samples % (int)(2.5 / 1000 * opus_timebase.den);
> +            }
> +        }
> +    }
>
> -    frame->nb_samples = MAX_FRAME_SIZE;
> +    frame->nb_samples = MAX_FRAME_SIZE + nb_lost_samples;
>      if ((ret = ff_get_buffer(avc, frame, 0)) < 0)
>          return ret;
>
> +    outptr = frame->data[0];
> +    nb_samples_left = frame->nb_samples;
> +
> +    if (opus->decode_fec && nb_lost_samples > 0) {
> +        // Try to recover the lost samples with FEC data from this one.
> +        // If there's no FEC data, the decoder will do loss concealment instead.
> +        if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
> +            ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
> +                                                  (opus_int16 *)outptr,
> +                                                  nb_lost_samples, 1);
> +        else
> +            ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
> +                                                       (float *)outptr,
> +                                                       nb_lost_samples, 1);
> +
> +        if (ret < 0) {
> +            if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
> +            av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
> +                   opus_strerror(ret));
> +            return ff_opus_error_to_averror(ret);
> +        }
> +
> +        av_log(avc, AV_LOG_WARNING, "Recovered %d samples with FEC/PLC\n",
> +                   ret);
> +
> +        outptr += ret * avc->channels * av_get_bytes_per_sample(avc->sample_fmt);
> +        nb_samples_left -= ret;
> +        nb_samples += ret;
> +        if (pkt->pts != AV_NOPTS_VALUE) {
> +            frame->pts = pkt->pts - ret;
> +        }
> +    }
> +
> +    // Decode the actual, non-lost data.
>      if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
> -        nb_samples = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
> -                                             (opus_int16 *)frame->data[0],
> -                                             frame->nb_samples, 0);
> +        ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
> +                                      (opus_int16 *)outptr,
> +                                      nb_samples_left, 0);
>      else
> -        nb_samples = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
> -                                                   (float *)frame->data[0],
> -                                                   frame->nb_samples, 0);
> +        ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
> +                                            (float *)outptr,
> +                                            nb_samples_left, 0);
>
> -    if (nb_samples < 0) {
> +    if (ret < 0) {
> +        if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
>          av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
> -               opus_strerror(nb_samples));
> -        return ff_opus_error_to_averror(nb_samples);
> +               opus_strerror(ret));
> +        return ff_opus_error_to_averror(ret);
> +    }
> +    nb_samples += ret;
> +
> +    av_log(avc, AV_LOG_WARNING, "Decoded %d samples normally\n", ret);
> +

I would either drop this or only log it at AV_LOG_DEBUG level as it's
pretty verbose (especially given that it's the "normal" case).

Otherwise this seems great, nice work.

Best,
Tristan
Philip-Dylan Gleonec March 16, 2022, 2 p.m. UTC | #13
Hello,

Please find attached a rebased patchset for the FEC implementation of
libopus. Following the received feedbacks, some improvements have been
done compared to the first version:
- remove a log when a packet is decoded without FEC 
- add a check to only set libopus encoder packet loss estimate if it
  has not changed since previous encoding

Both patches have passed FATE testing successfully. Moreover, the patch
on libopus decoder is used in production in GNU Jami.

Thanks for your time.
diff mbox series

Patch

diff --git a/libavcodec/libopusdec.c b/libavcodec/libopusdec.c
index 082a431c6c..504043353f 100644
--- a/libavcodec/libopusdec.c
+++ b/libavcodec/libopusdec.c
@@ -43,10 +43,15 @@  struct libopus_context {
 #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST
     int apply_phase_inv;
 #endif
+    int decode_fec;
+    int64_t expected_next_pts;
 };
 
 #define OPUS_HEAD_SIZE 19
 
+// Sample rate is constant as libopus always output at 48kHz
+#define OPUS_SAMPLERATE 48000
+
 static av_cold int libopus_decode_init(AVCodecContext *avc)
 {
     struct libopus_context *opus = avc->priv_data;
@@ -134,6 +139,8 @@  static av_cold int libopus_decode_init(AVCodecContext *avc)
     /* Decoder delay (in samples) at 48kHz */
     avc->delay = avc->internal->skip_samples = opus->pre_skip;
 
+    opus->expected_next_pts = AV_NOPTS_VALUE;
+
     return 0;
 }
 
@@ -155,25 +162,100 @@  static int libopus_decode(AVCodecContext *avc, void *data,
 {
     struct libopus_context *opus = avc->priv_data;
     AVFrame *frame               = data;
-    int ret, nb_samples;
+    uint8_t *outptr;
+    int ret, nb_samples = 0, nb_lost_samples = 0, nb_samples_left;
+
+    // If FEC is enabled, calculate number of lost samples
+    if (opus->decode_fec &&
+        opus->expected_next_pts != AV_NOPTS_VALUE &&
+        pkt->pts != AV_NOPTS_VALUE &&
+        pkt->pts != opus->expected_next_pts) {
+        // Cap at recovering 120 ms of lost audio.
+        nb_lost_samples = pkt->pts - opus->expected_next_pts;
+        nb_lost_samples = FFMIN(nb_lost_samples, MAX_FRAME_SIZE);
+        // pts is expressed in ms for some containers (e.g. mkv)
+        // FEC only works for SILK frames (> 10ms)
+        // Detect if nb_lost_samples is in ms, and convert in samples if it is
+        if (nb_lost_samples > 0) {
+            if (pkt->duration > 0 && pkt->duration < OPUS_SAMPLERATE * 10 / 1000) {
+                nb_lost_samples = nb_lost_samples * OPUS_SAMPLERATE / 1000;
+            }
+            // For FEC/PLC, frame_size has to be to have a multiple of 2.5 ms
+            if (nb_lost_samples % (int)(2.5 / 1000 * OPUS_SAMPLERATE)) {
+                nb_lost_samples -= nb_lost_samples % (int)(2.5 / 1000 * OPUS_SAMPLERATE);
+            }
+        }
+    }
 
-    frame->nb_samples = MAX_FRAME_SIZE;
+    frame->nb_samples = MAX_FRAME_SIZE + nb_lost_samples;
     if ((ret = ff_get_buffer(avc, frame, 0)) < 0)
         return ret;
 
+    outptr = frame->data[0];
+    nb_samples_left = frame->nb_samples;
+
+    if (opus->decode_fec && nb_lost_samples > 0) {
+        // Try to recover the lost samples with FEC data from this one.
+        // If there's no FEC data, the decoder will do loss concealment instead.
+        if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
+            ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
+                                                  (opus_int16 *)outptr,
+                                                  nb_lost_samples, 1);
+        else
+            ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
+                                                       (float *)outptr,
+                                                       nb_lost_samples, 1);
+
+        if (ret < 0) {
+            if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
+            av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
+                   opus_strerror(ret));
+            return ff_opus_error_to_averror(ret);
+        }
+
+        av_log(avc, AV_LOG_WARNING, "Recovered %d samples with FEC/PLC\n",
+                   ret);
+
+        outptr += ret * avc->channels * av_get_bytes_per_sample(avc->sample_fmt);
+        nb_samples_left -= ret;
+        nb_samples += ret;
+        if (pkt->pts != AV_NOPTS_VALUE) {
+            frame->pts = pkt->pts - ret;
+        }
+    }
+
+    // Decode the actual, non-lost data.
     if (avc->sample_fmt == AV_SAMPLE_FMT_S16)
-        nb_samples = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
-                                             (opus_int16 *)frame->data[0],
-                                             frame->nb_samples, 0);
+        ret = opus_multistream_decode(opus->dec, pkt->data, pkt->size,
+                                      (opus_int16 *)outptr,
+                                      nb_samples_left, 0);
     else
-        nb_samples = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
-                                                   (float *)frame->data[0],
-                                                   frame->nb_samples, 0);
+        ret = opus_multistream_decode_float(opus->dec, pkt->data, pkt->size,
+                                            (float *)outptr,
+                                            nb_samples_left, 0);
 
-    if (nb_samples < 0) {
+    if (ret < 0) {
+        if (opus->decode_fec) opus->expected_next_pts = pkt->pts + pkt->duration;
         av_log(avc, AV_LOG_ERROR, "Decoding error: %s\n",
-               opus_strerror(nb_samples));
-        return ff_opus_error_to_averror(nb_samples);
+               opus_strerror(ret));
+        return ff_opus_error_to_averror(ret);
+    }
+    nb_samples += ret;
+
+    av_log(avc, AV_LOG_WARNING, "Decoded %d samples normally\n", ret);
+
+    if (opus->decode_fec)
+    {
+        // Calculate the next expected pts
+        if (pkt->pts == AV_NOPTS_VALUE) {
+            opus->expected_next_pts = AV_NOPTS_VALUE;
+        } else {
+            if (pkt->duration) {
+                opus->expected_next_pts = pkt->pts + pkt->duration;
+            } else {
+                opus->expected_next_pts = pkt->pts + ret;
+            }
+        }
     }
 
 #ifndef OPUS_SET_GAIN
@@ -214,6 +296,7 @@  static const AVOption libopusdec_options[] = {
 #ifdef OPUS_SET_PHASE_INVERSION_DISABLED_REQUEST
     { "apply_phase_inv", "Apply intensity stereo phase inversion", OFFSET(apply_phase_inv), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, FLAGS },
 #endif
+    { "decode_fec", "Decode FEC data or use PLC", OFFSET(decode_fec), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, FLAGS },
     { NULL },
 };