diff mbox series

[FFmpeg-devel] aacdec: padding skip improvements

Message ID Nfni27T--3-9@lynne.ee
State New
Headers show
Series [FFmpeg-devel] aacdec: padding skip improvements | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 fail Make fate failed

Commit Message

Lynne Oct. 3, 2023, 4:07 a.m. UTC
For some reason, this was never set, which meant all **raw** AAC in ADTS
streams, except faac, had extra samples at the start.

Despite this being a standard MDCT-based codec with a frame size of 1024,
hence a delay of 1024 samples at the start, all major encoders, excluding
faac and FFmpeg, use 2048 samples of padding.

Because this is also such a mess, make it an option to allow for
users to configure the amount of padding.

The FFmpeg encoder will be modified to also output 2048 samples of padding
at the start, to make it in line with other encoders.

Yes, this leaves FATE pretty sad. Will be fixed by another patch once
the discussions have completed as to what should be the default.

Comments

Andreas Rheinhardt Oct. 3, 2023, 4:40 p.m. UTC | #1
Lynne:
> The FFmpeg encoder will be modified to also output 2048 samples of
> padding at the start, to make it in line with other encoders.

Once again: What is the advantage? Doing what lots of other codecs do is
not a real advantage.

> 
> +    { "padding", "Override the padding at the start of a stream.\n",
> +        offsetof(AACContext, override_padding), AV_OPT_TYPE_INT, { .i64 = 2048 }, 1024, 8192, AACDEC_FLAGS },
> +

A decoder is supposed to decode and output what is given to it by
default and not trim it according to what you expect to be normal for
encoders for a given format. It is not even clear that there are any
padding samples at the start: The first packet to be fed to the decoder
could be from the middle of a file. In other words, the default value of
samples to discard should be zero.

Furthermore, the description is wrong, because this option itself might
be overridden by packet side data.

- Andreas
Lynne Oct. 5, 2023, 2:37 a.m. UTC | #2
Oct 3, 2023, 18:39 by andreas.rheinhardt@outlook.com:

> Lynne:
>
>> The FFmpeg encoder will be modified to also output 2048 samples of
>> padding at the start, to make it in line with other encoders.
>>
>
> Once again: What is the advantage? Doing what lots of other codecs do is
> not a real advantage.
>
>>
>> +    { "padding", "Override the padding at the start of a stream.\n",
>> +        offsetof(AACContext, override_padding), AV_OPT_TYPE_INT, { .i64 = 2048 }, 1024, 8192, AACDEC_FLAGS },
>> +
>>
>
> A decoder is supposed to decode and output what is given to it by
> default and not trim it according to what you expect to be normal for
> encoders for a given format. It is not even clear that there are any
> padding samples at the start: The first packet to be fed to the decoder
> could be from the middle of a file. In other words, the default value of
> samples to discard should be zero.
>

There are always at least 1024 samples padded in all AAC files,
this is how an MDCT operates.
Currently, we let that bit of padding through, so absolutely no
streams are gapless when played through our AAC decoder.

2048 is just more widely encountered. I'm posting these patches
to get an opinion:
 - do we cut nothing at all (currently)
 - do we cut 1024 (required by the standard/algorithm, and currently what our AAC ENcoder outputs)
 - do we cut 2048 (what is most widely expected)

My preference would be 1024. Note: this is only for raw AAC
streams outside of a container. Streams inside of a container are currently
correctly cut, with this value being overridden.

There is also the issue of HE-AAC streams having a lot more padding,
with a lot more variation, but that's for another patch to attempt
to solve (which it probably couldn't).
Derek Buitenhuis Oct. 5, 2023, 12:24 p.m. UTC | #3
On 10/3/2023 5:07 AM, Lynne wrote:
> Despite this being a standard MDCT-based codec with a frame size of 1024,
> hence a delay of 1024 samples at the start, all major encoders, excluding
> faac and FFmpeg, use 2048 samples of padding.

Apple's uses 2112.

- Derek
Cosmin Stejerean Oct. 5, 2023, 5:22 p.m. UTC | #4
> On Oct 4, 2023, at 7:37 PM, Lynne <dev@lynne.ee> wrote:
> 
> 2048 is just more widely encountered. I'm posting these patches
> to get an opinion:
>  - do we cut nothing at all (currently)
>  - do we cut 1024 (required by the standard/algorithm, and currently what our AAC ENcoder outputs)
>  - do we cut 2048 (what is most widely expected)
> 
> My preference would be 1024. Note: this is only for raw AAC
> streams outside of a container. Streams inside of a container are currently
> correctly cut, with this value being overridden.

I think it would be good by default to have the ffmpeg encoder and decoder roundtrip correctly, and then leverage an option to specify a different padding if a different encoder was used. I think the important property is that if we take a PCM input, encode it to raw AAC with ffmpeg and then decode it to PCM output that we have the same number of PCM samples and they're all properly aligned. 

Since we can't correctly handle unknown encoders (that might be using 2048 or 2112 priming samples) for the option to override the priming samples should be used in that case (which could also be set to 0 if someone really wants potentially distorted output before the decoder was properly primed).

- Cosmin
Lynne Oct. 5, 2023, 5:32 p.m. UTC | #5
Oct 5, 2023, 14:24 by derek.buitenhuis@gmail.com:

> On 10/3/2023 5:07 AM, Lynne wrote:
>
>> Despite this being a standard MDCT-based codec with a frame size of 1024,
>> hence a delay of 1024 samples at the start, all major encoders, excluding
>> faac and FFmpeg, use 2048 samples of padding.
>>
>
> Apple's uses 2112.
>

I counted 2048 with af-convert. Are you referring to qaac?
Derek Buitenhuis Oct. 6, 2023, 5:01 p.m. UTC | #6
On 10/5/2023 6:32 PM, Lynne wrote:
> I counted 2048 with af-convert. Are you referring to qaac?

Referring to Apple's own documents, which have 2112 plastered everywhere.

- Derek
Thierry Foucu Oct. 6, 2023, 6:58 p.m. UTC | #7
On Fri, Oct 6, 2023 at 10:02 AM Derek Buitenhuis <derek.buitenhuis@gmail.com>
wrote:

> On 10/5/2023 6:32 PM, Lynne wrote:
> > I counted 2048 with af-convert. Are you referring to qaac?
>
> Referring to Apple's own documents, which have 2112 plastered everywhere.
>

And another Fraunhofer implementation is padding 1600 samples.


>
> - Derek
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
diff mbox series

Patch

From dca29a894a60cdca1682fa6d0d3ee56c552b3d49 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Tue, 3 Oct 2023 05:57:50 +0200
Subject: [PATCH] aacdec: padding skip improvements

For some reason, this was never set, which meant all **raw** AAC in ADTS
streams, except faac, had extra samples at the start.

Despite this being a standard MDCT-based codec with a frame size of 1024,
hence a delay of 1024 samples at the start, all major encoders, excluding
faac and FFmpeg, use 2048 samples of padding.

The FFmpeg encoder will be modified to also output 2048 samples of padding
at the start, to make it in line with other encoders.

Yes, this leaves FATE pretty sad. Will fix it with the real version of the patch.
---
 libavcodec/aac.h             |  1 +
 libavcodec/aacdec_template.c | 20 +++++++++++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/libavcodec/aac.h b/libavcodec/aac.h
index 285d3b7482..79bbb3cce5 100644
--- a/libavcodec/aac.h
+++ b/libavcodec/aac.h
@@ -298,6 +298,7 @@  struct AACContext {
     AVCodecContext *avctx;
     AVFrame *frame;
 
+    int override_padding;
     int is_saved;                 ///< Set if elements have stored overlap from previous frame.
     DynamicRangeControl che_drc;
 
diff --git a/libavcodec/aacdec_template.c b/libavcodec/aacdec_template.c
index 51a4cb2b66..70c5c6ab4c 100644
--- a/libavcodec/aacdec_template.c
+++ b/libavcodec/aacdec_template.c
@@ -1273,6 +1273,9 @@  static av_cold int aac_decode_init(AVCodecContext *avctx)
     if (ret < 0)
         return ret;
 
+    /* Usually overridden by side data */
+    avctx->internal->skip_samples = ac->override_padding;
+
     return 0;
 }
 
@@ -2417,14 +2420,16 @@  static int decode_dynamic_range(DynamicRangeControl *che_drc,
     return n;
 }
 
-static int decode_fill(AACContext *ac, GetBitContext *gb, int len) {
-    uint8_t buf[256];
-    int i, major, minor;
+static int decode_fill(AACContext *ac, GetBitContext *gb, int len)
+{
+    uint8_t buf[256] = { 0 };
+    int i, major, minor, micro;
 
     if (len < 13+7*8)
         goto unknown;
 
-    get_bits(gb, 13); len -= 13;
+    get_bits(gb, 13);
+    len -= 13;
 
     for(i=0; i+1<sizeof(buf) && len>=8; i++, len-=8)
         buf[i] = get_bits(gb, 8);
@@ -2434,7 +2439,9 @@  static int decode_fill(AACContext *ac, GetBitContext *gb, int len) {
         av_log(ac->avctx, AV_LOG_DEBUG, "FILL:%s\n", buf);
 
     if (sscanf(buf, "libfaac %d.%d", &major, &minor) == 2){
-        ac->avctx->internal->skip_samples = 1024;
+        ac->avctx->internal->skip_samples -= 1024;
+    } else if ((sscanf(buf, "avc %d.%d.%d", &major, &minor, &micro) == 3)) {
+        ac->avctx->internal->skip_samples -= 1024;
     }
 
 unknown:
@@ -3471,6 +3478,9 @@  static const AVOption options[] = {
       { "coded",    "order in which the channels are coded in the bitstream",
         0, AV_OPT_TYPE_CONST, { .i64 = CHANNEL_ORDER_CODED }, .flags = AACDEC_FLAGS, "channel_order" },
 
+    { "padding", "Override the padding at the start of a stream.\n",
+        offsetof(AACContext, override_padding), AV_OPT_TYPE_INT, { .i64 = 2048 }, 1024, 8192, AACDEC_FLAGS },
+
     {NULL},
 };
 
-- 
2.42.0