diff mbox series

[FFmpeg-devel] ffprobe/eac3/mlp/dca: add detection of spatial audio extensions

Message ID 20230209044058.2872534-1-marth64@proxyid.net
State New
Headers show
Series [FFmpeg-devel] ffprobe/eac3/mlp/dca: add detection of spatial audio extensions | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 fail Make fate failed

Commit Message

Marth64 Feb. 9, 2023, 4:41 a.m. UTC
Signed-off-by: Marth64 <marth64@proxyid.net>
---
Adds detection of spatial/object-based audio extensions in E-AC-3,
TrueHD, and DCA XLL (DTS). This includes Atmos, DTS:X, and IMAX formats.
Please let me know what I could improve, I'm learning still.
Thank you.

 doc/ffprobe.xsd            |  1 +
 fftools/ffprobe.c          |  3 +++
 libavcodec/ac3dec.c        |  1 +
 libavcodec/ac3dec.h        |  1 +
 libavcodec/avcodec.h       |  6 ++++++
 libavcodec/codec_par.c     |  2 ++
 libavcodec/codec_par.h     |  6 ++++++
 libavcodec/dca_syncwords.h |  3 +++
 libavcodec/dca_xll.c       | 14 ++++++++++++++
 libavcodec/eac3dec.c       | 11 ++++++++++-
 libavcodec/mlpdec.c        |  9 +++++++++
 11 files changed, 56 insertions(+), 1 deletion(-)

Comments

Hendrik Leppkes Feb. 9, 2023, 8:12 p.m. UTC | #1
On Thu, Feb 9, 2023 at 5:42 AM Marth64 <marth64@proxyid.net> wrote:
>
> Signed-off-by: Marth64 <marth64@proxyid.net>
> ---
> Adds detection of spatial/object-based audio extensions in E-AC-3,
> TrueHD, and DCA XLL (DTS). This includes Atmos, DTS:X, and IMAX formats.
> Please let me know what I could improve, I'm learning still.
> Thank you.
>

The detection itself seems fine to me, however we should talk about
how the presence is communicated back to the user.

A new flag in AVCodecContext goes against a variety of designs we try
to avoid - namely having codec-specific things in a global struct, as
well as having only one value, rather then per-frame values.

So options that present themself to me:
(a) Use "profile". At least for DTS that would fit quite nicely, as it
already has profiles, and it seems like a logical extension. TrueHD
and eac3 do not have profiles, but it might still be sensible to put
it there. The advantage here is that it also automatically is conveyed
in AVCodecParameters after avformat opens a stream, so the information
is available early and lets players decide how to handle the stream.
(b) Use per-frame side data. The early-availability advantage is not
present here, so its not my favorite. side-data could be used in the
future to transport the actual object metadata, if needed.

So from where I'm standing we should maybe define profiles to use for
these. In the past profiles were at least suggested for TrueHD Atmos
before, but there were some objections, so maybe a good time to
revisit and see where we go from here.

- Hendrik
Michael Niedermayer Feb. 9, 2023, 10:34 p.m. UTC | #2
On Wed, Feb 08, 2023 at 10:41:00PM -0600, Marth64 wrote:
> Signed-off-by: Marth64 <marth64@proxyid.net>
> ---
> Adds detection of spatial/object-based audio extensions in E-AC-3,
> TrueHD, and DCA XLL (DTS). This includes Atmos, DTS:X, and IMAX formats.
> Please let me know what I could improve, I'm learning still.
> Thank you.
[...]
> @@ -1054,10 +1055,23 @@ static int parse_frame(DCAXllDecoder *s, const uint8_t *data, int size, DCAExssA
>          return ret;
>      if ((ret = parse_band_data(s)) < 0)
>          return ret;
> +
> +    extradata_peek_pos = (get_bits_count(&s->gb) + 31) & ~31;
> +    if (s->frame_size * 8 > extradata_peek_pos) {
> +        unsigned int extradata_syncword = show_bits(&s->gb, 32);

show_bits_long()


[..]
Marth64 Feb. 10, 2023, 12:03 a.m. UTC | #3
Hi,

Thank you for your time and thoughts. Some of this I had wondered about the
same.

Re: Hendrik, Using profile >
This was an original intention of mine but I changed course. I'm happy to
do it, but felt too unsure for a first pass.
My reasoning being that I'm not sure if the presence of extension metadata
itself qualifies as a discrete profile. For DCA in particular, I was
worried since DCA already expands to profiles (ES, XLL, etc.). I did not
want to clutter those distinctions with a "somewhat profile of a profile,
based on an educated guess without the reference docs" and break any
existing integrations. Likewise, EAC3 and TrueHD didn't have profiles, so
it felt tacked on for this case. So I settled with "extension" as the
marker. That said, I wasn't too thrilled about adding to AVCodecContext
either. I discovered and considered priv_data but then realized that this
is a pattern across 3 codecs, maybe more in the future. So definitely open
to guidance here. Profile is probably the next best bet.

I had gone down the frame-level inspection road at some point, but came to
a similar conclusion as you, it makes this less useful as a feature.

I am open to other's interpretation. Will ponder this a little more.

Re: Michael, show_bits_long >
Will fix. I am trying to procure another IMAX DTS material to test the
syncword better, so will push any of those changes together in the next 2
days.

Thank you!



On Thu, Feb 9, 2023 at 2:12 PM Hendrik Leppkes <h.leppkes@gmail.com> wrote:

> On Thu, Feb 9, 2023 at 5:42 AM Marth64 <marth64@proxyid.net> wrote:
> >
> > Signed-off-by: Marth64 <marth64@proxyid.net>
> > ---
> > Adds detection of spatial/object-based audio extensions in E-AC-3,
> > TrueHD, and DCA XLL (DTS). This includes Atmos, DTS:X, and IMAX formats.
> > Please let me know what I could improve, I'm learning still.
> > Thank you.
> >
>
> The detection itself seems fine to me, however we should talk about
> how the presence is communicated back to the user.
>
> A new flag in AVCodecContext goes against a variety of designs we try
> to avoid - namely having codec-specific things in a global struct, as
> well as having only one value, rather then per-frame values.
>
> So options that present themself to me:
> (a) Use "profile". At least for DTS that would fit quite nicely, as it
> already has profiles, and it seems like a logical extension. TrueHD
> and eac3 do not have profiles, but it might still be sensible to put
> it there. The advantage here is that it also automatically is conveyed
> in AVCodecParameters after avformat opens a stream, so the information
> is available early and lets players decide how to handle the stream.
> (b) Use per-frame side data. The early-availability advantage is not
> present here, so its not my favorite. side-data could be used in the
> future to transport the actual object metadata, if needed.
>
> So from where I'm standing we should maybe define profiles to use for
> these. In the past profiles were at least suggested for TrueHD Atmos
> before, but there were some objections, so maybe a good time to
> revisit and see where we go from here.
>
> - Hendrik
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Marth64 Feb. 10, 2023, 12:07 a.m. UTC | #4
Ack'd. Nice catch. Thank you!

On Thu, Feb 9, 2023 at 4:35 PM Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Wed, Feb 08, 2023 at 10:41:00PM -0600, Marth64 wrote:
> > Signed-off-by: Marth64 <marth64@proxyid.net>
> > ---
> > Adds detection of spatial/object-based audio extensions in E-AC-3,
> > TrueHD, and DCA XLL (DTS). This includes Atmos, DTS:X, and IMAX formats.
> > Please let me know what I could improve, I'm learning still.
> > Thank you.
> [...]
> > @@ -1054,10 +1055,23 @@ static int parse_frame(DCAXllDecoder *s, const
> uint8_t *data, int size, DCAExssA
> >          return ret;
> >      if ((ret = parse_band_data(s)) < 0)
> >          return ret;
> > +
> > +    extradata_peek_pos = (get_bits_count(&s->gb) + 31) & ~31;
> > +    if (s->frame_size * 8 > extradata_peek_pos) {
> > +        unsigned int extradata_syncword = show_bits(&s->gb, 32);
>
> show_bits_long()
>
>
> [..]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> The misfortune of the wise is better than the prosperity of the fool.
> -- Epicurus
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
diff mbox series

Patch

diff --git a/doc/ffprobe.xsd b/doc/ffprobe.xsd
index 0920380108..a01a4359dc 100644
--- a/doc/ffprobe.xsd
+++ b/doc/ffprobe.xsd
@@ -247,6 +247,7 @@ 
       <xsd:attribute name="channel_layout"   type="xsd:string"/>
       <xsd:attribute name="bits_per_sample"  type="xsd:int"/>
       <xsd:attribute name="initial_padding"  type="xsd:int"/>
+      <xsd:attribute name="spatial_ext"      type="xsd:boolean"/>
 
       <xsd:attribute name="id"               type="xsd:string"/>
       <xsd:attribute name="r_frame_rate"     type="xsd:string" use="required"/>
diff --git a/fftools/ffprobe.c b/fftools/ffprobe.c
index dfa7ff1b24..7e81088c56 100644
--- a/fftools/ffprobe.c
+++ b/fftools/ffprobe.c
@@ -3071,6 +3071,9 @@  static int show_stream(WriterContext *w, AVFormatContext *fmt_ctx, int stream_id
         print_int("bits_per_sample", av_get_bits_per_sample(par->codec_id));
 
         print_int("initial_padding", par->initial_padding);
+        if (par->spatial_ext > 0) {
+            print_int("spatial_ext",  par->spatial_ext);
+        }
         break;
 
     case AVMEDIA_TYPE_SUBTITLE:
diff --git a/libavcodec/ac3dec.c b/libavcodec/ac3dec.c
index 0b120e6140..009acd9ff6 100644
--- a/libavcodec/ac3dec.c
+++ b/libavcodec/ac3dec.c
@@ -1714,6 +1714,7 @@  skip:
     if (!err) {
         avctx->sample_rate = s->sample_rate;
         avctx->bit_rate    = s->bit_rate + s->prev_bit_rate;
+        avctx->spatial_ext = s->eac3_extension_type_a == 1;
     }
 
     if (!avctx->sample_rate) {
diff --git a/libavcodec/ac3dec.h b/libavcodec/ac3dec.h
index 138b462abb..0829f4b40d 100644
--- a/libavcodec/ac3dec.h
+++ b/libavcodec/ac3dec.h
@@ -102,6 +102,7 @@  typedef struct AC3DecodeContext {
     int eac3;                               ///< indicates if current frame is E-AC-3
     int eac3_frame_dependent_found;         ///< bitstream has E-AC-3 dependent frame(s)
     int eac3_subsbtreamid_found;            ///< bitstream has E-AC-3 additional substream(s)
+    int eac3_extension_type_a;              ///< bitstream has E-AC-3 extension type A enabled frame(s)
     int dolby_surround_mode;                ///< dolby surround mode                    (dsurmod)
     int dolby_surround_ex_mode;             ///< dolby surround ex mode                 (dsurexmod)
     int dolby_headphone_mode;               ///< dolby headphone mode                   (dheadphonmod)
diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 755e543fac..8b54a99313 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -2104,6 +2104,12 @@  typedef struct AVCodecContext {
      *             The decoder can then override during decoding as needed.
      */
     AVChannelLayout ch_layout;
+
+    /**
+     * Audio only. Whether or not a spatial audio extension is
+     * detected in the stream (object-based surround).
+     */
+    int spatial_ext;
 } AVCodecContext;
 
 /**
diff --git a/libavcodec/codec_par.c b/libavcodec/codec_par.c
index abda649aa8..02f6da5059 100644
--- a/libavcodec/codec_par.c
+++ b/libavcodec/codec_par.c
@@ -161,6 +161,7 @@  FF_ENABLE_DEPRECATION_WARNINGS
         par->initial_padding  = codec->initial_padding;
         par->trailing_padding = codec->trailing_padding;
         par->seek_preroll     = codec->seek_preroll;
+        par->spatial_ext      = codec->spatial_ext;
         break;
     case AVMEDIA_TYPE_SUBTITLE:
         par->width  = codec->width;
@@ -243,6 +244,7 @@  FF_ENABLE_DEPRECATION_WARNINGS
         codec->initial_padding  = par->initial_padding;
         codec->trailing_padding = par->trailing_padding;
         codec->seek_preroll     = par->seek_preroll;
+        codec->spatial_ext      = par->spatial_ext;
         break;
     case AVMEDIA_TYPE_SUBTITLE:
         codec->width  = par->width;
diff --git a/libavcodec/codec_par.h b/libavcodec/codec_par.h
index f51d27c590..287b138b6b 100644
--- a/libavcodec/codec_par.h
+++ b/libavcodec/codec_par.h
@@ -211,6 +211,12 @@  typedef struct AVCodecParameters {
      * Audio only. The channel layout and number of channels.
      */
     AVChannelLayout ch_layout;
+
+    /**
+     * Audio only. Whether or not a spatial audio extension is
+     * detected in the stream (object-based surround).
+     */
+    int spatial_ext;
 } AVCodecParameters;
 
 /**
diff --git a/libavcodec/dca_syncwords.h b/libavcodec/dca_syncwords.h
index 4d2cd5f56d..ccbc38bb38 100644
--- a/libavcodec/dca_syncwords.h
+++ b/libavcodec/dca_syncwords.h
@@ -33,4 +33,7 @@ 
 #define    DCA_SYNCWORD_SUBSTREAM_CORE       0x02B09261U
 #define    DCA_SYNCWORD_REV1AUX              0x9A1105A0U
 
+#define    DCA_SYNCWORD_XLL_X                0x20008
+#define    DCA_SYNCWORD_XLL_IMAX             (0xF14000D0 >> 1)
+
 #endif /* AVCODEC_DCA_SYNCWORDS_H */
diff --git a/libavcodec/dca_xll.c b/libavcodec/dca_xll.c
index fe2c766d98..6b64f907cc 100644
--- a/libavcodec/dca_xll.c
+++ b/libavcodec/dca_xll.c
@@ -1043,6 +1043,7 @@  static int parse_band_data(DCAXllDecoder *s)
 static int parse_frame(DCAXllDecoder *s, const uint8_t *data, int size, DCAExssAsset *asset)
 {
     int ret;
+    int extradata_peek_pos;
 
     if ((ret = init_get_bits8(&s->gb, data, size)) < 0)
         return ret;
@@ -1054,10 +1055,23 @@  static int parse_frame(DCAXllDecoder *s, const uint8_t *data, int size, DCAExssA
         return ret;
     if ((ret = parse_band_data(s)) < 0)
         return ret;
+
+    extradata_peek_pos = (get_bits_count(&s->gb) + 31) & ~31;
+    if (s->frame_size * 8 > extradata_peek_pos) {
+        unsigned int extradata_syncword = show_bits(&s->gb, 32);
+
+        if (extradata_syncword == DCA_SYNCWORD_XLL_X) {
+            s->avctx->spatial_ext = 1;
+        } else if ((extradata_syncword >> 1) == DCA_SYNCWORD_XLL_IMAX) {
+            s->avctx->spatial_ext = 1;
+        }
+    }
+
     if (ff_dca_seek_bits(&s->gb, s->frame_size * 8)) {
         av_log(s->avctx, AV_LOG_ERROR, "Read past end of XLL frame\n");
         return AVERROR_INVALIDDATA;
     }
+
     return ret;
 }
 
diff --git a/libavcodec/eac3dec.c b/libavcodec/eac3dec.c
index deca51dd3d..5c71751a0c 100644
--- a/libavcodec/eac3dec.c
+++ b/libavcodec/eac3dec.c
@@ -464,7 +464,16 @@  static int ff_eac3_parse_header(AC3DecodeContext *s)
     if (get_bits1(gbc)) {
         int addbsil = get_bits(gbc, 6);
         for (i = 0; i < addbsil + 1; i++) {
-            skip_bits(gbc, 8); // skip additional bit stream info
+            if (i == 0) {
+                /* In this 8 bit chunk, the LSB is equal to flag_ec3_extension_type_a
+                   which can be used to detect Atmos presence */
+                skip_bits(gbc, 7);
+                if (get_bits1(gbc)) {
+                    s->eac3_extension_type_a = 1;
+                }
+            } else {
+                skip_bits(gbc, 8); // skip additional bit stream info
+            }
         }
     }
 
diff --git a/libavcodec/mlpdec.c b/libavcodec/mlpdec.c
index 0ee1f0982c..547e68342e 100644
--- a/libavcodec/mlpdec.c
+++ b/libavcodec/mlpdec.c
@@ -392,6 +392,15 @@  static int read_major_sync(MLPDecodeContext *m, GetBitContext *gb)
     m->num_substreams        = mh.num_substreams;
     m->substream_info        = mh.substream_info;
 
+    /*  If there is a 4th substream and the MSB of substream_info is set,
+     *  there is a 16-channel spatial presentation (Atmos in TrueHD).
+     */
+    if (m->avctx->codec_id == AV_CODEC_ID_TRUEHD
+            && m->num_substreams == 4
+            && m->substream_info >> 7 == 1) {
+        m->avctx->spatial_ext = 1;
+    }
+
     /* limit to decoding 3 substreams, as the 4th is used by Dolby Atmos for non-audio data */
     m->max_decoded_substream = FFMIN(m->num_substreams - 1, 2);