[FFmpeg-devel,v3,1/2] lavf/isom: support for demuxing MPEG-H 3D Audio in MP4

Submitted by Yuki.Tsuchiya on Nov. 1, 2019, 5:16 a.m.

Message ID 1572585384-12000-1-git-send-email-Yuki.Tsuchiya@sony.com
State New

Commit Message

Yuki.Tsuchiya Nov. 1, 2019, 5:16 a.m.
Implemented according to the specification at https://www.iso.org/standard/69561.html
The "mhm1" sample entry is registered with MP4RA and is defined as MHAS-encapsulated, single-stream MPEG-H 3D Audio.
"MHAS" stands for MPEG-H Audio Stream, which contains the encoded audio data and the corresponding metadata needed for decoding.
This patch enables extracting the MHAS bitstream from MP4.

Signed-off-by: Yuki Tsuchiya <Yuki.Tsuchiya@sony.com>
---
 libavcodec/avcodec.h    | 1 +
 libavcodec/codec_desc.c | 7 +++++++
 libavcodec/version.h    | 2 +-
 libavformat/isom.c      | 1 +
 libavformat/movenc.c    | 6 ++++--
 libavformat/utils.c     | 3 ++-
 6 files changed, 16 insertions(+), 4 deletions(-)

Comments

Carl Eugen Hoyos Nov. 6, 2019, 12:19 a.m.
On Fri., Nov. 1, 2019 at 06:24, Yuki Tsuchiya
<Yuki.Tsuchiya@sony.com> wrote:
>
> Implemented according to the specification at https://www.iso.org/standard/69561.html
> The "mhm1" sample entry is registered with MP4RA and is defined as MHAS-encapsulated, single-stream MPEG-H 3D Audio.
> "MHAS" stands for MPEG-H Audio Stream, which contains the encoded audio data and the corresponding metadata needed for decoding.
> This patch enables extracting the MHAS bitstream from MP4.

I will push this if there are no objections.

Carl Eugen

Jan Ekström Nov. 9, 2019, 2:04 a.m.
On Fri, Nov 1, 2019 at 7:24 AM Yuki Tsuchiya <Yuki.Tsuchiya@sony.com> wrote:
>
> Implemented according to the specification at https://www.iso.org/standard/69561.html
> The "mhm1" sample entry is registered with MP4RA and is defined as MHAS-encapsulated, single-stream MPEG-H 3D Audio.
> "MHAS" stands for MPEG-H Audio Stream, which contains the encoded audio data and the corresponding metadata needed for decoding.
> This patch enables extracting the MHAS bitstream from MP4.
>
> Signed-off-by: Yuki Tsuchiya <Yuki.Tsuchiya@sony.com>
> ---

Sorry for the late response, there have been various things recently :) .

All of the samples I've seen in the wild (well, on the DASH-IF test
vector list, which is the only place I've seen both AC-4 and MPEG-H
Audio at until now) seem to utilize mha1, such as
https://dash.akamaized.net/dash264/TestCasesMCA/fraunhofer/MPEGH_714_lc_mha1/1/Sintel/Sintel.2010_1080p_incl_Credits_new_cicp19_16bit-eng-893s-12-mpegh-256000bps_seg.mp4
.

Thus my initial question is if there is any reason why 'mha1' is not
added as well? Was that removed from the MP4 container specification
afterwards? Additionally, are there any MPEG-H Audio specific
configuration/etc boxes required to be read/written for valid decoding
or to create a valid mux according to the spec which should be
handled?

Best regards,
Jan

Yuki.Tsuchiya Nov. 9, 2019, 8:12 a.m.
Hi Jan,

Thank you for the comment.

> All of the samples I've seen in the wild (well, on the DASH-IF test
> vector list, which is the only place I've seen both AC-4 and MPEG-H
> Audio at until now) seem to utilize mha1, such as
> https://dash.akamaized.net/dash264/TestCasesMCA/fraunhofer/MPEGH_714_lc_mha1/1/Sintel/Sintel.2010_1080p_incl_Credits_new_cicp19_16bit-eng-893s-12-mpegh-256000bps_seg.mp4
>
> Thus my initial question is if there is any reason why 'mha1' is not
> added as well? Was that removed from the MP4 container specification
> afterwards?

'mha1' is still documented in the ISO specification, but the latest DASH-IF IOP specifies only mhm1 as of v4.3 (https://dashif.org/docs/DASH-IF-IOP-v4.3.pdf).
So mhm1 seems likely to become the predominant sample entry for MPEG-H 3D Audio in MP4, which is why this patch gives priority to mhm1.

> Additionally, are there any MPEG-H Audio specific
> configuration/etc boxes required to be read/written for valid decoding
> or to create a valid mux according to the spec which should be
> handled?

In the mha1 case, the 'mhaC' box, which contains the configuration for decoding, must be handled.
In the mhm1 case (this patch), the MHAS bitstream in mdat carries the configuration itself, so handling 'mhaC' is not required.
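
For illustration only (this is not part of the patch), the difference between the two sample entries could be sketched as below. The FOURCC macro is written to mirror FFmpeg's MKTAG byte order, and the enum and function names are hypothetical, made up for this sketch; nothing here is an existing FFmpeg API.

```c
#include <stdint.h>

/* Little-endian fourcc packing, written to mirror FFmpeg's MKTAG macro. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical enum for this sketch only; not an FFmpeg type. */
enum mpegh_config_source {
    MPEGH_CONFIG_UNKNOWN = 0, /* not an MPEG-H sample entry known to this sketch */
    MPEGH_CONFIG_IN_BAND,     /* 'mhm1': configuration travels inside the MHAS stream */
    MPEGH_CONFIG_MHAC_BOX     /* 'mha1': configuration must be read from the 'mhaC' box */
};

/* Map a sample entry fourcc to the place where the decoder configuration lives. */
static enum mpegh_config_source mpegh_config_source_for_tag(uint32_t tag)
{
    switch (tag) {
    case FOURCC('m', 'h', 'm', '1'):
        return MPEGH_CONFIG_IN_BAND;
    case FOURCC('m', 'h', 'a', '1'):
        return MPEGH_CONFIG_MHAC_BOX;
    default:
        return MPEGH_CONFIG_UNKNOWN;
    }
}
```

A demuxer following this sketch would only need to parse 'mhaC' when the function returns MPEGH_CONFIG_MHAC_BOX; for 'mhm1' the packets can be passed through unchanged.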

Regards.

Yuki Tsuchiya

Yuki.Tsuchiya Nov. 15, 2019, 8:38 a.m.
I will rebase against current master and send the new patch.

Hi Jan,
Do you have any comments on my answer?

Regards,

Yuki Tsuchiya

Patch

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index bcb931f..8c1a85d 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -654,6 +654,7 @@  enum AVCodecID {
     AV_CODEC_ID_ATRAC9,
     AV_CODEC_ID_HCOM,
     AV_CODEC_ID_ACELP_KELVIN,
+    AV_CODEC_ID_MPEGH_3D_AUDIO,
 
     /* subtitle codecs */
     AV_CODEC_ID_FIRST_SUBTITLE = 0x17000,          ///< A dummy ID pointing at the start of subtitle codecs.
diff --git a/libavcodec/codec_desc.c b/libavcodec/codec_desc.c
index 0602ecb..a970fae 100644
--- a/libavcodec/codec_desc.c
+++ b/libavcodec/codec_desc.c
@@ -2998,6 +2998,13 @@  static const AVCodecDescriptor codec_descriptors[] = {
         .long_name = NULL_IF_CONFIG_SMALL("Sipro ACELP.KELVIN"),
         .props     = AV_CODEC_PROP_LOSSY,
     },
+    {
+        .id        = AV_CODEC_ID_MPEGH_3D_AUDIO,
+        .type      = AVMEDIA_TYPE_AUDIO,
+        .name      = "mpegh_3d_audio",
+        .long_name = NULL_IF_CONFIG_SMALL("MPEG-H 3D Audio"),
+        .props     = AV_CODEC_PROP_LOSSY,
+    },
 
     /* subtitle codecs */
     {
diff --git a/libavcodec/version.h b/libavcodec/version.h
index 27c126e..b36f331 100644
--- a/libavcodec/version.h
+++ b/libavcodec/version.h
@@ -28,7 +28,7 @@ 
 #include "libavutil/version.h"
 
 #define LIBAVCODEC_VERSION_MAJOR  58
-#define LIBAVCODEC_VERSION_MINOR  60
+#define LIBAVCODEC_VERSION_MINOR  61
 #define LIBAVCODEC_VERSION_MICRO 100
 
 #define LIBAVCODEC_VERSION_INT  AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \
diff --git a/libavformat/isom.c b/libavformat/isom.c
index edd0d81..824e811 100644
--- a/libavformat/isom.c
+++ b/libavformat/isom.c
@@ -371,6 +371,7 @@  const AVCodecTag ff_codec_movaudio_tags[] = {
     { AV_CODEC_ID_FLAC,            MKTAG('f', 'L', 'a', 'C') }, /* nonstandard */
     { AV_CODEC_ID_TRUEHD,          MKTAG('m', 'l', 'p', 'a') }, /* mp4ra.org */
     { AV_CODEC_ID_OPUS,            MKTAG('O', 'p', 'u', 's') }, /* mp4ra.org */
+    { AV_CODEC_ID_MPEGH_3D_AUDIO,  MKTAG('m', 'h', 'm', '1') }, /* MPEG-H 3D Audio bitstream */
     { AV_CODEC_ID_NONE, 0 },
 };
 
diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 715bec1..ff234d9 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -2411,7 +2411,7 @@  static int mov_preroll_write_stbl_atoms(AVIOContext *pb, MOVTrack *track)
     if (!sgpd_entries)
         return AVERROR(ENOMEM);
 
-    av_assert0(track->par->codec_id == AV_CODEC_ID_OPUS || track->par->codec_id == AV_CODEC_ID_AAC);
+    av_assert0(track->par->codec_id == AV_CODEC_ID_OPUS || track->par->codec_id == AV_CODEC_ID_AAC || track->par->codec_id == AV_CODEC_ID_MPEGH_3D_AUDIO);
 
     if (track->par->codec_id == AV_CODEC_ID_OPUS) {
         for (i = 0; i < track->entry; i++) {
@@ -2493,6 +2493,7 @@  static int mov_write_stbl_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContext
     mov_write_stts_tag(pb, track);
     if ((track->par->codec_type == AVMEDIA_TYPE_VIDEO ||
          track->par->codec_id == AV_CODEC_ID_TRUEHD ||
+         track->par->codec_id == AV_CODEC_ID_MPEGH_3D_AUDIO ||
          track->par->codec_tag == MKTAG('r','t','p',' ')) &&
         track->has_keyframes && track->has_keyframes < track->entry)
         mov_write_stss_tag(pb, track, MOV_SYNC_SAMPLE);
@@ -2512,7 +2513,7 @@  static int mov_write_stbl_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContext
     if (track->cenc.aes_ctr) {
         ff_mov_cenc_write_stbl_atoms(&track->cenc, pb);
     }
-    if (track->par->codec_id == AV_CODEC_ID_OPUS || track->par->codec_id == AV_CODEC_ID_AAC) {
+    if (track->par->codec_id == AV_CODEC_ID_OPUS || track->par->codec_id == AV_CODEC_ID_AAC || track->par->codec_id == AV_CODEC_ID_MPEGH_3D_AUDIO) {
         mov_preroll_write_stbl_atoms(pb, track);
     }
     return update_size(pb, pos);
@@ -6877,6 +6878,7 @@  const AVCodecTag codec_mp4_tags[] = {
     { AV_CODEC_ID_DVD_SUBTITLE, MKTAG('m', 'p', '4', 's') },
     { AV_CODEC_ID_MOV_TEXT    , MKTAG('t', 'x', '3', 'g') },
     { AV_CODEC_ID_BIN_DATA    , MKTAG('g', 'p', 'm', 'd') },
+    { AV_CODEC_ID_MPEGH_3D_AUDIO, MKTAG('m', 'h', 'm', '1') },
     { AV_CODEC_ID_NONE        ,    0 },
 };
 
diff --git a/libavformat/utils.c b/libavformat/utils.c
index cfb6d03..d271251 100644
--- a/libavformat/utils.c
+++ b/libavformat/utils.c
@@ -1021,7 +1021,8 @@  static int is_intra_only(enum AVCodecID id)
     const AVCodecDescriptor *d = avcodec_descriptor_get(id);
     if (!d)
         return 0;
-    if (d->type == AVMEDIA_TYPE_VIDEO && !(d->props & AV_CODEC_PROP_INTRA_ONLY))
+    if ((d->type == AVMEDIA_TYPE_VIDEO && !(d->props & AV_CODEC_PROP_INTRA_ONLY)) ||
+        id == AV_CODEC_ID_MPEGH_3D_AUDIO)
         return 0;
     return 1;
 }