[FFmpeg-devel,4/6] avformat/mov: parse ISO-14496-12 ChannelLayout

Message ID tencent_96272E38BF9664CD28A03D4D874FE3556007@qq.com
State New
Series add PCM in mp4 support

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Zhao Zhili Feb. 24, 2023, 12:25 p.m. UTC
From: Zhao Zhili <zhilizhao@tencent.com>

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
---
 libavformat/mov.c      |  79 +++++++++++-
 libavformat/mov_chan.c | 265 +++++++++++++++++++++++++++++++++++++++++
 libavformat/mov_chan.h |  26 ++++
 3 files changed, 369 insertions(+), 1 deletion(-)

Comments

Tomas Härdin Feb. 24, 2023, 9:42 a.m. UTC | #1
On Fri, 2023-02-24 at 20:25 +0800, Zhao Zhili wrote:
> +        if (!layout) {
> +            uint8_t positions[64] = {};

Is there a maximum number of channels defined somewhere? stsd supports
up to 65535.

> +    // stream carries objects
> +    if (stream_structure & 2) {
> +        int obj_count = avio_r8(pb);
> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n",
> obj_count);
> +    }
> +
> +    avio_seek(pb, end, SEEK_SET);

I feel we should complain loudly if there are bytes not accounted for,
at least when (stream_structure & 2) == 0.
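A minimal sketch of such a check, assuming the read position and the end of the box payload are both known as byte offsets. The name chnl_leftover_bytes is hypothetical and fprintf() stands in for the demuxer's logging:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of FFmpeg.
 * pos: read position after the known fields have been parsed.
 * end: position where the box payload ends.
 * Returns the number of unparsed bytes, or -1 if the parser ran past
 * the end of the box. */
static int64_t chnl_leftover_bytes(int64_t pos, int64_t end)
{
    assert(pos >= 0 && end >= 0);
    if (pos > end) {
        fprintf(stderr, "'chnl': read %" PRId64 " bytes past the end of the box\n",
                pos - end);
        return -1;
    }
    if (pos < end)
        fprintf(stderr, "'chnl': %" PRId64 " unparsed bytes at the end of the box\n",
                end - pos);
    return end - pos;
}
```

Inside mov.c the position would presumably come from avio_tell(pb) and the message would go through av_log() at AV_LOG_WARNING instead of stderr.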

The rest I can't say much about

/Tomas
Jan Ekström Feb. 24, 2023, 1:49 p.m. UTC | #2
On Fri, Feb 24, 2023 at 6:25 AM Zhao Zhili <quinkblack@foxmail.com> wrote:
>
> From: Zhao Zhili <zhilizhao@tencent.com>
>
> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>

Hah, I actually happened to recently start coding uncompressed audio
support in mp4 myself, but what this commit is handling is what
basically killed my version off since the channel layout box is
required.

If you're interested you can check my take over at
https://github.com/jeeb/ffmpeg/commits/pcmc_parsing_improvements .

Will comment on some things.

> ---
>  libavformat/mov.c      |  79 +++++++++++-
>  libavformat/mov_chan.c | 265 +++++++++++++++++++++++++++++++++++++++++
>  libavformat/mov_chan.h |  26 ++++
>  3 files changed, 369 insertions(+), 1 deletion(-)
>
> diff --git a/libavformat/mov.c b/libavformat/mov.c
> index b125343f84..1db869aa2e 100644
> --- a/libavformat/mov.c
> +++ b/libavformat/mov.c
> @@ -940,6 +940,82 @@ static int mov_read_chan(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>      return 0;
>  }
>
> +static int mov_read_chnl(MOVContext *c, AVIOContext *pb, MOVAtom atom)
> +{
> +    int64_t end = av_sat_add64(avio_tell(pb), atom.size);
> +    int stream_structure;
> +    int ret = 0;
> +    AVStream *st;
> +
> +    if (c->fc->nb_streams < 1)
> +        return 0;
> +    st = c->fc->streams[c->fc->nb_streams-1];
> +
> +    /* skip version and flags */
> +    avio_skip(pb, 4);

We should really not do this any more. Various FullBoxes have multiple
versions or depend on the flags. See how I have added FullBox things
recently, although I would prefer us to have a generic macro/function
setup for this where you then get the version and flags as arguments
or whatever in the future.
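As a rough illustration of reading the FullBox header instead of skipping it, here is a sketch that works on an in-memory payload. FullBoxHeader and read_fullbox_header are hypothetical names, not existing FFmpeg API; the layout (8-bit version followed by 24-bit flags) is the FullBox definition from ISO/IEC 14496-12:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of FFmpeg. */
typedef struct FullBoxHeader {
    uint8_t  version; /* first byte of the FullBox payload */
    uint32_t flags;   /* following 24 bits, big-endian */
} FullBoxHeader;

/* Parse the 4-byte FullBox header at the start of buf.
 * Returns 0 on success, -1 if the buffer is too short. */
static int read_fullbox_header(const uint8_t *buf, size_t size,
                               FullBoxHeader *hdr)
{
    assert(buf && hdr);
    if (size < 4)
        return -1;
    hdr->version = buf[0];
    hdr->flags   = ((uint32_t)buf[1] << 16) |
                   ((uint32_t)buf[2] <<  8) |
                    (uint32_t)buf[3];
    return 0;
}
```

In mov.c the same two fields could be read with avio_r8(pb) followed by avio_rb24(pb), after which the parser can branch on the version or reject values it does not know.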

For this specific box, there are now versions 0 and 1 defined since
circa 2018-2019 or so (visible at least in 14496-12 2022)

Since ISO/IEC has changed the rules for free specifications (against
the wishes of various spec authors) and all that jazz, this is how
it's defined in what I have on hand:

12.2.4  Channel layout

12.2.4.1  Definition

Box Types:  'chnl'
Container: Audio sample entry
Mandatory: No
Quantity: Zero or one

This box may appear in an audio sample entry to document the assignment
of channels in the audio stream. It is recommended to use this box to
convey the base channel count for the DownMixInstructions box and other
DRC-related boxes specified in ISO/IEC 23003-4.
The channel layout can be all or part of a standard layout (from an
enumerated list), or a custom layout (which also allows a track to
contribute part of an overall layout).
A stream may contain channels, objects, neither, or both. A stream that
is neither channel nor object structured can implicitly be rendered in a
variety of ways.

12.2.4.2  Syntax

aligned(8) class ChannelLayout extends FullBox('chnl', version, flags=0) {
   if (version==0) {
      unsigned int(8) stream_structure;
      if (stream_structure & channelStructured) {
         unsigned int(8) definedLayout;
         if (definedLayout==0) {
            for (i = 1 ; i <= layout_channel_count ; i++) {
               //  layout_channel_count comes from the sample entry
               unsigned int(8) speaker_position;
               if (speaker_position == 126) {   // explicit position
                  signed int (16) azimuth;
                  signed int (8)  elevation;
               }
            }
         } else {
            unsigned int(64)   omittedChannelsMap;
                  // a ‘1’ bit indicates ‘not in this track’
         }
      }
      if (stream_structure & objectStructured) {
         unsigned int(8) object_count;
      }
   } else {
      unsigned int(4) stream_structure;
      unsigned int(4) format_ordering;
      unsigned int(8) baseChannelCount;
      if (stream_structure & channelStructured) {
         unsigned int(8) definedLayout;
         if (definedLayout==0) {
            unsigned int(8) layout_channel_count;
            for (i = 1 ; i <= layout_channel_count ; i++) {
               unsigned int(8) speaker_position;
               if (speaker_position == 126) {   // explicit position
                  signed int (16) azimuth;
                  signed int (8)  elevation;
               }
            }
         } else {
            int(4) reserved = 0;
            unsigned int(3) channel_order_definition;
            unsigned int(1) omitted_channels_present;
            if (omitted_channels_present == 1) {
               unsigned int(64)   omittedChannelsMap;
                     // a ‘1’ bit indicates ‘not in this track’
            }
         }
      }
      if (stream_structure & objectStructured) {
                     // object_count is derived from baseChannelCount
      }
   }
}
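
The syntax above can be sketched as a small parser over an in-memory payload that starts right after the FullBox header. This is only an illustration under stated assumptions: Reader, ChnlInfo and parse_chnl are hypothetical names, multi-bit fields are taken most-significant-bit first, speaker positions are skipped rather than stored, and the version 1 object count is left at 0 because it has to be derived from baseChannelCount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte reader; err is set once a read runs out of data. */
typedef struct Reader { const uint8_t *p; size_t left; int err; } Reader;

static uint64_t rd(Reader *r, int nbytes)
{
    uint64_t v = 0;
    if (r->left < (size_t)nbytes) { r->err = 1; r->left = 0; return 0; }
    while (nbytes--) { v = (v << 8) | *r->p++; r->left--; }
    return v;
}

typedef struct ChnlInfo {
    int version, stream_structure, format_ordering, base_channel_count;
    int defined_layout, layout_channel_count, channel_order_definition;
    int object_count;              /* only read from the box in version 0 */
    uint64_t omitted_channels_map;
} ChnlInfo;

/* sample_entry_channels is the channel count of the audio sample entry,
 * which version 0 uses as layout_channel_count.
 * Returns the number of bytes consumed, or -1 on error. */
static int parse_chnl(const uint8_t *buf, size_t size, int version,
                      int sample_entry_channels, ChnlInfo *out)
{
    Reader r = { buf, size, 0 };
    ChnlInfo ci = { 0 };
    int b;

    assert(buf && out);
    if (version != 0 && version != 1)
        return -1;
    ci.version = version;
    b = (int)rd(&r, 1);
    if (version == 0) {
        ci.stream_structure = b;
    } else {
        ci.stream_structure   = b >> 4;    /* upper 4 bits */
        ci.format_ordering    = b & 0x0f;  /* lower 4 bits */
        ci.base_channel_count = (int)rd(&r, 1);
    }
    if (ci.stream_structure & 1) {         /* channelStructured */
        ci.defined_layout = (int)rd(&r, 1);
        if (ci.defined_layout == 0) {
            ci.layout_channel_count = version ? (int)rd(&r, 1)
                                              : sample_entry_channels;
            for (int i = 0; i < ci.layout_channel_count && !r.err; i++)
                if (rd(&r, 1) == 126)      /* explicit position */
                    rd(&r, 3);             /* azimuth (16) + elevation (8) */
        } else if (version == 0) {
            ci.omitted_channels_map = rd(&r, 8);
        } else {
            b = (int)rd(&r, 1);            /* 4 reserved, 3 order, 1 flag */
            ci.channel_order_definition = (b >> 1) & 7;
            if (b & 1)                     /* omitted_channels_present */
                ci.omitted_channels_map = rd(&r, 8);
        }
    }
    if ((ci.stream_structure & 2) && version == 0) /* objectStructured */
        ci.object_count = (int)rd(&r, 1);
    if (r.err)
        return -1;
    *out = ci;
    return (int)(size - r.left);
}
```

The return value makes it possible to compare the consumed byte count against the box payload size, which is one way to notice a version or field the parser does not understand.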

12.2.4.3  Semantics

version is an integer that specifies the version of this box (0 or 1).
When authoring, version 1 should be preferred over version 0. Version 1
conveys the channel ordering, which is not always the case for version 0.
Version 1 should be used to convey the base channel count for DRC.

stream_structure is a field of flags that define whether the stream has
channel or object structure (or both, or neither); the following flags
are defined, all other values are reserved:
   1  the stream carries channels
   2  the stream carries objects

format_ordering indicates the order of formats in the stream starting
from the lowest channel index (see Table). Each format shall only use
contiguous channel indices.

   format_ordering  Order
   0                unknown
   1                Channels, possibly followed by Objects
   2                Objects, possibly followed by Channels
   Remaining values are reserved

definedLayout is a ChannelConfiguration from ISO/IEC 23091-3.

speaker_position is an OutputChannelPosition from ISO/IEC 23091-3. If
an explicit position is used, then the azimuth and elevation are as
defined as for speakers in ISO/IEC 23091-3. The channel order
corresponds to the order of speaker positions.

azimuth is a signed value in degrees, as defined for
LoudspeakerAzimuth in ISO/IEC 23091-3.

elevation is a signed value, in degrees, as defined for
LoudspeakerElevation in ISO/IEC 23091-3.

channel_order_definition indicates where the ordering of the audio
channels for the definedLayout are specified (see Table).

   channel_order_definition  Channel order specification
   0                         as listed for the ChannelConfigurations in ISO/IEC 23091-3
   1                         Default order of audio codec specification
   2                         Channel ordering #2 of audio codec specification
   3                         Channel ordering #3 of audio codec specification
   4                         Channel ordering #4 of audio codec specification
   Remaining values are reserved

omitted_channels_present is a flag that indicates if it is set to 1
that the omittedChannelsMap is present.

omittedChannelsMap is a bit-map of omitted channels; the bits in the
channel map are numbered from least-significant to most-significant,
and correspond in that ordering with the order of the channels for the
configuration as documented in ISO/IEC 23091-3 ChannelConfiguration.
1-bits in the channel map mean that a channel is absent. A zero value
of the map therefore always means that the given standard layout is
fully present. The default value is 0.

layout_channel_count is the count of channels for the channel layout.
The default value is 0 if stream_structure indicates that no channel
structure is present. Otherwise, the value is the number of channels of
the defined layout, if present, otherwise it is the value from the
sample entry.

object_count is the count of channels that contain audio objects. The
default value is 0. For version 1 and if the objectStructured flag is
set, the value is computed as baseChannelCount minus the channel count
of the channel structure.

baseChannelCount represents the combined channel count of the channel
layout and the object count. The value must match the base channel
count for DRC (see ISO/IEC 23003-4).
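
Two of the semantics above lend themselves to a short sketch: testing bits of omittedChannelsMap and deriving the version 1 object count. The function names are hypothetical, and treating bit 0 as the first channel of the ChannelConfiguration is one reading of "numbered from least-significant to most-significant", not something the quoted text states in those words:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if channel `index` (0-based, in ChannelConfiguration order)
 * is marked as "not in this track". Bit 0 is the least-significant bit. */
static int chnl_channel_omitted(uint64_t omitted_channels_map, unsigned index)
{
    assert(index < 64);
    return (int)((omitted_channels_map >> index) & 1);
}

/* Number of channels of a defined layout that are present in the track. */
static int chnl_present_channels(uint64_t omitted_channels_map,
                                 int layout_channels)
{
    int present = 0;
    assert(layout_channels >= 0 && layout_channels <= 64);
    for (int i = 0; i < layout_channels; i++)
        present += !chnl_channel_omitted(omitted_channels_map, (unsigned)i);
    return present;
}

/* Version 1: object_count = baseChannelCount minus the channel count of
 * the channel structure. Returns -1 if the box is inconsistent. */
static int chnl_v1_object_count(int base_channel_count,
                                int channel_structure_channels)
{
    int n = base_channel_count - channel_structure_channels;
    return n < 0 ? -1 : n;
}
```

Whether "the channel count of the channel structure" means the full layout or only the channels present in the track is not settled by the quoted text, so the sketch takes that count as a parameter.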


> +
> +    stream_structure = avio_r8(pb);
> +
> +    // stream carries channels
> +    if (stream_structure & 1) {
> +        int layout = avio_r8(pb);
> +
> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' layout %d\n", layout);
> +        if (!layout) {
> +            uint8_t positions[64] = {};
> +            int enable = 1;
> +
> +            for (int i = 0; i < st->codecpar->ch_layout.nb_channels; i++) {
> +                int speaker_pos = avio_r8(pb);
> +
> +                av_log(c->fc, AV_LOG_TRACE, "speaker_position %d\n", speaker_pos);
> +                if (speaker_pos == 126) { // explicit position
> +                    int16_t azimuth = avio_rb16(pb);
> +                    int8_t elevation = avio_r8(pb);
> +
> +                    av_log(c->fc, AV_LOG_TRACE, "azimuth %d, elevation %d\n",
> +                           azimuth, elevation);
> +                    // Don't support explicit position
> +                    enable = 0;
> +                } else if (i < FF_ARRAY_ELEMS(positions)) {
> +                    positions[i] = speaker_pos;
> +                } else {
> +                    // number of channel out of our supported range
> +                    enable = 0;
> +                }
> +            }
> +
> +            if (enable) {
> +                ret = ff_mov_get_layout_from_channel_positions(positions,
> +                        st->codecpar->ch_layout.nb_channels,
> +                        &st->codecpar->ch_layout);
> +                if (ret) {
> +                    av_log(c->fc, AV_LOG_WARNING, "unsupported speaker positions\n");
> +                    ret = 0;
> +                }
> +            }
> +        } else {
> +            uint64_t omitted_channel_map = avio_rb64(pb);
> +
> +            if (omitted_channel_map) {
> +                avpriv_request_sample(c->fc, "omitted_channel_map 0x%" PRIx64 " != 0",
> +                                      omitted_channel_map);
> +                return AVERROR_PATCHWELCOME;
> +            }
> +            ff_mov_get_channel_layout_from_config(layout, &st->codecpar->ch_layout);
> +        }
> +    }
> +
> +    // stream carries objects
> +    if (stream_structure & 2) {
> +        int obj_count = avio_r8(pb);
> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n", obj_count);
> +    }
> +
> +    avio_seek(pb, end, SEEK_SET);
> +    return ret;
> +}
> +
>  static int mov_read_wfex(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>  {
>      AVStream *st;
> @@ -7784,7 +7860,8 @@ static const MOVParseTableEntry mov_default_parse_table[] = {
>  { MKTAG('w','i','d','e'), mov_read_wide }, /* place holder */
>  { MKTAG('w','f','e','x'), mov_read_wfex },
>  { MKTAG('c','m','o','v'), mov_read_cmov },
> -{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout */
> +{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout from quicktime */
> +{ MKTAG('c','h','n','l'), mov_read_chnl }, /* channel layout from ISO-14496-12 */
>  { MKTAG('d','v','c','1'), mov_read_dvc1 },
>  { MKTAG('s','g','p','d'), mov_read_sgpd },
>  { MKTAG('s','b','g','p'), mov_read_sbgp },
> diff --git a/libavformat/mov_chan.c b/libavformat/mov_chan.c
> index f66bf0df7f..10ebcdc08f 100644
> --- a/libavformat/mov_chan.c
> +++ b/libavformat/mov_chan.c
> @@ -551,3 +551,268 @@ int ff_mov_read_chan(AVFormatContext *s, AVIOContext *pb, AVStream *st,
>
>      return 0;
>  }
> +
> +/* ISO/IEC 23001-8, 8.2 */
> +static const AVChannelLayout iso_channel_configuration[] = {
> +    // 0: any setup
> +    {},
> +

I think a better name for this would be CICP channel configuration,
since the specification is called "coding-independent code points"
(for video this is shared with ITU-T H.273, which is free).

Also do note that a whole bunch of these are not in the channel order
that FFmpeg wants after stereo :<

Thankfully, with manual mapping, the channel order of FFmpeg's native
channel layouts should be both writable and readable.
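
A sketch of what such a manual mapping could look like. The native order is implied by ascending bit position in the channel mask, so the mapping is a permutation from file order to ascending channel id. The enum and build_native_map are hypothetical; the values mirror the bit positions FFmpeg uses for these speakers, and the input order in the usage note is only an illustrative example, not taken from the specification:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical channel ids for illustration. */
enum {
    CH_FL = 0, CH_FR = 1, CH_FC = 2, CH_LFE = 3, CH_BL = 4, CH_BR = 5,
    CH_FLC = 6, CH_FRC = 7, CH_BC = 8, CH_SL = 9, CH_SR = 10
};

/* file_order[i] is the channel id stored at index i in the file.
 * On success, map[n] holds the file index of the n-th channel in native
 * (ascending id) order and *mask holds the resulting channel mask.
 * Returns 0 on success, -1 on duplicate or out-of-range ids. */
static int build_native_map(const uint8_t *file_order, int nb,
                            uint8_t *map, uint64_t *mask)
{
    uint64_t m = 0;
    int out = 0;

    assert(file_order && map && mask);
    if (nb < 0 || nb > 64)
        return -1;
    for (int i = 0; i < nb; i++) {
        if (file_order[i] >= 64 || (m & (UINT64_C(1) << file_order[i])))
            return -1;
        m |= UINT64_C(1) << file_order[i];
    }
    for (int id = 0; id < 64; id++) {
        if (!(m & (UINT64_C(1) << id)))
            continue;
        for (int i = 0; i < nb; i++)
            if (file_order[i] == id)
                map[out++] = (uint8_t)i;
    }
    *mask = m;
    return 0;
}
```

For a file order of C L R Ls Rs LFE, with Ls and Rs taken as side channels, the map comes out as 1 2 0 5 3 4 and the mask as 0x60F. Layouts whose file order cannot be expressed by a mask at all would need a custom channel map instead.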

The channel orders for the various CICP layouts can be found both in the
referenced specifications and in the comments in Apple's headers, for
example:

// ISO/IEC 23091-3, channels w/orderings
kAudioChannelLayoutTag_CICP_1  = kAudioChannelLayoutTag_MPEG_1_0,    ///< C
kAudioChannelLayoutTag_CICP_2  = kAudioChannelLayoutTag_MPEG_2_0,    ///< L R
kAudioChannelLayoutTag_CICP_3  = kAudioChannelLayoutTag_MPEG_3_0_A,  ///< L R C
kAudioChannelLayoutTag_CICP_4  = kAudioChannelLayoutTag_MPEG_4_0_A,  ///< L R C Cs
kAudioChannelLayoutTag_CICP_5  = kAudioChannelLayoutTag_MPEG_5_0_A,  ///< L R C Ls Rs
kAudioChannelLayoutTag_CICP_6  = kAudioChannelLayoutTag_MPEG_5_1_A,  ///< L R C LFE Ls Rs
kAudioChannelLayoutTag_CICP_7  = kAudioChannelLayoutTag_MPEG_7_1_B,  ///< L R C LFE Ls Rs Lc Rc

kAudioChannelLayoutTag_CICP_9  = kAudioChannelLayoutTag_ITU_2_1,     ///< L R Cs
kAudioChannelLayoutTag_CICP_10 = kAudioChannelLayoutTag_ITU_2_2,     ///< L R Ls Rs
kAudioChannelLayoutTag_CICP_11 = kAudioChannelLayoutTag_MPEG_6_1_A,  ///< L R C LFE Ls Rs Cs
kAudioChannelLayoutTag_CICP_12 = kAudioChannelLayoutTag_MPEG_7_1_C,  ///< L R C LFE Ls Rs Rls Rrs
kAudioChannelLayoutTag_CICP_13 = (204U<<16) | 24,  ///< Lc Rc C LFE2 Rls Rrs L R Cs LFE3 Lss Rss Vhl Vhr Vhc Ts Ltr Rtr Ltm Rtm Ctr Cb Lb Rb

kAudioChannelLayoutTag_CICP_14 = (205U<<16) | 8,   ///< L R C LFE Ls Rs Vhl Vhr
kAudioChannelLayoutTag_CICP_15 = (206U<<16) | 12,  ///< L R C LFE2 Rls Rrs LFE3 Lss Rss Vhl Vhr Ctr

kAudioChannelLayoutTag_CICP_16 = (207U<<16) | 10,  ///< L R C LFE Ls Rs Vhl Vhr Lts Rts
kAudioChannelLayoutTag_CICP_17 = (208U<<16) | 12,  ///< L R C LFE Ls Rs Vhl Vhr Vhc Lts Rts Ts
kAudioChannelLayoutTag_CICP_18 = (209U<<16) | 14,  ///< L R C LFE Ls Rs Lbs Rbs Vhl Vhr Vhc Lts Rts Ts

kAudioChannelLayoutTag_CICP_19 = (210U<<16) | 12,  ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr
kAudioChannelLayoutTag_CICP_20 = (211U<<16) | 14,  ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr Leos Reos

Best regards,
Jan
Zhao Zhili Feb. 24, 2023, 6:37 p.m. UTC | #3
On Fri, 2023-02-24 at 10:42 +0100, Tomas Härdin wrote:
> On Fri, 2023-02-24 at 20:25 +0800, Zhao Zhili wrote:
> > +        if (!layout) {
> > +            uint8_t positions[64] = {};
> 
> Is there a maximum number of channels defined somewhere? stsd supports
> up to 65535.

AV_CHANNEL_ORDER_NATIVE supports up to 63 different channels.
Patchset v2 adds AVChannelCustom support.

> 
> > +    // stream carries objects
> > +    if (stream_structure & 2) {
> > +        int obj_count = avio_r8(pb);
> > +        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n",
> > obj_count);
> > +    }
> > +
> > +    avio_seek(pb, end, SEEK_SET);
> 
> I feel we should complain loudly if there are bytes not accounted for,
> at least when (stream_structure & 2) == 0.

Patchset v2 adds a log message when skipping unknown bytes.
On second thought, that was a mistake on my part: the unknown bytes
come after ChannelLayout and belong to AudioSampleEntryV1; they are
not inside ChannelLayout. I will drop the check and the seek in v3.

class AudioSampleEntryV1(codingname) extends SampleEntry (codingname){
...

ChannelLayout();
// we permit any number of DownMix or DRC boxes:
DownMixInstructions() [];
DRCCoefficientsBasic() [];
...
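
To illustrate the point: each child box of the sample entry carries its own size, so whatever follows ChannelLayout is a sibling box rather than trailing data inside it. A minimal sketch of walking such children in memory follows; find_box and rb32 are hypothetical names, and the 64-bit largesize and size 0 ("to end") forms are deliberately not handled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rb32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Walk consecutive boxes (32-bit size, then 4-byte type) in buf and find
 * the first one whose type matches. Returns the offset of its payload
 * and stores the payload size, or returns -1 if the box is not found or
 * a size field is invalid. */
static long find_box(const uint8_t *buf, size_t size, const char type[4],
                     size_t *payload_size)
{
    size_t pos = 0;

    assert(buf && type && payload_size);
    while (size - pos >= 8) {
        uint32_t box_size = rb32(buf + pos);
        if (box_size < 8 || box_size > size - pos)
            return -1;
        if (!memcmp(buf + pos + 4, type, 4)) {
            *payload_size = box_size - 8;
            return (long)(pos + 8);
        }
        pos += box_size; /* the next sibling starts right after this box */
    }
    return -1;
}
```

With this structure the 'chnl' parser only needs to consume its own payload; the sibling boxes are handled by the loop that walks the sample entry.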

> 
> The rest I can't say much about
> 
> /Tomas
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Zhao Zhili Feb. 25, 2023, 4:31 a.m. UTC | #4
On Fri, 2023-02-24 at 15:49 +0200, Jan Ekström wrote:
> On Fri, Feb 24, 2023 at 6:25 AM Zhao Zhili <quinkblack@foxmail.com> wrote:
> > 
> > From: Zhao Zhili <zhilizhao@tencent.com>
> > 
> > Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
> 
> Hah, I actually happened to recently start coding uncompressed audio
> support in mp4 myself, but what this commit is handling is what
> basically killed my version off since the channel layout box is
> required.
> 
> If you're interested you can check my take over at
> https://github.com/jeeb/ffmpeg/commits/pcmc_parsing_improvements .

Sorry, I didn't notice your work on this issue. I have cherry-picked
the first two patches from your branch in v2. Is that OK with you?

Supporting the channel layouts is tedious. Some of the layouts aren't
supported yet, and some of the details are unclear. Please help review
and improve this part.
 
> 
> Will comment on some things.
> 
> > ---
> >  libavformat/mov.c      |  79 +++++++++++-
> >  libavformat/mov_chan.c | 265 +++++++++++++++++++++++++++++++++++++++++
> >  libavformat/mov_chan.h |  26 ++++
> >  3 files changed, 369 insertions(+), 1 deletion(-)
> > 
> > diff --git a/libavformat/mov.c b/libavformat/mov.c
> > index b125343f84..1db869aa2e 100644
> > --- a/libavformat/mov.c
> > +++ b/libavformat/mov.c
> > @@ -940,6 +940,82 @@ static int mov_read_chan(MOVContext *c, AVIOContext *pb, MOVAtom atom)
> >      return 0;
> >  }
> > 
> > +static int mov_read_chnl(MOVContext *c, AVIOContext *pb, MOVAtom atom)
> > +{
> > +    int64_t end = av_sat_add64(avio_tell(pb), atom.size);
> > +    int stream_structure;
> > +    int ret = 0;
> > +    AVStream *st;
> > +
> > +    if (c->fc->nb_streams < 1)
> > +        return 0;
> > +    st = c->fc->streams[c->fc->nb_streams-1];
> > +
> > +    /* skip version and flags */
> > +    avio_skip(pb, 4);
> 
> We should really not do this any more. Various FullBoxes have multiple
> versions or depend on the flags. See how I have added FullBox things
> recently, although I would prefer us to have a generic macro/function
> setup for this where you then get the version and flags as arguments
> or whatever in the future.

I have added a version and flags check in patch v2, which only supports
version 0. You are welcome to add version 1 support :)

I agree with the idea of cleaning up the handling of version and flags
to make it future-proof.

> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Zhao Zhili Oct. 31, 2023, 3:15 a.m. UTC | #5
> On Feb 24, 2023, at 21:49, Jan Ekström <jeebjp@gmail.com> wrote:
> 
> On Fri, Feb 24, 2023 at 6:25 AM Zhao Zhili <quinkblack@foxmail.com> wrote:
>> 
>> From: Zhao Zhili <zhilizhao@tencent.com>
>> 
>> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
> 
> Hah, I actually happened to recently start coding uncompressed audio
> support in mp4 myself, but what this commit is handling is what
> basically killed my version off since the channel layout box is
> required.
> 
> If you're interested you can check my take over at
> https://github.com/jeeb/ffmpeg/commits/pcmc_parsing_improvements .
> 
> Will comment on some things.

I only have an old copy of the spec, so I may have missed some comments
and made some mistakes. Please let me know on the mailing list or by
personal email (this address) if I did something wrong.

I have network issues with IRC and can only read the archives when I get
the time. Open source work is not part of my day job.

> 
>> ---
>> libavformat/mov.c      |  79 +++++++++++-
>> libavformat/mov_chan.c | 265 +++++++++++++++++++++++++++++++++++++++++
>> libavformat/mov_chan.h |  26 ++++
>> 3 files changed, 369 insertions(+), 1 deletion(-)
>> 
>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>> index b125343f84..1db869aa2e 100644
>> --- a/libavformat/mov.c
>> +++ b/libavformat/mov.c
>> @@ -940,6 +940,82 @@ static int mov_read_chan(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>>     return 0;
>> }
>> 
>> +static int mov_read_chnl(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>> +{
>> +    int64_t end = av_sat_add64(avio_tell(pb), atom.size);
>> +    int stream_structure;
>> +    int ret = 0;
>> +    AVStream *st;
>> +
>> +    if (c->fc->nb_streams < 1)
>> +        return 0;
>> +    st = c->fc->streams[c->fc->nb_streams-1];
>> +
>> +    /* skip version and flags */
>> +    avio_skip(pb, 4);
> 
> We should really not do this any more. Various FullBoxes have multiple
> versions or depend on the flags. See how I have added FullBox things
> recently, although I would prefer us to have a generic macro/function
> setup for this where you then get the version and flags as arguments
> or whatever in the future.
> 
> For this specific box, there are now versions 0 and 1 defined since
> circa 2018-2019 or so (visible at least in 14496-12 2022)
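
A minimal sketch of what reading the FullBox header instead of skipping it
could look like, written against a plain byte buffer so that it stands alone.
The struct and function names are hypothetical. In mov.c the equivalent would
presumably be avio_r8() followed by avio_rb24().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two FullBox header fields. */
typedef struct FullBoxHeader {
    uint8_t  version; /* first byte of the FullBox payload */
    uint32_t flags;   /* the following 24 bits, big-endian */
} FullBoxHeader;

/* Parse the 4-byte FullBox header; returns 0 on success, -1 if truncated. */
static int read_fullbox_header(const uint8_t *buf, size_t size,
                               FullBoxHeader *h)
{
    if (size < 4)
        return -1;
    h->version = buf[0];
    h->flags   = ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | buf[3];
    return 0;
}
```

A 'chnl' reader could then branch on h.version and reject anything other than
0 or 1, rather than parsing a version 1 box as if it were version 0.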
> 
> Since ISO/IEC has changed the rules for free specifications (against
> the wishes of various spec authors) and all that jazz, this is how
> it's defined in what I have on hand:
> 
> 12.2.4  Channel layout
> 
> 12.2.4.1  Definition
> 
> Box Types:  'chnl'
> Container: Audio sample entry
> Mandatory: No
> Quantity: Zero or one
> 
> This box may appear in an audio sample entry to document the assignment
> of channels in the audio stream. It is recommended to use this box to
> convey the base channel count for the DownMixInstructions box and other
> DRC-related boxes specified in ISO/IEC 23003-4.
> The channel layout can be all or part of a standard layout (from an
> enumerated list), or a custom layout (which also allows a track to
> contribute part of an overall layout).
> A stream may contain channels, objects, neither, or both. A stream that
> is neither channel nor object structured can implicitly be rendered in a
> variety of ways.
> 
> 12.2.4.2  Syntax
> 
> aligned(8) class ChannelLayout extends FullBox('chnl', version, flags=0) {
>   if (version==0) {
>      unsigned int(8) stream_structure;
>      if (stream_structure & channelStructured) {
>         unsigned int(8) definedLayout;
>          if (definedLayout==0) {
>            for (i = 1 ; i <= layout_channel_count ; i++) {
>               //  layout_channel_count comes from the sample entry
>               unsigned int(8) speaker_position;
>               if (speaker_position == 126) {   // explicit position
>                  signed int (16) azimuth;
>                  signed int (8)  elevation;
>               }
>            }
>         } else {
>            unsigned int(64)   omittedChannelsMap;
>                  // a ‘1’ bit indicates ‘not in this track’
>         }
>      }
>      if (stream_structure & objectStructured) {
>         unsigned int(8) object_count;
>      }
>   } else {
>      unsigned int(4) stream_structure;
>      unsigned int(4) format_ordering;
>      unsigned int(8) baseChannelCount;
>      if (stream_structure & channelStructured) {
>         unsigned int(8) definedLayout;
>         if (definedLayout==0) {
>            unsigned int(8) layout_channel_count;
>            for (i = 1 ; i <= layout_channel_count ; i++) {
>               unsigned int(8) speaker_position;
>               if (speaker_position == 126) {   // explicit position
>                  signed int (16) azimuth;
>                  signed int (8)  elevation;
>               }
>            }
>         } else {
>            int(4) reserved = 0;
>            unsigned int(3) channel_order_definition;
>            unsigned int(1) omitted_channels_present;
>            if (omitted_channels_present == 1) {
>               unsigned int(64)   omittedChannelsMap;
>                     // a ‘1’ bit indicates ‘not in this track’
>            }
>         }
>      }
>      if (stream_structure & objectStructured) {
>                     // object_count is derived from baseChannelCount
>      }
>   }
> }
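
The version 0 branch of the syntax above can be sketched as a bounds-checked
parser over a byte buffer. This is an illustration only, not the code from the
patch: the struct, the function name and the 64-channel cap are assumptions.
It returns the number of bytes consumed, so a caller can compare that against
the box size and complain about unaccounted bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define CHNL_CHANNEL_STRUCTURED 1
#define CHNL_OBJECT_STRUCTURED  2
#define CHNL_MAX_CHANNELS       64 /* arbitrary cap for this sketch */

typedef struct ChnlV0 {
    uint8_t  stream_structure;
    uint8_t  defined_layout;
    uint8_t  speaker_position[CHNL_MAX_CHANNELS];
    int16_t  azimuth[CHNL_MAX_CHANNELS];   /* valid where position == 126 */
    int8_t   elevation[CHNL_MAX_CHANNELS]; /* valid where position == 126 */
    uint64_t omitted_channels_map;
    uint8_t  object_count;
} ChnlV0;

/*
 * Parse the payload that follows the FullBox header of a version 0 'chnl'.
 * channel_count comes from the sample entry. Returns the number of bytes
 * consumed, or -1 on truncated or unsupported input.
 */
static int parse_chnl_v0(const uint8_t *p, size_t size, int channel_count,
                         ChnlV0 *out)
{
    const ChnlV0 zero = { 0 };
    size_t pos = 0;

    *out = zero;
    if (channel_count < 0 || channel_count > CHNL_MAX_CHANNELS || size < 1)
        return -1;
    out->stream_structure = p[pos++];

    if (out->stream_structure & CHNL_CHANNEL_STRUCTURED) {
        if (pos >= size)
            return -1;
        out->defined_layout = p[pos++];
        if (out->defined_layout == 0) {
            for (int i = 0; i < channel_count; i++) {
                if (pos >= size)
                    return -1;
                out->speaker_position[i] = p[pos++];
                if (out->speaker_position[i] == 126) { /* explicit position */
                    if (size - pos < 3)
                        return -1;
                    out->azimuth[i]   = (int16_t)(uint16_t)((p[pos] << 8) | p[pos + 1]);
                    out->elevation[i] = (int8_t)p[pos + 2];
                    pos += 3;
                }
            }
        } else {
            if (size - pos < 8)
                return -1;
            for (int i = 0; i < 8; i++) /* 64-bit big-endian map */
                out->omitted_channels_map =
                    (out->omitted_channels_map << 8) | p[pos++];
        }
    }
    if (out->stream_structure & CHNL_OBJECT_STRUCTURED) {
        if (pos >= size)
            return -1;
        out->object_count = p[pos++];
    }
    return (int)pos;
}
```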
> 
> 12.2.4.3  Semantics
> 
> version is an integer that specifies the version of this box (0 or 1).
> When authoring, version 1 should be preferred over version 0. Version 1
> conveys the channel ordering, which is not always the case for version 0.
> Version 1 should be used to convey the base channel count for DRC.
> 
> stream_structure is a field of flags that define whether the stream has
> channel or object structure (or both, or neither); the following flags
> are defined, all other values are reserved:
>   1  the stream carries channels
>   2  the stream carries objects
> 
> format_ordering indicates the order of formats in the stream starting
> from the lowest channel index (see Table). Each format shall only use
> contiguous channel indices.
>   format_ordering Order
>   0               unknown
>   1               Channels, possibly followed by Objects
>   2               Objects, possibly followed by Channels
>   Remaining values are reserved
> 
> definedLayout is a ChannelConfiguration from ISO/IEC 23091-3.
> 
> speaker_position is an OutputChannelPosition from ISO/IEC 23091-3. If an
> explicit position is used, then the azimuth and elevation are as defined
> as for speakers in ISO/IEC 23091-3. The channel order corresponds to the
> order of speaker positions.
> 
> azimuth is a signed value in degrees, as defined for LoudspeakerAzimuth
> in ISO/IEC 23091-3.
> 
> elevation is a signed value, in degrees, as defined for
> LoudspeakerElevation in ISO/IEC 23091-3.
> 
> channel_order_definition indicates where the ordering of the audio
> channels for the definedLayout are specified (see Table).
> 
>   channel_order_definition Channel order specification
>   0                        as listed for the ChannelConfigurations in ISO/IEC 23091-3
>   1                        Default order of audio codec specification
>   2                        Channel ordering #2 of audio codec specification
>   3                        Channel ordering #3 of audio codec specification
>   4                        Channel ordering #4 of audio codec specification
>   Remaining values are reserved
> 
> omitted_channels_present is a flag that indicates if it is set to 1 that
> the omittedChannelsMap is present.
> 
> omittedChannelsMap is a bit-map of omitted channels; the bits in the
> channel map are numbered from least-significant to most-significant, and
> correspond in that ordering with the order of the channels for the
> configuration as documented in ISO/IEC 23091-3 ChannelConfiguration.
> 1-bits in the channel map mean that a channel is absent. A zero value of
> the map therefore always means that the given standard layout is fully
> present. The default value is 0.
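
To make the bit numbering concrete, here is a small sketch that lists which
channels of a defined layout are actually present in the track. The function
name is hypothetical. Index i refers to the i-th channel in the order
documented for the ChannelConfiguration, with bit 0 being the
least-significant bit of the map.

```c
#include <stdint.h>

/*
 * Write the indices of the channels that are present (bit cleared in
 * omitted_map) into present[], and return how many there are.
 * present[] must have room for layout_channels entries.
 */
static int present_channels(uint64_t omitted_map, int layout_channels,
                            int *present)
{
    int n = 0;

    for (int i = 0; i < layout_channels && i < 64; i++)
        if (!((omitted_map >> i) & 1))
            present[n++] = i;
    return n;
}
```

For a six-channel layout, a map of 0x08 marks only the fourth channel as
absent, while a map of 0 means all six channels are in the track.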
> 
> layout_channel_count is the count of channels for the channel layout.
> The default value is 0 if stream_structure indicates that no channel
> structure is present. Otherwise, the value is the number of channels of
> the defined layout, if present, otherwise it is the value from the
> sample entry.
> 
> object_count is the count of channels that contain audio objects. The
> default value is 0. For version 1 and if the objectStructured flag is
> set, the value is computed as baseChannelCount minus the channel count
> of the channel structure.
> 
> baseChannelCount represents the combined channel count of the channel
> layout and the object count. The value must match the base channel count
> for DRC (see ISO/IEC 23003-4).
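
As a sketch of the version 1 rule above, object_count is not read from the
box but derived. The function name is hypothetical, and returning -1 for an
inconsistent box is a choice made for this example rather than something the
quoted text mandates.

```c
/*
 * Derive object_count for a version 1 'chnl' box.
 * stream_structure: flag bits (1 = channels, 2 = objects).
 * Returns the derived count, or -1 if baseChannelCount is smaller than
 * the channel count of the channel structure.
 */
static int chnl_v1_object_count(int stream_structure, int base_channel_count,
                                int layout_channel_count)
{
    if (!(stream_structure & 2)) /* not object structured: default is 0 */
        return 0;
    if (!(stream_structure & 1)) /* no channel structure: count defaults to 0 */
        layout_channel_count = 0;
    if (base_channel_count < layout_channel_count)
        return -1;
    return base_channel_count - layout_channel_count;
}
```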
> 
> 
>> +
>> +    stream_structure = avio_r8(pb);
>> +
>> +    // stream carries channels
>> +    if (stream_structure & 1) {
>> +        int layout = avio_r8(pb);
>> +
>> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' layout %d\n", layout);
>> +        if (!layout) {
>> +            uint8_t positions[64] = {};
>> +            int enable = 1;
>> +
>> +            for (int i = 0; i < st->codecpar->ch_layout.nb_channels; i++) {
>> +                int speaker_pos = avio_r8(pb);
>> +
>> +                av_log(c->fc, AV_LOG_TRACE, "speaker_position %d\n", speaker_pos);
>> +                if (speaker_pos == 126) { // explicit position
>> +                    int16_t azimuth = avio_rb16(pb);
>> +                    int8_t elevation = avio_r8(pb);
>> +
>> +                    av_log(c->fc, AV_LOG_TRACE, "azimuth %d, elevation %d\n",
>> +                           azimuth, elevation);
>> +                    // Don't support explicit position
>> +                    enable = 0;
>> +                } else if (i < FF_ARRAY_ELEMS(positions)) {
>> +                    positions[i] = speaker_pos;
>> +                } else {
>> +                    // number of channel out of our supported range
>> +                    enable = 0;
>> +                }
>> +            }
>> +
>> +            if (enable) {
>> +                ret = ff_mov_get_layout_from_channel_positions(positions,
>> +                        st->codecpar->ch_layout.nb_channels,
>> +                        &st->codecpar->ch_layout);
>> +                if (ret) {
>> +                    av_log(c->fc, AV_LOG_WARNING, "unsupported speaker positions\n");
>> +                    ret = 0;
>> +                }
>> +            }
>> +        } else {
>> +            uint64_t omitted_channel_map = avio_rb64(pb);
>> +
>> +            if (omitted_channel_map) {
>> +                avpriv_request_sample(c->fc, "omitted_channel_map 0x%" PRIx64 " != 0",
>> +                                      omitted_channel_map);
>> +                return AVERROR_PATCHWELCOME;
>> +            }
>> +            ff_mov_get_channel_layout_from_config(layout, &st->codecpar->ch_layout);
>> +        }
>> +    }
>> +
>> +    // stream carries objects
>> +    if (stream_structure & 2) {
>> +        int obj_count = avio_r8(pb);
>> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n", obj_count);
>> +    }
>> +
>> +    avio_seek(pb, end, SEEK_SET);
>> +    return ret;
>> +}
>> +
>> static int mov_read_wfex(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>> {
>>     AVStream *st;
>> @@ -7784,7 +7860,8 @@ static const MOVParseTableEntry mov_default_parse_table[] = {
>> { MKTAG('w','i','d','e'), mov_read_wide }, /* place holder */
>> { MKTAG('w','f','e','x'), mov_read_wfex },
>> { MKTAG('c','m','o','v'), mov_read_cmov },
>> -{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout */
>> +{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout from quicktime */
>> +{ MKTAG('c','h','n','l'), mov_read_chnl }, /* channel layout from ISO-14496-12 */
>> { MKTAG('d','v','c','1'), mov_read_dvc1 },
>> { MKTAG('s','g','p','d'), mov_read_sgpd },
>> { MKTAG('s','b','g','p'), mov_read_sbgp },
>> diff --git a/libavformat/mov_chan.c b/libavformat/mov_chan.c
>> index f66bf0df7f..10ebcdc08f 100644
>> --- a/libavformat/mov_chan.c
>> +++ b/libavformat/mov_chan.c
>> @@ -551,3 +551,268 @@ int ff_mov_read_chan(AVFormatContext *s, AVIOContext *pb, AVStream *st,
>> 
>>     return 0;
>> }
>> +
>> +/* ISO/IEC 23001-8, 8.2 */
>> +static const AVChannelLayout iso_channel_configuration[] = {
>> +    // 0: any setup
>> +    {},
>> +
> 
> I think the better naming for this would be CICP channel configuration
> since the specification is called "common independent coding points"
> (for video this is shared with ITU-T H.273 which is free).
> 
> Also do note that a whole bunch of these are not in the channel order
> that FFmpeg wants after stereo :<
> 
> Thankfully with manual mapping FFmpeg native channel layouts' channel
> order should be writable and readable.
> 
> The channel orders for various CICP layouts can be found both in the
> referenced specifications, as well as in the comments from Apple's
> headers for example
> 
> // ISO/IEC 23091-3, channels w/orderings
> kAudioChannelLayoutTag_CICP_1  = kAudioChannelLayoutTag_MPEG_1_0,   ///< C
> kAudioChannelLayoutTag_CICP_2  = kAudioChannelLayoutTag_MPEG_2_0,   ///< L R
> kAudioChannelLayoutTag_CICP_3  = kAudioChannelLayoutTag_MPEG_3_0_A, ///< L R C
> kAudioChannelLayoutTag_CICP_4  = kAudioChannelLayoutTag_MPEG_4_0_A, ///< L R C Cs
> kAudioChannelLayoutTag_CICP_5  = kAudioChannelLayoutTag_MPEG_5_0_A, ///< L R C Ls Rs
> kAudioChannelLayoutTag_CICP_6  = kAudioChannelLayoutTag_MPEG_5_1_A, ///< L R C LFE Ls Rs
> kAudioChannelLayoutTag_CICP_7  = kAudioChannelLayoutTag_MPEG_7_1_B, ///< L R C LFE Ls Rs Lc Rc
> 
> kAudioChannelLayoutTag_CICP_9  = kAudioChannelLayoutTag_ITU_2_1,    ///< L R Cs
> kAudioChannelLayoutTag_CICP_10 = kAudioChannelLayoutTag_ITU_2_2,    ///< L R Ls Rs
> kAudioChannelLayoutTag_CICP_11 = kAudioChannelLayoutTag_MPEG_6_1_A, ///< L R C LFE Ls Rs Cs
> kAudioChannelLayoutTag_CICP_12 = kAudioChannelLayoutTag_MPEG_7_1_C, ///< L R C LFE Ls Rs Rls Rrs
> kAudioChannelLayoutTag_CICP_13 = (204U<<16) | 24, ///< Lc Rc C LFE2 Rls Rrs L R Cs LFE3 Lss Rss Vhl Vhr Vhc Ts Ltr Rtr Ltm Rtm Ctr Cb Lb Rb
> 
> kAudioChannelLayoutTag_CICP_14 = (205U<<16) | 8,  ///< L R C LFE Ls Rs Vhl Vhr
> kAudioChannelLayoutTag_CICP_15 = (206U<<16) | 12, ///< L R C LFE2 Rls Rrs LFE3 Lss Rss Vhl Vhr Ctr
> 
> kAudioChannelLayoutTag_CICP_16 = (207U<<16) | 10, ///< L R C LFE Ls Rs Vhl Vhr Lts Rts
> kAudioChannelLayoutTag_CICP_17 = (208U<<16) | 12, ///< L R C LFE Ls Rs Vhl Vhr Vhc Lts Rts Ts
> kAudioChannelLayoutTag_CICP_18 = (209U<<16) | 14, ///< L R C LFE Ls Rs Lbs Rbs Vhl Vhr Vhc Lts Rts Ts
> 
> kAudioChannelLayoutTag_CICP_19 = (210U<<16) | 12, ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr
> kAudioChannelLayoutTag_CICP_20 = (211U<<16) | 14, ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr Leos Reos
> 
> Best regards,
> Jan
Zhao Zhili Oct. 31, 2023, 3:15 a.m. UTC | #6
> On Feb 24, 2023, at 21:49, Jan Ekström <jeebjp@gmail.com> wrote:
> 
> On Fri, Feb 24, 2023 at 6:25 AM Zhao Zhili <quinkblack@foxmail.com <mailto:quinkblack@foxmail.com>> wrote:
>> 
>> From: Zhao Zhili <zhilizhao@tencent.com <mailto:zhilizhao@tencent.com>>
>> 
>> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com <mailto:zhilizhao@tencent.com>>
> 
> Hah, I actually happened to recently start coding uncompressed audio
> support in mp4 myself, but what this commit is handling is what
> basically killed my version off since the channel layout box is
> required.
> 
> If you're interested you can check my take over at
> https://github.com/jeeb/ffmpeg/commits/pcmc_parsing_improvements .
> 
> Will comment on some things.

I only have an old copy of the spec, and I may have missed some comments
and made some mistakes. Please notify me in mailing list or personal email
(this one) if I didn’t something wrong.

I have network issue with IRC, can only read the archives if I get the time.
I don’t work on open source for daily jobs.

> 
>> ---
>> libavformat/mov.c      |  79 +++++++++++-
>> libavformat/mov_chan.c | 265 +++++++++++++++++++++++++++++++++++++++++
>> libavformat/mov_chan.h |  26 ++++
>> 3 files changed, 369 insertions(+), 1 deletion(-)
>> 
>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>> index b125343f84..1db869aa2e 100644
>> --- a/libavformat/mov.c
>> +++ b/libavformat/mov.c
>> @@ -940,6 +940,82 @@ static int mov_read_chan(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>>     return 0;
>> }
>> 
>> +static int mov_read_chnl(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>> +{
>> +    int64_t end = av_sat_add64(avio_tell(pb), atom.size);
>> +    int stream_structure;
>> +    int ret = 0;
>> +    AVStream *st;
>> +
>> +    if (c->fc->nb_streams < 1)
>> +        return 0;
>> +    st = c->fc->streams[c->fc->nb_streams-1];
>> +
>> +    /* skip version and flags */
>> +    avio_skip(pb, 4);
> 
> We should really not do this any more. Various FullBoxes have multiple
> versions or depend on the flags. See how I have added FullBox things
> recently, although I would prefer us to have a generic macro/function
> setup for this where you then get the version and flags as arguments
> or whatever in the future.
> 
> For this specific box, there are now versions 0 and 1 defined since
> circa 2018-2019 or so (visible at least in 14496-12 2022)
> 
> Since ISO/IEC has changed the rules for free specifications (against
> the wishes of various spec authors) and all that jazz, this is how
> it's defined in what I have on hand:
> 
> 12.2.4  Channel layout
> 
> 12.2.4.1  Definition
> 
> Box Types:  'chnl'
> Container: Audio sample entry
> Mandatory: No
> Quantity: Zero or one
> 
> This box may appear in an audio sample entry to document the
> assignment of channels in the audio
> stream. It is recommended to use this box to convey the base channel
> count for the DownMixInstructions
> box and other DRC-related boxes specified in ISO/IEC 23003-4.
> The channel layout can be all or part of a standard layout (from an
> enumerated list), or a custom layout
> (which also allows a track to contribute part of an overall layout).
> A stream may contain channels, objects, neither, or both. A stream
> that is neither channel nor object
> structured can implicitly be rendered in a variety of ways.
> 
> 12.2.4.2  Syntax
> 
> aligned(8) class ChannelLayout extends FullBox('chnl', version, flags=0) {
>   if (version==0) {
>      unsigned int(8) stream_structure;
>      if (stream_structure & channelStructured) {
>         unsigned int(8) definedLayout;
>          if (definedLayout==0) {
>            for (i = 1 ; i <= layout_channel_count ; i++) {
>               //  layout_channel_count comes from the sample entry
>               unsigned int(8) speaker_position;
>               if (speaker_position == 126) {   // explicit position
>                  signed int (16) azimuth;
>                  signed int (8)  elevation;
>               }
>            }
>         } else {
>            unsigned int(64)   omittedChannelsMap;
>                  // a ‘1’ bit indicates ‘not in this track’
>         }
>      }
>      if (stream_structure & objectStructured) {
>         unsigned int(8) object_count;
>      }
>   } else {
>      unsigned int(4) stream_structure;
>      unsigned int(4) format_ordering;
>      unsigned int(8) baseChannelCount;
>      if (stream_structure & channelStructured) {
>         unsigned int(8) definedLayout;
>         if (definedLayout==0) {
>            unsigned int(8) layout_channel_count;
>            for (i = 1 ; i <= layout_channel_count ; i++) {
>               unsigned int(8) speaker_position;
>               if (speaker_position == 126) {   // explicit position
>                  signed int (16) azimuth;
>                  signed int (8)  elevation;
>               }
>            }
>         } else {
>            int(4) reserved = 0;
>            unsigned int(3) channel_order_definition;
>            unsigned int(1) omitted_channels_present;
>            if (omitted_channels_present == 1) {
>               unsigned int(64)   omittedChannelsMap;
>                     // a ‘1’ bit indicates ‘not in this track’
>            }
>         }
>      }
>      if (stream_structure & objectStructured) {
>                     // object_count is derived from baseChannelCount
>      }
>   }
> }
> 
> 12.2.4.3  Semantics
> 
> version is an integer that specifies the version of this box (0 or 1).
> When authoring, version 1 should be
>        preferred over version 0. Version 1 conveys the channel
> ordering, which is not always the case for
>        version 0. Version 1 should be used to convey the base channel
> count for DRC.
> 
> stream_structure is a field of flags that define whether the stream
> has channel or object structure (or
>                 both, or neither); the following flags are defined,
> all other values are reserved:
>   1  the stream carries channels
>   2  the stream carries objects
> 
> format_ordering indicates the order of formats in the stream starting
> from the lowest channel index
>                (see Table). Each format shall only use contiguous
> channel indices.
>   format_ordering Order
>   0               unknown
>   1               Channels, possibly followed by Objects
>   2               Objects, possibly followed by Channels
>   Remaining values are reserved
> 
> definedLayout is a ChannelConfiguration from ISO/IEC 23091-3.
> 
> speaker_position is an OutputChannelPosition from ISO/IEC 23091-3. If
> an explicit position is used,
>                 then the azimuth and elevation are as defined as for
> speakers in ISO/IEC 23091-3. The channel
>                 order corresponds to the order of speaker positions.
> 
> azimuth is a signed value in degrees, as defined for
> LoudspeakerAzimuth in ISO/IEC 23091-3.
> 
> elevation is a signed value, in degrees, as defined for
> LoudspeakerElevation in ISO/IEC 23091-3.
> 
> channel_order_definition indicates where the ordering of the audio
> channels for the definedLayout
>                         are specified (see Table).
> 
>   channel_order_definition Channel order specification
>   0                        as listed for the ChannelConfigurations in
> ISO/IEC 23091-3
>   1                        Default order of audio codec specification
>   2                        Channel ordering #2 of audio codec specification
>   3                        Channel ordering #3 of audio codec specification
>   4                        Channel ordering #4 of audio codec specification
>   Remaining values are reserved
> 
> omitted_channels_present is a flag that indicates if it is set to 1
> that the omittedChannelsMap is present.
> 
> omittedChannelsMap is a bit-map of omitted channels; the bits in the
> channel map are numbered from
>                   least-significant to most-significant, and
> correspond in that ordering with the order of the channels
>                   for  the  configuration  as  documented  in
> ISO/IEC  23091-3  ChannelConfiguration.  1-bits  in  the
>                   channel map mean that a channel is absent. A zero
> value of the map therefore always means that
>                   the given standard layout is fully present. The
> default value is 0.
> 
> layout_channel_count is the count of channels for the channel layout.
> The default value is 0 if stream_
>                     structure indicates that no channel structure is
> present. Otherwise, the value is the number of
>                     channels of the defined layout, if present,
> otherwise it is the value from the sample entry.
> object_count is the count of channels that contain audio objects. The
> default value is 0. For version
>             1 and if the objectStructured flag is set, the value is
> computed as baseChannelCount  minus the
>             channel count of the channel structure.
> 
> baseChannelCount represents the combined channel count of the channel
> layout and the object count.
>                 The value must match the base channel count for DRC
> (see ISO/IEC 23003-4).
> 
> 
>> +
>> +    stream_structure = avio_r8(pb);
>> +
>> +    // stream carries channels
>> +    if (stream_structure & 1) {
>> +        int layout = avio_r8(pb);
>> +
>> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' layout %d\n", layout);
>> +        if (!layout) {
>> +            uint8_t positions[64] = {};
>> +            int enable = 1;
>> +
>> +            for (int i = 0; i < st->codecpar->ch_layout.nb_channels; i++) {
>> +                int speaker_pos = avio_r8(pb);
>> +
>> +                av_log(c->fc, AV_LOG_TRACE, "speaker_position %d\n", speaker_pos);
>> +                if (speaker_pos == 126) { // explicit position
>> +                    int16_t azimuth = avio_rb16(pb);
>> +                    int8_t elevation = avio_r8(pb);
>> +
>> +                    av_log(c->fc, AV_LOG_TRACE, "azimuth %d, elevation %d\n",
>> +                           azimuth, elevation);
>> +                    // Don't support explicit position
>> +                    enable = 0;
>> +                } else if (i < FF_ARRAY_ELEMS(positions)) {
>> +                    positions[i] = speaker_pos;
>> +                } else {
>> +                    // number of channel out of our supported range
>> +                    enable = 0;
>> +                }
>> +            }
>> +
>> +            if (enable) {
>> +                ret = ff_mov_get_layout_from_channel_positions(positions,
>> +                        st->codecpar->ch_layout.nb_channels,
>> +                        &st->codecpar->ch_layout);
>> +                if (ret) {
>> +                    av_log(c->fc, AV_LOG_WARNING, "unsupported speaker positions\n");
>> +                    ret = 0;
>> +                }
>> +            }
>> +        } else {
>> +            uint64_t omitted_channel_map = avio_rb64(pb);
>> +
>> +            if (omitted_channel_map) {
>> +                avpriv_request_sample(c->fc, "omitted_channel_map 0x%" PRIx64 " != 0",
>> +                                      omitted_channel_map);
>> +                return AVERROR_PATCHWELCOME;
>> +            }
>> +            ff_mov_get_channel_layout_from_config(layout, &st->codecpar->ch_layout);
>> +        }
>> +    }
>> +
>> +    // stream carries objects
>> +    if (stream_structure & 2) {
>> +        int obj_count = avio_r8(pb);
>> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n", obj_count);
>> +    }
>> +
>> +    avio_seek(pb, end, SEEK_SET);
>> +    return ret;
>> +}
>> +
>> static int mov_read_wfex(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>> {
>>     AVStream *st;
>> @@ -7784,7 +7860,8 @@ static const MOVParseTableEntry mov_default_parse_table[] = {
>> { MKTAG('w','i','d','e'), mov_read_wide }, /* place holder */
>> { MKTAG('w','f','e','x'), mov_read_wfex },
>> { MKTAG('c','m','o','v'), mov_read_cmov },
>> -{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout */
>> +{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout from quicktime */
>> +{ MKTAG('c','h','n','l'), mov_read_chnl }, /* channel layout from ISO-14496-12 */
>> { MKTAG('d','v','c','1'), mov_read_dvc1 },
>> { MKTAG('s','g','p','d'), mov_read_sgpd },
>> { MKTAG('s','b','g','p'), mov_read_sbgp },
>> diff --git a/libavformat/mov_chan.c b/libavformat/mov_chan.c
>> index f66bf0df7f..10ebcdc08f 100644
>> --- a/libavformat/mov_chan.c
>> +++ b/libavformat/mov_chan.c
>> @@ -551,3 +551,268 @@ int ff_mov_read_chan(AVFormatContext *s, AVIOContext *pb, AVStream *st,
>> 
>>     return 0;
>> }
>> +
>> +/* ISO/IEC 23001-8, 8.2 */
>> +static const AVChannelLayout iso_channel_configuration[] = {
>> +    // 0: any setup
>> +    {},
>> +
> 
> I think the better naming for this would be CICP channel configuration
> since the specification is called "common independent coding points"
> (for video this is shared with ITU-T H.273 which is free).
> 
> Also do note that a whole bunch of these are not in the channel order
> that FFmpeg wants after stereo :<
> 
> Thankfully with manual mapping FFmpeg native channel layouts' channel
> order should be writable and readable.
> 
> The channel orders for various CICP layouts can be found both in the
> referenced specifications, as well as in the comments from Apple's
> headers for example
> 
> // ISO/IEC 23091-3, channels w/orderings
> kAudioChannelLayoutTag_CICP_1  = kAudioChannelLayoutTag_MPEG_1_0,   ///< C
> kAudioChannelLayoutTag_CICP_2  = kAudioChannelLayoutTag_MPEG_2_0,   ///< L R
> kAudioChannelLayoutTag_CICP_3  = kAudioChannelLayoutTag_MPEG_3_0_A, ///< L R C
> kAudioChannelLayoutTag_CICP_4  = kAudioChannelLayoutTag_MPEG_4_0_A, ///< L R C Cs
> kAudioChannelLayoutTag_CICP_5  = kAudioChannelLayoutTag_MPEG_5_0_A, ///< L R C Ls Rs
> kAudioChannelLayoutTag_CICP_6  = kAudioChannelLayoutTag_MPEG_5_1_A, ///< L R C LFE Ls Rs
> kAudioChannelLayoutTag_CICP_7  = kAudioChannelLayoutTag_MPEG_7_1_B, ///< L R C LFE Ls Rs Lc Rc
> 
> kAudioChannelLayoutTag_CICP_9  = kAudioChannelLayoutTag_ITU_2_1,    ///< L R Cs
> kAudioChannelLayoutTag_CICP_10 = kAudioChannelLayoutTag_ITU_2_2,    ///< L R Ls Rs
> kAudioChannelLayoutTag_CICP_11 = kAudioChannelLayoutTag_MPEG_6_1_A, ///< L R C LFE Ls Rs Cs
> kAudioChannelLayoutTag_CICP_12 = kAudioChannelLayoutTag_MPEG_7_1_C, ///< L R C LFE Ls Rs Rls Rrs
> kAudioChannelLayoutTag_CICP_13 = (204U<<16) | 24, ///< Lc Rc C LFE2 Rls Rrs L R Cs LFE3 Lss Rss Vhl Vhr Vhc Ts Ltr Rtr Ltm Rtm Ctr Cb Lb Rb
> 
> kAudioChannelLayoutTag_CICP_14 = (205U<<16) | 8,  ///< L R C LFE Ls Rs Vhl Vhr
> kAudioChannelLayoutTag_CICP_15 = (206U<<16) | 12, ///< L R C LFE2 Rls Rrs LFE3 Lss Rss Vhl Vhr Ctr
> 
> kAudioChannelLayoutTag_CICP_16 = (207U<<16) | 10, ///< L R C LFE Ls Rs Vhl Vhr Lts Rts
> kAudioChannelLayoutTag_CICP_17 = (208U<<16) | 12, ///< L R C LFE Ls Rs Vhl Vhr Vhc Lts Rts Ts
> kAudioChannelLayoutTag_CICP_18 = (209U<<16) | 14, ///< L R C LFE Ls Rs Lbs Rbs Vhl Vhr Vhc Lts Rts Ts
> 
> kAudioChannelLayoutTag_CICP_19 = (210U<<16) | 12, ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr
> kAudioChannelLayoutTag_CICP_20 = (211U<<16) | 14, ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr Leos Reos
> 
> Best regards,
> Jan
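
To make the channel-order point concrete, here is a minimal standalone sketch (not part of the patch) of the kind of manual mapping mentioned above. It takes channels in stream order, builds the native bitmask, and records which native index each stream channel lands on. All CH_* and sketch_* names are hypothetical; the enum values are assumed to mirror the bit positions of the matching AV_CHAN_* constants, and treating Ls/Rs as the side channels is an assumption as well.

```c
#include <stdint.h>

/* Hypothetical local ids. The values are assumed to mirror the bit
 * positions of the corresponding AV_CHAN_* constants in libavutil. */
enum SketchChannel {
    CH_FL  = 0, CH_FR  = 1, CH_FC = 2, CH_LFE = 3,
    CH_FLC = 6, CH_FRC = 7, CH_SL = 9, CH_SR  = 10,
};

/* Count the set bits in v (Kernighan's method). */
static int sketch_popcount64(uint64_t v)
{
    int n = 0;
    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/**
 * Build a native-order mask from channels given in stream order.
 *
 * @param src        channel ids in stream order, each < 64
 * @param nb         number of channels
 * @param mask       receives the bitmask of all channels
 * @param native_idx native_idx[i] receives the native-order index at
 *                   which stream channel i ends up
 * @return 0 on success, -1 on an out-of-range or duplicated channel
 */
static int sketch_native_reorder(const uint8_t *src, int nb,
                                 uint64_t *mask, uint8_t *native_idx)
{
    uint64_t m = 0;

    for (int i = 0; i < nb; i++) {
        uint64_t bit;

        if (src[i] >= 64)
            return -1;
        bit = 1ULL << src[i];
        if (m & bit)            /* a mask cannot hold a channel twice */
            return -1;
        m |= bit;
    }

    /* Native order is ascending bit order, so the native index of a
     * channel is the number of mask bits set below its own bit. */
    for (int i = 0; i < nb; i++)
        native_idx[i] = sketch_popcount64(m & ((1ULL << src[i]) - 1));

    *mask = m;
    return 0;
}
```

With the CICP 7 order from the header comments quoted above (L R C LFE Ls Rs Lc Rc), the index map comes out as 0 1 2 3 6 7 4 5, so the last four channels have to be swapped in pairs before the samples match native order. A mask alone cannot carry that information.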
Patch

diff --git a/libavformat/mov.c b/libavformat/mov.c
index b125343f84..1db869aa2e 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -940,6 +940,82 @@  static int mov_read_chan(MOVContext *c, AVIOContext *pb, MOVAtom atom)
     return 0;
 }
 
+static int mov_read_chnl(MOVContext *c, AVIOContext *pb, MOVAtom atom)
+{
+    int64_t end = av_sat_add64(avio_tell(pb), atom.size);
+    int stream_structure;
+    int ret = 0;
+    AVStream *st;
+
+    if (c->fc->nb_streams < 1)
+        return 0;
+    st = c->fc->streams[c->fc->nb_streams-1];
+
+    /* skip version and flags */
+    avio_skip(pb, 4);
+
+    stream_structure = avio_r8(pb);
+
+    // stream carries channels
+    if (stream_structure & 1) {
+        int layout = avio_r8(pb);
+
+        av_log(c->fc, AV_LOG_TRACE, "'chnl' layout %d\n", layout);
+        if (!layout) {
+            uint8_t positions[64] = {};
+            int enable = 1;
+
+            for (int i = 0; i < st->codecpar->ch_layout.nb_channels; i++) {
+                int speaker_pos = avio_r8(pb);
+
+                av_log(c->fc, AV_LOG_TRACE, "speaker_position %d\n", speaker_pos);
+                if (speaker_pos == 126) { // explicit position
+                    int16_t azimuth = avio_rb16(pb);
+                    int8_t elevation = avio_r8(pb);
+
+                    av_log(c->fc, AV_LOG_TRACE, "azimuth %d, elevation %d\n",
+                           azimuth, elevation);
+                    // Explicit positions are not supported
+                    enable = 0;
+                } else if (i < FF_ARRAY_ELEMS(positions)) {
+                    positions[i] = speaker_pos;
+                } else {
+                    // number of channels is outside our supported range
+                    enable = 0;
+                }
+            }
+
+            if (enable) {
+                ret = ff_mov_get_layout_from_channel_positions(positions,
+                        st->codecpar->ch_layout.nb_channels,
+                        &st->codecpar->ch_layout);
+                if (ret) {
+                    av_log(c->fc, AV_LOG_WARNING, "unsupported speaker positions\n");
+                    ret = 0;
+                }
+            }
+        } else {
+            uint64_t omitted_channel_map = avio_rb64(pb);
+
+            if (omitted_channel_map) {
+                avpriv_request_sample(c->fc, "omitted_channel_map 0x%" PRIx64 " != 0",
+                                      omitted_channel_map);
+                return AVERROR_PATCHWELCOME;
+            }
+            ff_mov_get_channel_layout_from_config(layout, &st->codecpar->ch_layout);
+        }
+    }
+
+    // stream carries objects
+    if (stream_structure & 2) {
+        int obj_count = avio_r8(pb);
+        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n", obj_count);
+    }
+
+    avio_seek(pb, end, SEEK_SET);
+    return ret;
+}
+
 static int mov_read_wfex(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 {
     AVStream *st;
@@ -7784,7 +7860,8 @@  static const MOVParseTableEntry mov_default_parse_table[] = {
 { MKTAG('w','i','d','e'), mov_read_wide }, /* place holder */
 { MKTAG('w','f','e','x'), mov_read_wfex },
 { MKTAG('c','m','o','v'), mov_read_cmov },
-{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout */
+{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout from quicktime */
+{ MKTAG('c','h','n','l'), mov_read_chnl }, /* channel layout from ISO-14496-12 */
 { MKTAG('d','v','c','1'), mov_read_dvc1 },
 { MKTAG('s','g','p','d'), mov_read_sgpd },
 { MKTAG('s','b','g','p'), mov_read_sbgp },
diff --git a/libavformat/mov_chan.c b/libavformat/mov_chan.c
index f66bf0df7f..10ebcdc08f 100644
--- a/libavformat/mov_chan.c
+++ b/libavformat/mov_chan.c
@@ -551,3 +551,268 @@  int ff_mov_read_chan(AVFormatContext *s, AVIOContext *pb, AVStream *st,
 
     return 0;
 }
+
+/* ISO/IEC 23001-8, 8.2 */
+static const AVChannelLayout iso_channel_configuration[] = {
+    // 0: any setup
+    {},
+
+    // 1: centre front
+    AV_CHANNEL_LAYOUT_MONO,
+
+    // 2: left front, right front
+    AV_CHANNEL_LAYOUT_STEREO,
+
+    // 3: centre front, left front, right front
+    AV_CHANNEL_LAYOUT_SURROUND,
+
+    // 4: centre front, left front, right front, rear centre
+    AV_CHANNEL_LAYOUT_4POINT0,
+
+    // 5: centre front, left front, right front, left surround, right surround
+    AV_CHANNEL_LAYOUT_5POINT0,
+
+    // 6: 5 + LFE
+    AV_CHANNEL_LAYOUT_5POINT1,
+
+    // 7: centre front, left front centre, right front centre,
+    // left front, right front, left surround, right surround, LFE
+    AV_CHANNEL_LAYOUT_7POINT1_WIDE,
+
+    // 8: channel1, channel2
+    AV_CHANNEL_LAYOUT_STEREO_DOWNMIX,
+
+    // 9: left front, right front, rear centre
+    AV_CHANNEL_LAYOUT_2_1,
+
+    // 10: left front, right front, left surround, right surround
+    AV_CHANNEL_LAYOUT_2_2,
+
+    // 11: centre front, left front, right front, left surround, right surround, rear centre, LFE
+    AV_CHANNEL_LAYOUT_6POINT1,
+
+    // 12: centre front, left front, right front
+    // left surround, right surround
+    // rear surround left, rear surround right
+    // LFE
+    AV_CHANNEL_LAYOUT_7POINT1,
+
+    // 13:
+    AV_CHANNEL_LAYOUT_22POINT2,
+
+    // 14:
+    AV_CHANNEL_LAYOUT_7POINT1_TOP_BACK,
+
+    // TODO: 15 - 20
+};
+
+/* ISO/IEC 23001-8, table 8 */
+static const enum AVChannel iso_channel_position[] = {
+    // 0: left front
+    AV_CHAN_FRONT_LEFT,
+
+    // 1: right front
+    AV_CHAN_FRONT_RIGHT,
+
+    // 2: centre front
+    AV_CHAN_FRONT_CENTER,
+
+    // 3: low frequency enhancement
+    AV_CHAN_LOW_FREQUENCY,
+
+    // 4: left surround
+    // TODO
+    AV_CHAN_NONE,
+
+    // 5: right surround
+    // TODO
+    AV_CHAN_NONE,
+
+    // 6: left front centre
+    AV_CHAN_FRONT_LEFT_OF_CENTER,
+
+    // 7: right front centre
+    AV_CHAN_FRONT_RIGHT_OF_CENTER,
+
+    // 8: rear surround left
+    AV_CHAN_BACK_LEFT,
+
+    // 9: rear surround right
+    AV_CHAN_BACK_RIGHT,
+
+    // 10: rear centre
+    AV_CHAN_BACK_CENTER,
+
+    // 11: left surround direct
+    AV_CHAN_SURROUND_DIRECT_LEFT,
+
+    // 12: right surround direct
+    AV_CHAN_SURROUND_DIRECT_RIGHT,
+
+    // 13: left side surround
+    AV_CHAN_SIDE_LEFT,
+
+    // 14: right side surround
+    AV_CHAN_SIDE_RIGHT,
+
+    // 15: left wide front
+    AV_CHAN_WIDE_LEFT,
+
+    // 16: right wide front
+    AV_CHAN_WIDE_RIGHT,
+
+    // 17: left front vertical height
+    AV_CHAN_TOP_FRONT_LEFT,
+
+    // 18: right front vertical height
+    AV_CHAN_TOP_FRONT_RIGHT,
+
+    // 19: centre front vertical height
+    AV_CHAN_TOP_FRONT_CENTER,
+
+    // 20: left surround vertical height rear
+    AV_CHAN_TOP_BACK_LEFT,
+
+    // 21: right surround vertical height rear
+    AV_CHAN_TOP_BACK_RIGHT,
+
+    // 22: centre vertical height rear
+    AV_CHAN_TOP_BACK_CENTER,
+
+    // 23: left vertical height side surround
+    AV_CHAN_TOP_SIDE_LEFT,
+
+    // 24: right vertical height side surround
+    AV_CHAN_TOP_SIDE_RIGHT,
+
+    // 25: top centre surround
+    AV_CHAN_TOP_CENTER,
+
+    // 26: low frequency enhancement 2
+    AV_CHAN_LOW_FREQUENCY_2,
+
+    // 27: left front vertical bottom
+    AV_CHAN_BOTTOM_FRONT_LEFT,
+
+    // 28: right front vertical bottom
+    AV_CHAN_BOTTOM_FRONT_RIGHT,
+
+    // 29: centre front vertical bottom
+    AV_CHAN_BOTTOM_FRONT_CENTER,
+
+    // 30: left vertical height surround
+    // TODO
+    AV_CHAN_NONE,
+
+    // 31: right vertical height surround
+    // TODO
+    AV_CHAN_NONE,
+
+    // 32, 33, 34, 35, reserved
+    AV_CHAN_NONE,
+    AV_CHAN_NONE,
+    AV_CHAN_NONE,
+    AV_CHAN_NONE,
+
+    // 36: low frequency enhancement 3
+    AV_CHAN_NONE,
+
+    // 37: left edge of screen
+    AV_CHAN_NONE,
+    // 38: right edge of screen
+    AV_CHAN_NONE,
+    // 39: half-way between centre of screen and left edge of screen
+    AV_CHAN_NONE,
+    // 40: half-way between centre of screen and right edge of screen
+    AV_CHAN_NONE,
+
+    // 41: left back surround
+    AV_CHAN_NONE,
+
+    // 42: right back surround
+    AV_CHAN_NONE,
+
+    // 43 - 125: reserved
+    // 126: explicit position
+    // 127: unknown/undefined
+};
+
+int ff_mov_get_channel_config_from_layout(const AVChannelLayout *layout, int *config)
+{
+    // Set default value which means any setup in 23001-8
+    *config = 0;
+    for (int i = 0; i < FF_ARRAY_ELEMS(iso_channel_configuration); i++) {
+        if (!av_channel_layout_compare(layout, iso_channel_configuration + i)) {
+            *config = i;
+            break;
+        }
+    }
+
+    return 0;
+}
+
+int ff_mov_get_channel_layout_from_config(int config, AVChannelLayout *layout)
+{
+    if (config > 0 && config < FF_ARRAY_ELEMS(iso_channel_configuration)) {
+        *layout = iso_channel_configuration[config];
+        return 0;
+    }
+
+    return -1;
+}
+
+int ff_mov_get_channel_positions_from_layout(const AVChannelLayout *layout,
+                                             uint8_t *position, int position_num)
+{
+    enum AVChannel channel;
+
+    if (position_num < layout->nb_channels)
+        return AVERROR(EINVAL);
+
+    if (layout->order != AV_CHANNEL_ORDER_NATIVE)
+        return AVERROR_PATCHWELCOME;
+
+    for (int i = 0; i < layout->nb_channels; i++) {
+        position[i] = 127;
+        channel = av_channel_layout_channel_from_index(layout, i);
+        if (channel == AV_CHAN_NONE)
+            return AVERROR(EINVAL);
+
+        for (int j = 0; j < FF_ARRAY_ELEMS(iso_channel_position); j++) {
+            if (iso_channel_position[j] == channel) {
+                position[i] = j;
+                break;
+            }
+        }
+        if (position[i] == 127)
+            return AVERROR(EINVAL);
+    }
+
+    return 0;
+}
+
+int ff_mov_get_layout_from_channel_positions(const uint8_t *position, int position_num,
+                                             AVChannelLayout *layout)
+{
+    AVChannelLayout tmp = {
+        .order = AV_CHANNEL_ORDER_NATIVE,
+        .nb_channels = position_num,
+    };
+    enum AVChannel channel;
+
+    for (int i = 0; i < position_num; i++) {
+        if (position[i] >= FF_ARRAY_ELEMS(iso_channel_position))
+            return AVERROR_PATCHWELCOME;
+
+        channel = iso_channel_position[position[i]];
+        // unsupported layout
+        if (channel == AV_CHAN_NONE)
+            return AVERROR_PATCHWELCOME;
+
+        tmp.u.mask |= 1ULL << channel;
+    }
+
+    *layout = tmp;
+
+    return 0;
+}
diff --git a/libavformat/mov_chan.h b/libavformat/mov_chan.h
index 93d9878798..8c807798ab 100644
--- a/libavformat/mov_chan.h
+++ b/libavformat/mov_chan.h
@@ -163,4 +163,30 @@  int ff_mov_get_channel_layout_tag(const AVCodecParameters *par,
 int ff_mov_read_chan(AVFormatContext *s, AVIOContext *pb, AVStream *st,
                      int64_t size);
 
+/**
+ * Get ISO/IEC 23001-8 ChannelConfiguration from AVChannelLayout.
+ *
+ */
+int ff_mov_get_channel_config_from_layout(const AVChannelLayout *layout, int *config);
+
+/**
+ * Get AVChannelLayout from ISO/IEC 23001-8 ChannelConfiguration.
+ *
+ * @return 0 on success, -1 if no layout matches; layout is untouched on failure
+ */
+
+int ff_mov_get_channel_layout_from_config(int config, AVChannelLayout *layout);
+
+/**
+ * Get ISO/IEC 23001-8 OutputChannelPosition from AVChannelLayout.
+ */
+int ff_mov_get_channel_positions_from_layout(const AVChannelLayout *layout,
+                                             uint8_t *position, int position_num);
+
+/**
+ * Get AVChannelLayout from ISO/IEC 23001-8 OutputChannelPosition.
+ */
+int ff_mov_get_layout_from_channel_positions(const uint8_t *position, int position_num,
+                                             AVChannelLayout *layout);
+
 #endif /* AVFORMAT_MOV_CHAN_H */
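
For reference, the 'chnl' payload as mov_read_chnl above walks it can be sketched over a plain byte buffer. This is only an illustration, assuming the field order is exactly what the patch reads; the Sketch* types and sketch_parse_chnl are hypothetical names, not FFmpeg API. Unlike the patch, it returns the number of bytes left over, so a caller could warn about them instead of silently seeking to the end of the box.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the fields mov_read_chnl reads. */
typedef struct SketchChnl {
    int      stream_structure;
    int      defined_layout;   /* -1 when no channel structure is present */
    uint64_t omitted_map;      /* only meaningful when defined_layout > 0 */
    uint8_t  positions[64];    /* only meaningful when defined_layout == 0 */
    int      has_explicit;     /* a speaker_position of 126 was seen */
    int      object_count;     /* -1 when no object structure is present */
} SketchChnl;

typedef struct SketchReader {
    const uint8_t *p, *end;
    int overrun;               /* set once a read goes past the end */
} SketchReader;

/* Read one byte; past the end, flag the overrun and return 0. */
static unsigned sketch_r8(SketchReader *r)
{
    if (r->p >= r->end) {
        r->overrun = 1;
        return 0;
    }
    return *r->p++;
}

/**
 * Parse a 'chnl' payload, starting at the version/flags word.
 *
 * @return number of unread trailing bytes, or -1 if the payload is truncated
 */
static long sketch_parse_chnl(const uint8_t *buf, size_t size,
                              int nb_channels, SketchChnl *out)
{
    SketchReader r = { buf, buf + size, 0 };

    memset(out, 0, sizeof(*out));
    out->defined_layout = -1;
    out->object_count   = -1;

    for (int i = 0; i < 4; i++)          /* skip version and flags */
        sketch_r8(&r);

    out->stream_structure = sketch_r8(&r);

    if (out->stream_structure & 1) {     /* stream carries channels */
        out->defined_layout = sketch_r8(&r);
        if (out->defined_layout == 0) {
            for (int i = 0; i < nb_channels; i++) {
                unsigned pos = sketch_r8(&r);

                if (pos == 126) {        /* explicit position follows */
                    sketch_r8(&r);       /* azimuth, high byte */
                    sketch_r8(&r);       /* azimuth, low byte */
                    sketch_r8(&r);       /* elevation */
                    out->has_explicit = 1;
                }
                if (i < 64)
                    out->positions[i] = pos;
            }
        } else {
            for (int i = 0; i < 8; i++)  /* big-endian 64-bit map */
                out->omitted_map = out->omitted_map << 8 | sketch_r8(&r);
        }
    }

    if (out->stream_structure & 2)       /* stream carries objects */
        out->object_count = sketch_r8(&r);

    if (r.overrun)
        return -1;
    return (long)(r.end - r.p);
}
```

A return value greater than zero means the box holds bytes the parser did not account for, and -1 means the box ended before all expected fields were read.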