[FFmpeg-devel,v2,14/18] avcodec/avcodec, avformat/movenc: support embedding channel layout to stream side data

Message ID 1472643361-10118-15-git-send-email-erkki.seppala.ext@nokia.com
State Superseded

Commit Message

erkki.seppala.ext@nokia.com Aug. 31, 2016, 11:35 a.m. UTC
Added support for passing a complex channel layout configuration as
packet side data (AV_PKT_DATA_AUDIO_TRACK_CHANNEL_LAYOUT) to ISO media
files, as specified by ISO/IEC 14496-12. AVAudioTrackChannelLayout has
fields for selecting a predefined audio layout, for fully specifying
the azimuth and elevation of each speaker, and for describing the
number of audio objects in the scene.

This information isn't integrated into the existing channel layout
system though, which is much more restricted compared to what the
standard permits. However, the side packet data is structured so that it
does not require too much ISO base media file format knowledge in client
code. In addition, it should be possible to extend the enumeration and
the record to allow for using the same side data for a more native
solution.

The names of the channels and layouts are available in
channel_layout_isoiec23001_8.h with slightly obtuse names such as
AV_SPEAKER_POSITION_ISOIEC23001_8_L and
AV_CH_LAYOUT_ISOIEC23001_8_1_0_0 to encourage a path forward to a more
native solution for FFmpeg.

This channel layout information ends up in a chnl box in an isom track
of the written file.

Signed-off-by: Erkki Seppälä <erkki.seppala.ext@nokia.com>
Signed-off-by: OZOPlayer <OZOPL@nokia.com>
---
 libavcodec/avcodec.h                     | 71 +++++++++++++++++++++++
 libavformat/movenc.c                     | 73 +++++++++++++++++++++++-
 libavutil/channel_layout_isoiec23001_8.h | 97 ++++++++++++++++++++++++++++++++
 3 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 libavutil/channel_layout_isoiec23001_8.h
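
The byte layout that mov_write_chnl_tag in this patch emits for the
explicit-position case can be sketched outside FFmpeg. The following is
only an illustration under stated assumptions: ExplicitPos and
chnl_payload_explicit are hypothetical names that do not appear in the
patch, and the sketch covers just the payload after the 8-byte box
header, for a channel-structured track with definedLayout == 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch; not part of the patch. */
typedef struct ExplicitPos {
    uint8_t speaker_position; /* 126 means azimuth/elevation follow */
    int16_t azimuth;          /* degrees */
    int8_t  elevation;        /* degrees */
} ExplicitPos;

/* Writes the chnl payload (everything after size + fourcc) into buf.
 * Returns the number of bytes written, or 0 if the input is unusable. */
static size_t chnl_payload_explicit(uint8_t *buf, size_t cap,
                                    const ExplicitPos *pos, int nb_channels)
{
    size_t n = 0;
    int i;

    /* Worst case: 4 (version/flags) + 1 + 1 + 4 bytes per channel. */
    if (nb_channels < 0 || cap < 6 + 4 * (size_t)nb_channels)
        return 0;

    buf[n++] = 0; buf[n++] = 0; buf[n++] = 0; buf[n++] = 0; /* version + flags */
    buf[n++] = 1; /* stream_structure: bit 0 set = channel-structured */
    buf[n++] = 0; /* definedLayout 0: positions are listed one by one */

    for (i = 0; i < nb_channels; i++) {
        buf[n++] = pos[i].speaker_position;
        if (pos[i].speaker_position == 126) {
            uint16_t az = (uint16_t)pos[i].azimuth;
            buf[n++] = (uint8_t)(az >> 8);   /* big-endian 16-bit azimuth */
            buf[n++] = (uint8_t)(az & 0xff);
            buf[n++] = (uint8_t)pos[i].elevation;
        }
    }
    return n;
}
```

A channel with a named speaker position costs one byte, while an
explicit position costs four, so the payload size depends on the
positions used and not only on the channel count.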

Comments

Carl Eugen Hoyos Aug. 31, 2016, 2 p.m. UTC | #1
Hi!

2016-08-31 13:35 GMT+02:00 Erkki Seppälä <erkki.seppala.ext@nokia.com>:
> Added support for passing complex channel layout configuration as side
> packet data (AV_PKT_DATA_AUDIO_TRACK_CHANNEL_LAYOUT) to ISO
> media files as specified by ISO/IEC 14496-12. AVAudioTrackChannelLayout
> has the fields to setting both predefined audio layouts, completely
> configuring the azimuth and elevation of each speaker as well as describing
> the number of audio objects in the scene.

(I used to work on parts of the FFmpeg channel layout code, I know it is
broken to some degree but I still hope to get it fixed one day...)

Are you using this feature already (internally) or did you just feel like it is
missing? (Sorry, it is not meant offending.) I wonder if all this complexity
really has a real-world use-case...

How is "chnl" related to "chan"? I thought I remember that "chan" already
offers possibilities for custom channel layouts.
Can both be used in the same file?

Thank you, Carl Eugen
erkki.seppala.ext@nokia.com Sept. 1, 2016, 1:24 p.m. UTC | #2
Hello!

On 08/31/2016 05:00 PM, Carl Eugen Hoyos wrote:
> Are you using this feature already (internally) or did you just feel like it is
> missing? (Sorry, it is not meant offending.) I wonder if all this complexity
> really has a real-world use-case...
We may or may not be using this functionality internally. Maybe the
Signed-off-by tag gives a hint where :).
> How is "chnl" related to "chan"? I thought I remember that "chan" already
> offers possibilities for custom channel layouts.
> Can both be used in the same file?
To me it seems these two tags are probably historically connected, chan 
being a precursor to chnl, though there seem to be some big differences 
in the range of options, if not in practical functionality. For example, 
the predefined layouts are completely different - mov_chan.c mentions 85
predefined channel layouts whereas channel_layout_isoiec23001_8.h has 20.

The conversions in mov_chan.c seem a bit lossy, though, for example:

     { MOV_CH_LAYOUT_MPEG_4_0_A,         AV_CH_LAYOUT_4POINT0 }, // L, R, C, Cs
     { MOV_CH_LAYOUT_MPEG_4_0_B,         AV_CH_LAYOUT_4POINT0 }, // C, L, R, Cs
     { MOV_CH_LAYOUT_AC3_3_1,            AV_CH_LAYOUT_4POINT0 }, // L, C, R, Cs

Is this not losing the channel order? It doesn't seem like it's 
compensated anywhere.
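
To illustrate the concern: if a layout is represented only as a bitmask
of the channels that are present, the three orderings above become
indistinguishable. The sketch below uses made-up channel bits (CH_L and
friends are hypothetical, not FFmpeg's AV_CH_* constants) and says
nothing about whether other FFmpeg code reorders the channels later.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical channel bits for this sketch only. */
enum { CH_L = 1 << 0, CH_R = 1 << 1, CH_C = 1 << 2, CH_CS = 1 << 3 };

/* Folds an ordered channel list into an unordered bitmask.
 * The position of each channel in the list is discarded. */
static uint64_t order_to_mask(const int *order, int n)
{
    uint64_t mask = 0;
    int i;

    for (i = 0; i < n; i++)
        mask |= (uint64_t)order[i];
    return mask;
}
```

Calling order_to_mask on { L, R, C, Cs }, { C, L, R, Cs } and
{ L, C, R, Cs } yields the same value each time, so the original order
cannot be recovered from the mask alone.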

I doubt the two tags can be used in the same track in a way that makes
sense. Perhaps more so in the same file, with alternative audio
tracks where the audio tracks originate from different sources?

It seems the mov_chan.c approach would be applicable to these layouts as 
well, except the mapping between existing FFmpeg layouts might be 
error-prone and one wouldn't want to have duplicate layouts meaning the 
same thing. Also mov_chan.c seems to have code for indicating speaker 
positions (by their coordinates) in ff_mov_read_chan, but it just throws 
them away (it doesn't even try to write anything complicated in 
mov_write_chan_tag).

The trickiest parts in my opinion would be:

1) The same channels can be in different order in the same file in 
different channel layouts. Doesn't FFmpeg throw away this permutation 
information? Does some higher level care about that?

2) There is no API or internals for dealing with speakers at arbitrary 
azimuth/elevation.
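
As a rough idea of what the smallest piece of such an API might look
like, here is a sketch of a range check for one explicit speaker
position. The ranges come from the field comments in this patch
(azimuth -180..180, elevation -90..90); SpeakerAngle and
speaker_angle_valid are hypothetical names, not FFmpeg types or
functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct mirroring the azimuth/elevation fields of the patch. */
typedef struct SpeakerAngle {
    int16_t azimuth;   /* degrees, -180..180 */
    int8_t  elevation; /* degrees, -90..90, positive is above the horizon */
} SpeakerAngle;

/* Returns 1 if both angles are inside the documented ranges, else 0. */
static int speaker_angle_valid(const SpeakerAngle *a)
{
    return a->azimuth >= -180 && a->azimuth <= 180 &&
           a->elevation >= -90 && a->elevation <= 90;
}
```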

I'm of course just selling this patch, but it does seem less risky to
get a side-data-based implementation in first ;-). (Perhaps even by
adding a suffix _EXPERIMENTAL to the side data, but it seems FFmpeg
hasn't done that so far.)
Patch

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 8373bca..7f14751 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -39,6 +39,7 @@ 
 #include "libavutil/log.h"
 #include "libavutil/pixfmt.h"
 #include "libavutil/rational.h"
+#include "libavutil/channel_layout_isoiec23001_8.h"
 
 #include "version.h"
 
@@ -1367,6 +1368,69 @@  typedef struct AVTrackReferences {
     /** followed by an optional gap for alignment purposes and another AVTrackReferences is applicaple */
 } AVTrackReferences;
 
+/**
+ * Describes the speaker position of a single audio channel of a single track
+ *
+ * The name is chosen in a slightly obscure manner as to allow a more
+ * natural name to take its place when the system supports FFmpeg's
+ * native channel positions.
+ */
+typedef struct AVAudioTrackChannelPositionISOIEC23001_8 {
+    AVSpeakerPositionISOIEC23001_8 speaker_position; /** an OutputChannelPosition from ISO/IEC 23001-8 */
+
+    /** The following are used if speaker_position == AV_SPEAKER_POSITION_ISOIEC23001_8_EXPL */
+    int16_t azimuth;            /** Degrees -180..180. Values increment counterclockwise from above. */
+    int8_t  elevation;          /** Degrees -90..90. >0 is above horizon. */
+} AVAudioTrackChannelPositionISOIEC23001_8;
+
+/**
+ * Describes the channel layout (i.e. speaker positions) of a single audio track
+ *
+ * The name is chosen in a slightly obscure manner as to allow a more
+ * natural name to take its place when the system supports FFmpeg's
+ * native channel positions.
+ */
+typedef struct AVAudioTrackChannelCompleteLayoutISOIEC23001_8 {
+    int nb_positions;
+    AVAudioTrackChannelPositionISOIEC23001_8 positions[64];
+} AVAudioTrackChannelCompleteLayoutISOIEC23001_8;
+
+/**
+ * Describes the channel layout of a single track based on a predefined
+ * layout, by providing the layout and the list of channels that are
+ * omitted. For example, you may choose a layout that has 6.1 channels
+ * and then choose to omit the LFE channel from your channels.
+ *
+ * The name is chosen in a slightly obscure manner as to allow a more natural 
+ * name to take its place when the system supports FFmpeg's native layouts.
+ */
+typedef struct AVAudioTrackChannelPredefinedLayoutISOIEC23001_8 {
+    AVChannelLayoutISOIEC23001_8  layout; /** ChannelConfiguration from ISO/IEC 23001-8 */
+    uint64_t omitted_channels;  /** lsb 1 means the first channel is omitted and so on */
+} AVAudioTrackChannelPredefinedLayoutISOIEC23001_8;
+
+typedef enum AVComplexAudioTrackChannelLayoutType {
+    AV_COMPLEX_CHANNEL_LAYOUT_OBJECTS_ONLY,
+    AV_COMPLEX_CHANNEL_LAYOUT_PREDEFINED_ISOIEC23001_8,
+    AV_COMPLEX_CHANNEL_LAYOUT_COMPLETE_ISOIEC23001_8,
+} AVComplexAudioTrackChannelLayoutType;
+
+typedef struct AVAudioTrackChannelLayout {
+    AVComplexAudioTrackChannelLayoutType type;
+    union {
+        AVAudioTrackChannelPredefinedLayoutISOIEC23001_8 predefined;
+        AVAudioTrackChannelCompleteLayoutISOIEC23001_8   complete;
+    };
+
+    /**
+     * Describes the channel layout as object-structured with the given
+     * number of objects. Object-structured audio is a means of describing
+     * an audio scene without a fixed channel layout that can be mixed
+     * to varying channel configurations.
+     */
+    int nb_audio_objects; /** Number of audio objects */
+} AVAudioTrackChannelLayout;
+
 enum AVPacketSideDataType {
     AV_PKT_DATA_PALETTE,
 
@@ -1559,6 +1623,13 @@  enum AVPacketSideDataType {
      * meta data configuration. The value is of type AVTimedMetadataInfo.
      */
     AV_PKT_DATA_TIMED_METADATA_INFO,
+
+    /**
+     * Channel layout, describing the position of speakers for the
+     * channels of a track, following the structure
+     * AVAudioTrackChannelLayout.
+     */
+    AV_PKT_DATA_AUDIO_TRACK_CHANNEL_LAYOUT,
 };
 
 #define AV_PKT_DATA_QUALITY_FACTOR AV_PKT_DATA_QUALITY_STATS //DEPRECATED
diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 883fa57..6e179ef 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -557,6 +557,75 @@  static unsigned compute_avg_bitrate(MOVTrack *track)
     return size * 8 * track->timescale / track->track_duration;
 }
 
+static int mov_write_chnl_tag(AVIOContext *pb, MOVTrack *track)
+{
+    AVAudioTrackChannelLayout *side_data =
+        (void*) av_stream_get_side_data(track->st, AV_PKT_DATA_AUDIO_TRACK_CHANNEL_LAYOUT,
+                                        NULL);
+
+    AVAudioTrackChannelPredefinedLayoutISOIEC23001_8 *predefined =
+        side_data && side_data->type == AV_COMPLEX_CHANNEL_LAYOUT_PREDEFINED_ISOIEC23001_8
+        ? &side_data->predefined
+        : NULL;
+
+    AVAudioTrackChannelCompleteLayoutISOIEC23001_8 *complete =
+        side_data && side_data->type == AV_COMPLEX_CHANNEL_LAYOUT_COMPLETE_ISOIEC23001_8
+        ? &side_data->complete
+        : NULL;
+
+    int object_count = side_data ? side_data->nb_audio_objects : 0;
+
+    if (!predefined && !complete && !object_count) {
+        return 0;
+    } else {
+        int64_t pos = avio_tell(pb);
+
+        int channel_structured           = predefined || complete;
+        int object_structured            = !!object_count;
+
+        // ChannelConfiguration from ISO/IEC 23001-8
+        int defined_layout               = predefined ? predefined->layout : 0;
+        int channel_count                = track->par->channels;
+
+        int stream_structure             = (channel_structured << 0) | (object_structured << 1);
+
+        avio_wb32(pb, 0); // size
+        ffio_wfourcc(pb, "chnl");
+        avio_wb32(pb, 0); // Version
+
+        avio_w8(pb, stream_structure);
+
+        if (channel_structured) {
+            avio_w8(pb, defined_layout);
+            if (defined_layout == 0) {
+                AVAudioTrackChannelPositionISOIEC23001_8* positions;
+                int i;
+                av_assert0(complete);
+                av_assert0(complete->nb_positions >= channel_count);
+
+                positions = complete->positions;
+
+                for (i = 0; i < channel_count; ++i) {
+                    AVAudioTrackChannelPositionISOIEC23001_8 *pos = &positions[i];
+                    avio_w8(pb, pos->speaker_position);
+                    if (pos->speaker_position == AV_SPEAKER_POSITION_ISOIEC23001_8_EXPL) {
+                        avio_wb16(pb, pos->azimuth);
+                        avio_w8(pb, pos->elevation);
+                    }
+                }
+            } else {
+                av_assert0(predefined);
+
+                avio_wb64(pb, predefined->omitted_channels);
+            }
+        }
+        if (object_structured)
+            avio_w8(pb, object_count);
+
+        return update_size(pb, pos);
+    }
+}
+
 static int mov_write_esds_tag(AVIOContext *pb, MOVTrack *track) // Basic
 {
     AVCPBProperties *props;
@@ -996,8 +1065,10 @@  static int mov_write_audio_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContex
          (mov_pcm_le_gt16(track->par->codec_id) && version==1) ||
          (mov_pcm_be_gt16(track->par->codec_id) && version==1)))
         mov_write_wave_tag(s, pb, track);
-    else if (track->tag == MKTAG('m','p','4','a'))
+    else if (track->tag == MKTAG('m','p','4','a')) {
+        mov_write_chnl_tag(pb, track);
         mov_write_esds_tag(pb, track);
+    }
     else if (track->par->codec_id == AV_CODEC_ID_AMR_NB)
         mov_write_amr_tag(pb, track);
     else if (track->par->codec_id == AV_CODEC_ID_AC3)
diff --git a/libavutil/channel_layout_isoiec23001_8.h b/libavutil/channel_layout_isoiec23001_8.h
new file mode 100644
index 0000000..72f39c1
--- /dev/null
+++ b/libavutil/channel_layout_isoiec23001_8.h
@@ -0,0 +1,97 @@ 
+/*
+ * copyright (c) 2016 Erkki Seppälä <erkki.seppala.ext@nokia.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_CHANNEL_LAYOUT_ISOIEC23001_8_H
+#define AVUTIL_CHANNEL_LAYOUT_ISOIEC23001_8_H
+
+/** Speaker positions according to ISO/IEC 23001-8 */
+typedef enum AVSpeakerPositionISOIEC23001_8 {
+    AV_SPEAKER_POSITION_ISOIEC23001_8_L      = 0,   /// Left front
+    AV_SPEAKER_POSITION_ISOIEC23001_8_R      = 1,   /// Right front
+    AV_SPEAKER_POSITION_ISOIEC23001_8_C      = 2,   /// Centre front
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LFE    = 3,   /// Low frequency enhancement
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LS     = 4,   /// Left surround
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RS     = 5,   /// Right surround
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LC     = 6,   /// Left front centre
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RC     = 7,   /// Right front centre
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LSR    = 8,   /// Rear surround left
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RSR    = 9,   /// Rear surround right
+    AV_SPEAKER_POSITION_ISOIEC23001_8_CS     = 10,  /// Rear centre
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LSD    = 11,  /// Left surround direct
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RSD    = 12,  /// Right surround direct
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LSS    = 13,  /// Left side surround
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RSS    = 14,  /// Right side surround
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LW     = 15,  /// Left wide front
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RW     = 16,  /// Right wide front
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LV     = 17,  /// Left front vertical height
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RV     = 18,  /// Right front vertical height
+    AV_SPEAKER_POSITION_ISOIEC23001_8_CV     = 19,  /// Centre front vertical height
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LVR    = 20,  /// Left surround vertical height rear
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RVR    = 21,  /// Right surround vertical height rear
+    AV_SPEAKER_POSITION_ISOIEC23001_8_CVR    = 22,  /// Centre vertical height rear
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LVSS   = 23,  /// Left vertical height side surround
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RVSS   = 24,  /// Right vertical height side surround
+    AV_SPEAKER_POSITION_ISOIEC23001_8_TS     = 25,  /// Top centre surround
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LFE2   = 26,  /// Low frequency enhancement 2
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LB     = 27,  /// Left front vertical bottom
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RB     = 28,  /// Right front vertical bottom
+    AV_SPEAKER_POSITION_ISOIEC23001_8_CB     = 29,  /// Centre front vertical bottom
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LVS    = 30,  /// Left vertical height surround
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RVS    = 31,  /// Right vertical height surround
+                                                     /// 32-35 Reserved
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LFE3   = 36,  /// Low frequency enhancement 3
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LEOS   = 37,  /// Left edge of screen
+    AV_SPEAKER_POSITION_ISOIEC23001_8_REOS   = 38,  /// Right edge of screen
+    AV_SPEAKER_POSITION_ISOIEC23001_8_HWBCAL = 39,  /// half-way between centre of screen and left edge of screen
+    AV_SPEAKER_POSITION_ISOIEC23001_8_HWBCAR = 40,  /// half-way between centre of screen and right edge of screen
+    AV_SPEAKER_POSITION_ISOIEC23001_8_LBS    = 41,  /// Left back surround
+    AV_SPEAKER_POSITION_ISOIEC23001_8_RBS    = 42,  /// Right back surround
+                                                     /// 43-125 Reserved
+    AV_SPEAKER_POSITION_ISOIEC23001_8_EXPL   = 126, /// Explicit position (see text)
+                                                     /// 127 Unknown / undefined
+} AVSpeakerPositionISOIEC23001_8;
+
+/** Channel layouts according to ISO/IEC 23001-8 */
+typedef enum AVChannelLayoutISOIEC23001_8 {
+    AV_CH_LAYOUT_ISOIEC23001_8_ANY,
+    AV_CH_LAYOUT_ISOIEC23001_8_1_0_0,   ///  1 centre front
+    AV_CH_LAYOUT_ISOIEC23001_8_2_0_0,   ///  2 left front,   right front
+    AV_CH_LAYOUT_ISOIEC23001_8_3_0_0,   ///  3 centre front, left front,  right front
+    AV_CH_LAYOUT_ISOIEC23001_8_3_1_0,   ///  4 centre front, left front,  right front,  rear centre
+    AV_CH_LAYOUT_ISOIEC23001_8_3_2_0,   ///  5 centre front, left front,  right front,  left surround,  right surround
+    AV_CH_LAYOUT_ISOIEC23001_8_3_2_1,   ///  6 centre front, left front,  right front,  left surround,  right surround,  LFE
+    AV_CH_LAYOUT_ISOIEC23001_8_5_2_1A,  ///  7 centre front, left front centre,  right front centre,  left front,  right front,  left surround,  right surround,  LFE
+    AV_CH_LAYOUT_ISOIEC23001_8_1P1,     ///  8 channel1,     channel2
+    AV_CH_LAYOUT_ISOIEC23001_8_2_1_0,   ///  9 left front,   right front,  rear centre
+    AV_CH_LAYOUT_ISOIEC23001_8_2_2_0,   /// 10 left front,   right front,  left surround,  right surround
+    AV_CH_LAYOUT_ISOIEC23001_8_3_3_1,   /// 11 centre front, left front,  right front,  left surround,  right surround,  rear centre,  LFE
+    AV_CH_LAYOUT_ISOIEC23001_8_3_4_1,   /// 12 centre front, left front,   right front,   left surround,   right surround,   rear surround left,   rear surround right,   LFE
+    AV_CH_LAYOUT_ISOIEC23001_8_11_11_2, /// 13 centre front, left front centre,  right front centre,  left front,  right front,  left side surround,  right side surround,  rear left surround,  rear right surround,  rear centre,  left front LFE,  right front LFE,  centre front vertical height,  left front vertical height,  right front vertical height,  left vertical height side surround,  right vertical height side surround,  top centre surround,  left surround vertical height rear,  right surround vertical height rear,  centre vertical height rear,  centre front vertical bottom,  left front vertical bottom,  right front vertical bottom
+    AV_CH_LAYOUT_ISOIEC23001_8_5_2_1B,  /// 14 centre front, left front,   right front,   left surround,   right surround,   LFE,   left front vertical height,  right front vertical height
+    AV_CH_LAYOUT_ISOIEC23001_8_5_5_2,   /// 15 centre front, left front,   right front,   left side surround,   right side surround,   left surround,   right surround,   left front vertical height,   right front vertical height,   centre vertical height rear,   LFE1,   LFE2
+    AV_CH_LAYOUT_ISOIEC23001_8_5_4_1,   /// 16 centre front, left front,   right front,   left surround,   right surround,   LFE,   left front vertical height,   right front vertical height,   left vertical height surround,  right vertical height surround
+    AV_CH_LAYOUT_ISOIEC23001_8_6_5_1,   /// 17 centre front, left front,   right front,   left surround,   right surround,   LFE,   left front vertical height,   right front vertical height,   centre front vertical height,  left vertical height surround,  right vertical height surround,  top centre surround
+    AV_CH_LAYOUT_ISOIEC23001_8_6_7_1,   /// 18 centre front, left front,   right front,   left surround,   right surround,   left back surround,   right back surround,   LFE,   left front vertical height,   right front vertical height,   centre front vertical height,  left vertical height surround,  right vertical height surround,  top centre surround
+    AV_CH_LAYOUT_ISOIEC23001_8_5_6_1,   /// 19 centre front, left front,  right front,  left side surround,  right side surround,  rear surround left,  rear surround right,  LFE,  left front vertical height,  right front vertical height,  left surround vertical height rear,  right surround vertical height rear
+    AV_CH_LAYOUT_ISOIEC23001_8_7_6_1,   /// 20 centre front, left edge of screen,   right edge of screen,   left front,   right front,   left side surround,   right side surround,   rear surround left,   rear surround right,   LFE,   left front vertical height,  right front vertical height,  left vertical height surround,  right vertical height surround
+                                         /// 21..63: reserved
+} AVChannelLayoutISOIEC23001_8;
+
+#endif /* AVUTIL_CHANNEL_LAYOUT_ISOIEC23001_8_H */
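
The omitted_channels field of the predefined layout struct is documented
in the patch as "lsb 1 means the first channel is omitted and so on".
A minimal sketch of how client code might interpret that mask follows.
The helper names are hypothetical, and the sketch assumes that channel
indices follow the order in which the channels are listed for each
layout in the enum comments above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the channel at 0-based position `index` of the predefined
 * layout is flagged as omitted (bit `index` of the mask is set). */
static int channel_is_omitted(uint64_t omitted_channels, int index)
{
    return index >= 0 && index < 64 && ((omitted_channels >> index) & 1);
}

/* Counts the channels left after applying the omission mask to a
 * predefined layout that has layout_channels channels. */
static int remaining_channels(uint64_t omitted_channels, int layout_channels)
{
    int i, n = 0;

    for (i = 0; i < layout_channels && i < 64; i++)
        if (!channel_is_omitted(omitted_channels, i))
            n++;
    return n;
}
```

For example, under the ordering assumption above, layout 6
(AV_CH_LAYOUT_ISOIEC23001_8_3_2_1) lists LFE as its sixth channel, so
dropping it would mean setting bit 5, which leaves five channels.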