From patchwork Fri Dec 15 21:28:37 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer <jamrial@gmail.com>
X-Patchwork-Id: 45171
From: James Almer <jamrial@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 15 Dec 2023 18:28:37 -0300
Message-ID: <20231215212837.1395-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20231214201433.4608-4-jamrial@gmail.com>
References: <20231214201433.4608-4-jamrial@gmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 3/8] ffmpeg: add support for muxing
 AVStreamGroups
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
X-TUID: ELtSnpvhx7nC

Starting with IAMF support.

Signed-off-by: James Almer <jamrial@gmail.com>
---
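Note (not part of the commit): a -stream_group argument is split on ','
first, and only the first token is treated as the group token, a list of
':'-separated key=value pairs in which "st" may appear several times. The
standard C sketch below only illustrates that layout. count_group_streams()
is a hypothetical helper, it uses strcspn() in place of the
av_strtok()/av_dict_parse_string() calls in the patch, and it ignores any
quoting or escaping those functions accept.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: count the "st=" entries in
 * the first ','-separated token of a -stream_group argument. */
static int count_group_streams(const char *spec)
{
    const char *p, *end;
    int count = 0;

    assert(spec != NULL);
    p   = spec;
    end = spec + strcspn(spec, ",");   /* the group token ends at the first ',' */

    while (p < end) {
        size_t pair_len = strcspn(p, ":,");   /* one key=value pair */

        /* the key must be exactly "st", so "stg=" does not match */
        if (pair_len >= 3 && !strncmp(p, "st=", 3))
            count++;
        p += pair_len;
        if (p < end)
            p++;                              /* skip the ':' separator */
    }
    return count;
}
```

For an illustrative argument such as
"type=iamf_audio_element:id=1:st=0:st=1,layer=ch_layout=stereo" the helper
looks only at the part before the first ',' and finds two "st" entries.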
 doc/ffmpeg.texi           | 200 ++++++++++++++++++++++
 fftools/ffmpeg.h          |   2 +
 fftools/ffmpeg_mux_init.c | 342 ++++++++++++++++++++++++++++++++++++++
 fftools/ffmpeg_opt.c      |   2 +
 4 files changed, 546 insertions(+)

diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index c503963941..1fadb20686 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -623,6 +623,206 @@ Not all muxers support embedded thumbnails, and those who do, only support a few
 Creates a program with the specified @var{title}, @var{program_num} and adds the specified
 @var{stream}(s) to it.
 
+@item -stream_group type=@var{type}:st=@var{stream}[:st=@var{stream}][:stg=@var{stream_group}][:id=@var{stream_group_id}...] (@emph{output})
+
+Creates a stream group of the specified @var{type}, @var{stream_group_id} and adds the specified
+@var{stream}(s) and/or previously defined @var{stream_group}(s) to it.
+
+@var{type} can be one of the following:
+@table @option
+
+@item iamf_audio_element
+Groups @var{stream}s that belong to the same IAMF Audio Element
+
+For this group @var{type}, the following options are available
+@table @option
+@item audio_element_type
+The Audio Element type. The following values are supported:
+
+@table @option
+@item channel
+Scalable channel audio representation
+@item scene
+Ambisonics representation
+@end table
+
+@item demixing
+Demixing information used to reconstruct a scalable channel audio representation.
+This option must be separated from the rest with a ',', and takes the following
+key=value options
+
+@table @option
+@item parameter_id
+An identifier that parameter blocks in frames may refer to
+@item dmixp_mode
+A pre-defined combination of demixing parameters
+@end table
+
+@item recon_gain
+Recon gain information used to reconstruct a scalable channel audio representation.
+This option must be separated from the rest with a ',', and takes the following
+key=value options
+
+@table @option
+@item parameter_id
+An identifier that parameter blocks in frames may refer to
+@end table
+
+@item layer
+A layer defining a Channel Layout in the Audio Element.
+This option must be separated from the rest with a ','. Several ',' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options
+
+@table @option
+@item ch_layout
+The layer's channel layout
+@item flags
+The following flags are available:
+
+@table @option
+@item recon_gain
+Signals whether recon_gain is present as metadata in parameter blocks within frames
+@end table
+
+@item output_gain
+@item output_gain_flags
+Which channels output_gain applies to. The following flags are available:
+
+@table @option
+@item FL
+@item FR
+@item BL
+@item BR
+@item TFL
+@item TFR
+@end table
+
+@item ambisonics_mode
+The ambisonics mode. This has no effect if audio_element_type is set to channel.
+
+The following values are supported:
+
+@table @option
+@item mono
+Each ambisonics channel is coded as an individual mono stream in the group
+@end table
+
+@end table
+
+@item default_w
+Default weight value
+
+@end table
+
+@item iamf_mix_presentation
+Groups @var{stream}s that belong to all the IAMF Audio Elements that the same
+IAMF Mix Presentation references
+
+For this group @var{type}, the following options are available
+
+@table @option
+@item submix
+A sub-mix within the Mix Presentation.
+This option must be separated from the rest with a ','. Several ',' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options
+
+@table @option
+@item parameter_id
+An identifier that parameter blocks in frames may refer to, for post-processing the mixed
+audio signal to generate the audio signal for playback
+@item parameter_rate
+The sample rate that the duration fields in parameter blocks in frames referring to this
+@var{parameter_id} are expressed in
+@item default_mix_gain
+Default mix gain value to apply when there are no parameter blocks sharing the same
+@var{parameter_id} for a given frame
+
+@item element
+References an Audio Element used in this Mix Presentation to generate the final output
+audio signal for playback.
+This option must be separated from the rest with a '|'. Several '|' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options:
+
+@table @option
+@item stg
+The @var{stream_group_id} for an Audio Element which this sub-mix refers to
+@item parameter_id
+An identifier that parameter blocks in frames may refer to, for applying any processing to
+the referenced and rendered Audio Element before being summed with other processed Audio
+Elements
+@item parameter_rate
+The sample rate that the duration fields in parameter blocks in frames referring to this
+@var{parameter_id} are expressed in
+@item default_mix_gain
+Default mix gain value to apply when there are no parameter blocks sharing the same
+@var{parameter_id} for a given frame
+@item annotations
+A key=value string describing the sub-mix element where "key" is a string conforming to
+BCP-47 that specifies the language for the "value" string. "key" must be the same as the
+one in the mix's @var{annotations}
+@item headphones_rendering_mode
+Indicates whether the input channel-based Audio Element is rendered to stereo loudspeakers
+or spatialized with a binaural renderer when played back on headphones.
+This has no effect if the referenced Audio Element's @var{audio_element_type} is set to
+scene.
+
+The following values are supported:
+
+@table @option
+@item stereo
+@item binaural
+@end table
+
+@end table
+
+@item layout
+Specifies the layouts for this sub-mix on which the loudness information was measured.
+This option must be separated from the rest with a '|'. Several '|' separated entries
+can be defined, and at least one must be set.
+
+It takes the following ":"-separated key=value options:
+
+@table @option
+@item layout_type
+
+@table @option
+@item loudspeakers
+The layout follows the loudspeaker sound system convention of ITU-R BS.2051-3.
+@item binaural
+The layout is binaural.
+@end table
+
+@item sound_system
+Channel layout matching one of Sound Systems A to J of ITU-R BS.2051-3, plus 7.1.2 and 3.1.2.
+This has no effect if @var{layout_type} is set to binaural.
+@item integrated_loudness
+The program integrated loudness information, as defined in ITU-R BS.1770-4.
+@item digital_peak
+The digital (sampled) peak value of the audio signal, as defined in ITU-R BS.1770-4.
+@item true_peak
+The true peak of the audio signal, as defined in ITU-R BS.1770-4.
+@item dialog_anchored_loudness
+The Dialogue loudness information, as defined in ITU-R BS.1770-4.
+@item album_anchored_loudness
+The Album loudness information, as defined in ITU-R BS.1770-4.
+@end table
+
+@end table
+
+@item annotations
+A key=value string describing the mix where "key" is a string conforming to BCP-47
+that specifies the language for the "value" string. "key" must be the same as the ones in
+all sub-mix elements' @var{annotations}
+@end table
+
+@end table
+
 @item -target @var{type} (@emph{output})
 Specify target file type (@code{vcd}, @code{svcd}, @code{dvd}, @code{dv},
 @code{dv50}). @var{type} may be prefixed with @code{pal-}, @code{ntsc-} or
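
Note on the submix syntax documented above: separators nest three levels
deep. Submix entries are split on ',', the sections inside one entry on
'|', and the key=value pairs inside a section on ':'. A minimal standard C
sketch of the middle level follows. count_submix_sections() is a
hypothetical helper rather than code from this patch, and the input is
assumed to have its leading "submix=" already stripped, as
of_parse_iamf_submixes() does before tokenizing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the '|'-separated sections of one submix
 * entry that start with the given prefix, e.g. "element=" or "layout=". */
static int count_submix_sections(const char *submix, const char *prefix)
{
    size_t prefix_len;
    int count = 0;

    assert(submix != NULL && prefix != NULL);
    prefix_len = strlen(prefix);

    while (*submix) {
        size_t len = strcspn(submix, "|");   /* length of this section */

        if (len >= prefix_len && !strncmp(submix, prefix, prefix_len))
            count++;
        submix += len;
        if (*submix == '|')
            submix++;                        /* step over the separator */
    }
    return count;
}
```

A section with neither prefix, such as a leading
"parameter_id=100:parameter_rate=48000", is counted by neither call. In the
patch those pairs are applied to the submix itself.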
diff --git a/fftools/ffmpeg.h b/fftools/ffmpeg.h
index affa80856a..1169f723d1 100644
--- a/fftools/ffmpeg.h
+++ b/fftools/ffmpeg.h
@@ -281,6 +281,8 @@ typedef struct OptionsContext {
     int        nb_disposition;
     SpecifierOpt *program;
     int        nb_program;
+    SpecifierOpt *stream_groups;
+    int        nb_stream_groups;
     SpecifierOpt *time_bases;
     int        nb_time_bases;
     SpecifierOpt *enc_time_bases;
diff --git a/fftools/ffmpeg_mux_init.c b/fftools/ffmpeg_mux_init.c
index f527a083db..2134b28512 100644
--- a/fftools/ffmpeg_mux_init.c
+++ b/fftools/ffmpeg_mux_init.c
@@ -40,6 +40,7 @@
 #include "libavutil/dict.h"
 #include "libavutil/display.h"
 #include "libavutil/getenv_utf8.h"
+#include "libavutil/iamf.h"
 #include "libavutil/intreadwrite.h"
 #include "libavutil/log.h"
 #include "libavutil/mem.h"
@@ -2008,6 +2009,343 @@ static int setup_sync_queues(Muxer *mux, AVFormatContext *oc, int64_t buf_size_u
     return 0;
 }
 
+static int of_parse_iamf_audio_element_layers(Muxer *mux, AVStreamGroup *stg, char *ptr)
+{
+    AVIAMFAudioElement *audio_element = stg->params.iamf_audio_element;
+    AVDictionary *dict = NULL;
+    const char *token;
+    int ret = 0;
+
+    audio_element->demixing_info =
+        av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_DEMIXING, 1, NULL);
+    audio_element->recon_gain_info =
+        av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_RECON_GAIN, 1, NULL);
+
+    if (!audio_element->demixing_info ||
+        !audio_element->recon_gain_info)
+        return AVERROR(ENOMEM);
+
+    /* process manually set layers and parameters */
+    token = av_strtok(NULL, ",", &ptr);
+    while (token) {
+        const AVDictionaryEntry *e;
+        int demixing = 0, recon_gain = 0;
+        int layer = 0;
+
+        if (av_strstart(token, "layer=", &token))
+            layer = 1;
+        else if (av_strstart(token, "demixing=", &token))
+            demixing = 1;
+        else if (av_strstart(token, "recon_gain=", &token))
+            recon_gain = 1;
+
+        av_dict_free(&dict);
+        ret = av_dict_parse_string(&dict, token, "=", ":", 0);
+        if (ret < 0) {
+            av_log(mux, AV_LOG_ERROR, "Error parsing audio element specification %s\n", token);
+            goto fail;
+        }
+
+        if (layer) {
+            AVIAMFLayer *audio_layer = av_iamf_audio_element_add_layer(audio_element);
+            if (!audio_layer) {
+                av_log(mux, AV_LOG_ERROR, "Error adding layer to stream group %d\n", stg->index);
+                ret = AVERROR(ENOMEM);
+                goto fail;
+            }
+            av_opt_set_dict(audio_layer, &dict);
+        } else if (demixing || recon_gain) {
+            AVIAMFParamDefinition *param = demixing ? audio_element->demixing_info
+                                                    : audio_element->recon_gain_info;
+            void *subblock = av_iamf_param_definition_get_subblock(param, 0);
+
+            av_opt_set_dict(param, &dict);
+            av_opt_set_dict(subblock, &dict);
+        }
+
+        // make sure that no entries are left in the dict
+        e = NULL;
+        if (e = av_dict_iterate(dict, e)) {
+            av_log(mux, AV_LOG_FATAL, "Unknown layer key %s.\n", e->key);
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+        token = av_strtok(NULL, ",", &ptr);
+    }
+
+fail:
+    av_dict_free(&dict);
+    if (!ret && !audio_element->nb_layers) {
+        av_log(mux, AV_LOG_ERROR, "No layer in audio element specification\n");
+        ret = AVERROR(EINVAL);
+    }
+
+    return ret;
+}
+
+static int of_parse_iamf_submixes(Muxer *mux, AVStreamGroup *stg, char *ptr)
+{
+    AVFormatContext *oc = mux->fc;
+    AVIAMFMixPresentation *mix = stg->params.iamf_mix_presentation;
+    AVDictionary *dict = NULL;
+    const char *token;
+    char *submix_str = NULL;
+    int ret = 0;
+
+    /* process manually set submixes */
+    token = av_strtok(NULL, ",", &ptr);
+    while (token) {
+        AVIAMFSubmix *submix = NULL;
+        const char *subtoken;
+        char *subptr = NULL;
+
+        if (!av_strstart(token, "submix=", &token)) {
+            av_log(mux, AV_LOG_ERROR, "No submix in mix presentation specification \"%s\"\n", token);
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+        if (!(submix_str = av_strdup(token))) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+        submix = av_iamf_mix_presentation_add_submix(mix);
+        if (!submix) {
+            av_log(mux, AV_LOG_ERROR, "Error adding submix to stream group %d\n", stg->index);
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+        submix->output_mix_config =
+            av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN, 0, NULL);
+        if (!submix->output_mix_config) {
+            ret = AVERROR(ENOMEM);
+            goto fail;
+        }
+
+        subptr = NULL;
+        subtoken = av_strtok(submix_str, "|", &subptr);
+        while (subtoken) {
+            const AVDictionaryEntry *e;
+            int element = 0, layout = 0;
+
+            if (av_strstart(subtoken, "element=", &subtoken))
+                element = 1;
+            else if (av_strstart(subtoken, "layout=", &subtoken))
+                layout = 1;
+
+            av_dict_free(&dict);
+            ret = av_dict_parse_string(&dict, subtoken, "=", ":", 0);
+            if (ret < 0) {
+                av_log(mux, AV_LOG_ERROR, "Error parsing submix specification \"%s\"\n", subtoken);
+                goto fail;
+            }
+
+            if (element) {
+                AVIAMFSubmixElement *submix_element;
+                int64_t idx = -1;
+
+                if (e = av_dict_get(dict, "stg", NULL, 0))
+                    idx = strtol(e->value, NULL, 0);
+                av_dict_set(&dict, "stg", NULL, 0);
+                if (idx < 0 || idx >= oc->nb_stream_groups - 1 ||
+                    oc->stream_groups[idx]->type != AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT) {
+                    av_log(mux, AV_LOG_ERROR, "Invalid or missing stream group index in "
+                                              "submix element specification \"%s\"\n", subtoken);
+                    ret = AVERROR(EINVAL);
+                    goto fail;
+                }
+                submix_element = av_iamf_submix_add_element(submix);
+                if (!submix_element) {
+                    av_log(mux, AV_LOG_ERROR, "Error adding element to submix\n");
+                    ret = AVERROR(ENOMEM);
+                    goto fail;
+                }
+
+                submix_element->audio_element_id = oc->stream_groups[idx]->id;
+
+                submix_element->element_mix_config =
+                    av_iamf_param_definition_alloc(AV_IAMF_PARAMETER_DEFINITION_MIX_GAIN, 0, NULL);
+                if (!submix_element->element_mix_config)
+                    ret = AVERROR(ENOMEM);
+                av_opt_set_dict2(submix_element, &dict, AV_OPT_SEARCH_CHILDREN);
+            } else if (layout) {
+                AVIAMFSubmixLayout *submix_layout = av_iamf_submix_add_layout(submix);
+                if (!submix_layout) {
+                    av_log(mux, AV_LOG_ERROR, "Error adding layout to submix\n");
+                    ret = AVERROR(ENOMEM);
+                    goto fail;
+                }
+                av_opt_set_dict(submix_layout, &dict);
+            } else
+                av_opt_set_dict2(submix, &dict, AV_OPT_SEARCH_CHILDREN);
+
+            if (ret < 0) {
+                goto fail;
+            }
+
+            // make sure that no entries are left in the dict
+            e = NULL;
+            while (e = av_dict_iterate(dict, e)) {
+                av_log(mux, AV_LOG_FATAL, "Unknown submix key %s.\n", e->key);
+                ret = AVERROR(EINVAL);
+                goto fail;
+            }
+            subtoken = av_strtok(NULL, "|", &subptr);
+        }
+        av_freep(&submix_str);
+        if (!submix->nb_elements) {
+            av_log(mux, AV_LOG_ERROR, "No audio elements in submix specification \"%s\"\n", token);
+            ret = AVERROR(EINVAL);
+            goto fail;
+        }
+        token = av_strtok(NULL, ",", &ptr);
+    }
+
+fail:
+    av_dict_free(&dict);
+    av_free(submix_str);
+
+    return ret;
+}
+
+static int of_parse_group_token(Muxer *mux, const char *token, char *ptr)
+{
+    AVFormatContext *oc = mux->fc;
+    AVStreamGroup *stg;
+    AVDictionary *dict = NULL, *tmp = NULL;
+    const AVDictionaryEntry *e;
+    const AVOption opts[] = {
+        { "type", "Set group type", offsetof(AVStreamGroup, type), AV_OPT_TYPE_INT,
+                { .i64 = 0 }, 0, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "type" },
+            { "iamf_audio_element",    NULL, 0, AV_OPT_TYPE_CONST,
+                { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT },    .unit = "type" },
+            { "iamf_mix_presentation", NULL, 0, AV_OPT_TYPE_CONST,
+                { .i64 = AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION }, .unit = "type" },
+        { NULL },
+    };
+    const AVClass class = {
+        .class_name = "StreamGroupType",
+        .item_name  = av_default_item_name,
+        .option     = opts,
+        .version    = LIBAVUTIL_VERSION_INT,
+    };
+    const AVClass *pclass = &class;
+    int type, ret;
+
+    ret = av_dict_parse_string(&dict, token, "=", ":", AV_DICT_MULTIKEY);
+    if (ret < 0) {
+        av_log(mux, AV_LOG_ERROR, "Error parsing group specification %s\n", token);
+        return ret;
+    }
+
+    // "type" is not a user settable AVOption in AVStreamGroup, so handle it here
+    e = av_dict_get(dict, "type", NULL, 0);
+    if (!e) {
+        av_log(mux, AV_LOG_ERROR, "No type specified for Stream Group in \"%s\"\n", token);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    ret = av_opt_eval_int(&pclass, opts, e->value, &type);
+    if (!ret && type == AV_STREAM_GROUP_PARAMS_NONE)
+        ret = AVERROR(EINVAL);
+    if (ret < 0) {
+        av_log(mux, AV_LOG_ERROR, "Invalid group type \"%s\"\n", e->value);
+        goto end;
+    }
+
+    av_dict_copy(&tmp, dict, 0);
+    stg = avformat_stream_group_create(oc, type, &tmp);
+    if (!stg) {
+        ret = AVERROR(ENOMEM);
+        goto end;
+    }
+
+    e = NULL;
+    while (e = av_dict_get(dict, "st", e, 0)) {
+        int64_t idx = strtol(e->value, NULL, 0);
+        if (idx < 0 || idx >= oc->nb_streams) {
+            av_log(mux, AV_LOG_ERROR, "Invalid stream index %"PRId64"\n", idx);
+            ret = AVERROR(EINVAL);
+            goto end;
+        }
+        ret = avformat_stream_group_add_stream(stg, oc->streams[idx]);
+        if (ret < 0)
+            goto end;
+    }
+    while (e = av_dict_get(dict, "stg", e, 0)) {
+        int64_t idx = strtol(e->value, NULL, 0);
+        if (idx < 0 || idx >= oc->nb_stream_groups - 1) {
+            av_log(mux, AV_LOG_ERROR, "Invalid stream group index %"PRId64"\n", idx);
+            ret = AVERROR(EINVAL);
+            goto end;
+        }
+        for (unsigned i = 0; i < oc->stream_groups[idx]->nb_streams; i++) {
+            ret = avformat_stream_group_add_stream(stg, oc->stream_groups[idx]->streams[i]);
+            if (ret < 0)
+                goto end;
+        }
+    }
+
+    switch(type) {
+    case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT:
+        ret = of_parse_iamf_audio_element_layers(mux, stg, ptr);
+        break;
+    case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION:
+        ret = of_parse_iamf_submixes(mux, stg, ptr);
+        break;
+    default:
+        av_log(mux, AV_LOG_FATAL, "Unknown group type %d.\n", type);
+        ret = AVERROR(EINVAL);
+        break;
+    }
+
+    if (ret < 0)
+        goto end;
+
+    // make sure that nothing but "st" and "stg" entries are left in the dict
+    e = NULL;
+    av_dict_set(&tmp, "type", NULL, 0);
+    while (e = av_dict_iterate(tmp, e)) {
+        if (!strcmp(e->key, "st") || !strcmp(e->key, "stg"))
+            continue;
+
+        av_log(mux, AV_LOG_FATAL, "Unknown group key %s.\n", e->key);
+        ret = AVERROR(EINVAL);
+        goto end;
+    }
+
+    ret = 0;
+end:
+    av_dict_free(&dict);
+    av_dict_free(&tmp);
+
+    return ret;
+}
+
+static int of_add_groups(Muxer *mux, const OptionsContext *o)
+{
+    /* process manually set groups */
+    for (int i = 0; i < o->nb_stream_groups; i++) {
+        const char *token;
+        char *str, *ptr = NULL;
+        int ret = 0;
+
+        str = av_strdup(o->stream_groups[i].u.str);
+        if (!str)
+            return AVERROR(ENOMEM);
+
+        token = av_strtok(str, ",", &ptr);
+        if (token)
+            ret = of_parse_group_token(mux, token, ptr);
+
+        av_free(str);
+        if (ret < 0)
+            return ret;
+    }
+
+    return 0;
+}
+
 static int of_add_programs(Muxer *mux, const OptionsContext *o)
 {
     AVFormatContext *oc = mux->fc;
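
Note on the "st" and "stg" lookups in the hunk above: the indexes are read
with strtol(e->value, NULL, 0), which returns 0 when the value contains no
digits, so a malformed value would pass the range check as index 0. A
stricter variant could look like the sketch below. parse_index() is a
hypothetical helper written in plain standard C, not an existing fftools
function. For "stg" the limit would be nb_stream_groups - 1, matching the
check in the patch, presumably because the group being built is already the
last entry in the array.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse `str` as an index in [0, limit).
 * Returns the index, or -1 if the string is not a clean in-range number. */
static long parse_index(const char *str, long limit)
{
    char *end = NULL;
    long idx;

    assert(str != NULL);
    errno = 0;
    idx = strtol(str, &end, 0);      /* base 0, as in the patch */
    if (errno == ERANGE)
        return -1;                   /* value does not fit in a long */
    if (end == str || *end != '\0')
        return -1;                   /* no digits, or trailing garbage */
    if (idx < 0 || idx >= limit)
        return -1;                   /* outside the valid range */
    return idx;
}
```

With three streams, parse_index("2", 3) yields 2, while "3", "abc" and "1x"
are all rejected.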
@@ -2793,6 +3131,10 @@ int of_open(const OptionsContext *o, const char *filename, Scheduler *sch)
     if (err < 0)
         return err;
 
+    err = of_add_groups(mux, o);
+    if (err < 0)
+        return err;
+
     err = of_add_programs(mux, o);
     if (err < 0)
         return err;
diff --git a/fftools/ffmpeg_opt.c b/fftools/ffmpeg_opt.c
index 6177a96a4e..915f8e3ea0 100644
--- a/fftools/ffmpeg_opt.c
+++ b/fftools/ffmpeg_opt.c
@@ -1493,6 +1493,8 @@ const OptionDef options[] = {
         "add metadata", "string=string" },
     { "program",        HAS_ARG | OPT_STRING | OPT_SPEC | OPT_OUTPUT, { .off = OFFSET(program) },
         "add program with specified streams", "title=string:st=number..." },
+    { "stream_group",        HAS_ARG | OPT_STRING | OPT_SPEC | OPT_OUTPUT, { .off = OFFSET(stream_groups) },
+        "add stream group with specified streams and group type-specific arguments", "id=number:st=number..." },
     { "dframes",        HAS_ARG | OPT_PERFILE | OPT_EXPERT |
                         OPT_OUTPUT,                                  { .func_arg = opt_data_frames },
         "set the number of data frames to output", "number" },