From patchwork Tue Jan 30 17:32:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 45915
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 30 Jan 2024 14:32:18 -0300
Message-ID: <20240130173218.63297-7-jamrial@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240130173218.63297-1-jamrial@gmail.com>
References: <20240130173218.63297-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 7/7] avformat/movenc: add support for Immersive Audio Model and Formats in ISOBMFF
List-Id: FFmpeg development discussions and patches

Signed-off-by: James Almer
---
 libavformat/movenc.c | 321 ++++++++++++++++++++++++++++++++++---------
 libavformat/movenc.h | 7 +
 2 files changed, 266 insertions(+), 62 deletions(-)

diff --git a/libavformat/movenc.c b/libavformat/movenc.c index 8a27afbc57..76a568fba6 100644 --- a/libavformat/movenc.c +++ b/libavformat/movenc.c @@ -32,6 +32,7 @@ #include "dovi_isom.h" #include "riff.h" #include "avio.h" +#include "iamf_writer.h" #include "isom.h" #include "av1.h" #include "avc.h" @@ -47,6 +48,7 @@ #include "libavcodec/raw.h" #include "internal.h" #include "libavutil/avstring.h" +#include "libavutil/bprint.h" #include "libavutil/channel_layout.h" #include "libavutil/csp.h" #include "libavutil/intfloat.h" @@ -315,6 +317,33 @@ static int mov_write_sdtp_tag(AVIOContext *pb, MOVTrack *track) return update_size(pb, pos); } + +static int mov_write_iacb_tag(AVFormatContext *s, AVIOContext *pb, MOVTrack *track) +{ + AVIOContext *dyn_bc; + int64_t pos = avio_tell(pb); + uint8_t *dyn_buf = NULL; + int 
dyn_size; + int ret = avio_open_dyn_buf(&dyn_bc); + if (ret < 0) + return ret; + + avio_wb32(pb, 0); + ffio_wfourcc(pb, "iacb"); + avio_w8(pb, 1); // configurationVersion + + ret = ff_iamf_write_descriptors(track->iamf, dyn_bc, s); + if (ret < 0) + return ret; + + dyn_size = avio_close_dyn_buf(dyn_bc, &dyn_buf); + ffio_write_leb(pb, dyn_size); + avio_write(pb, dyn_buf, dyn_size); + av_free(dyn_buf); + + return update_size(pb, pos); +} + static int mov_write_amr_tag(AVIOContext *pb, MOVTrack *track) { avio_wb32(pb, 0x11); /* size */ @@ -1357,6 +1386,8 @@ static int mov_write_audio_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContex ret = mov_write_wave_tag(s, pb, track); else if (track->tag == MKTAG('m','p','4','a')) ret = mov_write_esds_tag(pb, track); + else if (track->tag == MKTAG('i','a','m','f')) + ret = mov_write_iacb_tag(mov->fc, pb, track); else if (track->par->codec_id == AV_CODEC_ID_AMR_NB) ret = mov_write_amr_tag(pb, track); else if (track->par->codec_id == AV_CODEC_ID_AC3) @@ -2481,7 +2512,7 @@ static int mov_write_video_tag(AVFormatContext *s, AVIOContext *pb, MOVMuxContex if (track->mode == MODE_AVIF) { mov_write_ccst_tag(pb); - if (s->nb_streams > 0 && track == &mov->tracks[1]) + if (mov->nb_streams > 0 && track == &mov->tracks[1]) mov_write_aux_tag(pb, "auxi"); } @@ -3076,9 +3107,9 @@ static int mov_write_iloc_tag(AVIOContext *pb, MOVMuxContext *mov, AVFormatConte avio_wb32(pb, 0); /* Version & flags */ avio_w8(pb, (4 << 4) + 4); /* offset_size(4) and length_size(4) */ avio_w8(pb, 0); /* base_offset_size(4) and reserved(4) */ - avio_wb16(pb, s->nb_streams); /* item_count */ + avio_wb16(pb, mov->nb_streams); /* item_count */ - for (int i = 0; i < s->nb_streams; i++) { + for (int i = 0; i < mov->nb_streams; i++) { avio_wb16(pb, i + 1); /* item_id */ avio_wb16(pb, 0); /* data_reference_index */ avio_wb16(pb, 1); /* extent_count */ @@ -3097,9 +3128,9 @@ static int mov_write_iinf_tag(AVIOContext *pb, MOVMuxContext *mov, AVFormatConte avio_wb32(pb, 0); 
/* size */ ffio_wfourcc(pb, "iinf"); avio_wb32(pb, 0); /* Version & flags */ - avio_wb16(pb, s->nb_streams); /* entry_count */ + avio_wb16(pb, mov->nb_streams); /* entry_count */ - for (int i = 0; i < s->nb_streams; i++) { + for (int i = 0; i < mov->nb_streams; i++) { int64_t infe_pos = avio_tell(pb); avio_wb32(pb, 0); /* size */ ffio_wfourcc(pb, "infe"); @@ -3168,7 +3199,7 @@ static int mov_write_ipco_tag(AVIOContext *pb, MOVMuxContext *mov, AVFormatConte int64_t pos = avio_tell(pb); avio_wb32(pb, 0); /* size */ ffio_wfourcc(pb, "ipco"); - for (int i = 0; i < s->nb_streams; i++) { + for (int i = 0; i < mov->nb_streams; i++) { mov_write_ispe_tag(pb, mov, s, i); mov_write_pixi_tag(pb, mov, s, i); mov_write_av1c_tag(pb, &mov->tracks[i]); @@ -3186,9 +3217,9 @@ static int mov_write_ipma_tag(AVIOContext *pb, MOVMuxContext *mov, AVFormatConte avio_wb32(pb, 0); /* size */ ffio_wfourcc(pb, "ipma"); avio_wb32(pb, 0); /* Version & flags */ - avio_wb32(pb, s->nb_streams); /* entry_count */ + avio_wb32(pb, mov->nb_streams); /* entry_count */ - for (int i = 0, index = 1; i < s->nb_streams; i++) { + for (int i = 0, index = 1; i < mov->nb_streams; i++) { avio_wb16(pb, i + 1); /* item_ID */ avio_w8(pb, 4); /* association_count */ @@ -4165,7 +4196,7 @@ static int mov_write_covr(AVIOContext *pb, AVFormatContext *s) int64_t pos = 0; int i; - for (i = 0; i < s->nb_streams; i++) { + for (i = 0; i < mov->nb_streams; i++) { MOVTrack *trk = &mov->tracks[i]; if (!is_cover_image(trk->st) || trk->cover_image->size <= 0) @@ -4312,7 +4343,7 @@ static int mov_write_meta_tag(AVIOContext *pb, MOVMuxContext *mov, mov_write_pitm_tag(pb, 1); mov_write_iloc_tag(pb, mov, s); mov_write_iinf_tag(pb, mov, s); - if (s->nb_streams > 1) + if (mov->nb_streams > 1) mov_write_iref_tag(pb, mov, s); mov_write_iprp_tag(pb, mov, s); } else { @@ -4563,16 +4594,17 @@ static int mov_setup_track_ids(MOVMuxContext *mov, AVFormatContext *s) if (mov->use_stream_ids_as_track_ids) { int next_generated_track_id = 0; - for 
(i = 0; i < s->nb_streams; i++) { - if (s->streams[i]->id > next_generated_track_id) - next_generated_track_id = s->streams[i]->id; + for (i = 0; i < mov->nb_streams; i++) { + AVStream *st = mov->tracks[i].st; + if (st->id > next_generated_track_id) + next_generated_track_id = st->id; } for (i = 0; i < mov->nb_tracks; i++) { if (mov->tracks[i].entry <= 0 && !(mov->flags & FF_MOV_FLAG_FRAGMENT)) continue; - mov->tracks[i].track_id = i >= s->nb_streams ? ++next_generated_track_id : s->streams[i]->id; + mov->tracks[i].track_id = i >= mov->nb_streams ? ++next_generated_track_id : mov->tracks[i].st->id; } } else { for (i = 0; i < mov->nb_tracks; i++) { @@ -4609,7 +4641,7 @@ static int mov_write_moov_tag(AVIOContext *pb, MOVMuxContext *mov, } if (mov->chapter_track) - for (i = 0; i < s->nb_streams; i++) { + for (i = 0; i < mov->nb_streams; i++) { mov->tracks[i].tref_tag = MKTAG('c','h','a','p'); mov->tracks[i].tref_id = mov->tracks[mov->chapter_track].track_id; } @@ -4649,7 +4681,7 @@ static int mov_write_moov_tag(AVIOContext *pb, MOVMuxContext *mov, for (i = 0; i < mov->nb_tracks; i++) { if (mov->tracks[i].entry > 0 || mov->flags & FF_MOV_FLAG_FRAGMENT || mov->mode == MODE_AVIF) { - int ret = mov_write_trak_tag(s, pb, mov, &(mov->tracks[i]), i < s->nb_streams ? s->streams[i] : NULL); + int ret = mov_write_trak_tag(s, pb, mov, &(mov->tracks[i]), i < mov->nb_streams ? 
mov->tracks[i].st : NULL); if (ret < 0) return ret; } @@ -5443,8 +5475,8 @@ static int mov_write_ftyp_tag(AVIOContext *pb, AVFormatContext *s) int has_h264 = 0, has_av1 = 0, has_video = 0, has_dolby = 0; int i; - for (i = 0; i < s->nb_streams; i++) { - AVStream *st = s->streams[i]; + for (i = 0; i < mov->nb_streams; i++) { + AVStream *st = mov->tracks[i].st; if (is_cover_image(st)) continue; if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) @@ -5619,8 +5651,8 @@ static int mov_write_identification(AVIOContext *pb, AVFormatContext *s) mov_write_ftyp_tag(pb,s); if (mov->mode == MODE_PSP) { int video_streams_nb = 0, audio_streams_nb = 0, other_streams_nb = 0; - for (i = 0; i < s->nb_streams; i++) { - AVStream *st = s->streams[i]; + for (i = 0; i < mov->nb_streams; i++) { + AVStream *st = mov->tracks[i].st; if (is_cover_image(st)) continue; if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) @@ -5807,7 +5839,7 @@ static int mov_write_squashed_packets(AVFormatContext *s) { MOVMuxContext *mov = s->priv_data; - for (int i = 0; i < s->nb_streams; i++) { + for (int i = 0; i < mov->nb_streams; i++) { MOVTrack *track = &mov->tracks[i]; int ret = AVERROR_BUG; @@ -5848,7 +5880,7 @@ static int mov_flush_fragment(AVFormatContext *s, int force) // of fragments was triggered automatically by an AVPacket, we // already have reliable info for the end of that track, but other // tracks may need to be filled in. 
- for (i = 0; i < s->nb_streams; i++) { + for (i = 0; i < mov->nb_streams; i++) { MOVTrack *track = &mov->tracks[i]; if (!track->end_reliable) { const AVPacket *pkt = ff_interleaved_peek(s, i); @@ -6049,10 +6081,8 @@ static int mov_auto_flush_fragment(AVFormatContext *s, int force) return ret; } -static int check_pkt(AVFormatContext *s, AVPacket *pkt) +static int check_pkt(AVFormatContext *s, MOVTrack *trk, AVPacket *pkt) { - MOVMuxContext *mov = s->priv_data; - MOVTrack *trk = &mov->tracks[pkt->stream_index]; int64_t ref; uint64_t duration; @@ -6090,15 +6120,21 @@ int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt) { MOVMuxContext *mov = s->priv_data; AVIOContext *pb = s->pb; - MOVTrack *trk = &mov->tracks[pkt->stream_index]; - AVCodecParameters *par = trk->par; + MOVTrack *trk; + AVCodecParameters *par; AVProducerReferenceTime *prft; unsigned int samples_in_chunk = 0; int size = pkt->size, ret = 0, offset = 0; size_t prft_size; uint8_t *reformatted_data = NULL; - ret = check_pkt(s, pkt); + if (pkt->stream_index < s->nb_streams) + trk = s->streams[pkt->stream_index]->priv_data; + else // Timecode or chapter + trk = &mov->tracks[pkt->stream_index]; + par = trk->par; + + ret = check_pkt(s, trk, pkt); if (ret < 0) return ret; @@ -6187,7 +6223,7 @@ int ff_mov_write_packet(AVFormatContext *s, AVPacket *pkt) if (par->codec_id == AV_CODEC_ID_AAC && pkt->size > 2 && (AV_RB16(pkt->data) & 0xfff0) == 0xfff0) { - if (!s->streams[pkt->stream_index]->nb_frames) { + if (!trk->st->nb_frames) { av_log(s, AV_LOG_ERROR, "Malformed AAC bitstream detected: " "use the audio bitstream filter 'aac_adtstoasc' to fix it " "('-bsf:a aac_adtstoasc' option with ffmpeg)\n"); @@ -6435,18 +6471,18 @@ err: static int mov_write_single_packet(AVFormatContext *s, AVPacket *pkt) { MOVMuxContext *mov = s->priv_data; - MOVTrack *trk = &mov->tracks[pkt->stream_index]; + MOVTrack *trk = s->streams[pkt->stream_index]->priv_data; AVCodecParameters *par = trk->par; int64_t frag_duration = 0; int 
size = pkt->size; - int ret = check_pkt(s, pkt); + int ret = check_pkt(s, trk, pkt); if (ret < 0) return ret; if (mov->flags & FF_MOV_FLAG_FRAG_DISCONT) { int i; - for (i = 0; i < s->nb_streams; i++) + for (i = 0; i < mov->nb_streams; i++) mov->tracks[i].frag_discont = 1; mov->flags &= ~FF_MOV_FLAG_FRAG_DISCONT; } @@ -6488,7 +6524,7 @@ static int mov_write_single_packet(AVFormatContext *s, AVPacket *pkt) return 0; /* Discard 0 sized packets */ } - if (trk->entry && pkt->stream_index < s->nb_streams) + if (trk->entry && pkt->stream_index < mov->nb_streams) frag_duration = av_rescale_q(pkt->dts - trk->cluster[0].dts, s->streams[pkt->stream_index]->time_base, AV_TIME_BASE_Q); @@ -6543,17 +6579,45 @@ static int mov_write_subtitle_end_packet(AVFormatContext *s, return ret; } +static int mov_filter_packet(AVFormatContext *s, MOVTrack *track, AVPacket *pkt) +{ + int ret; + + if (!track->bsf) + return 0; + + ret = av_bsf_send_packet(track->bsf, pkt); + if (ret < 0) { + av_log(s, AV_LOG_ERROR, + "Failed to send packet to filter %s for stream %d: %s\n", + track->bsf->filter->name, pkt->stream_index, av_err2str(ret)); + return ret; + } + + return av_bsf_receive_packet(track->bsf, pkt); +} + static int mov_write_packet(AVFormatContext *s, AVPacket *pkt) { MOVMuxContext *mov = s->priv_data; MOVTrack *trk; + int ret; if (!pkt) { mov_flush_fragment(s, 1); return 1; } - trk = &mov->tracks[pkt->stream_index]; + trk = s->streams[pkt->stream_index]->priv_data; + + ret = mov_filter_packet(s, trk, pkt); + if (ret < 0) { + if (ret == AVERROR(EAGAIN)) + return 0; + av_log(s, AV_LOG_ERROR, "Error applying bitstream filters to an output " + "packet for stream #%d: %s\n", trk->st->index, av_err2str(ret)); + return ret; + } if (is_cover_image(trk->st)) { int ret; @@ -6754,12 +6818,12 @@ static int mov_create_chapter_track(AVFormatContext *s, int tracknum) } -static int mov_check_timecode_track(AVFormatContext *s, AVTimecode *tc, int src_index, const char *tcstr) +static int 
mov_check_timecode_track(AVFormatContext *s, AVTimecode *tc, AVStream *src_st, const char *tcstr) { int ret; /* compute the frame number */ - ret = av_timecode_init_from_string(tc, s->streams[src_index]->avg_frame_rate, tcstr, s); + ret = av_timecode_init_from_string(tc, src_st->avg_frame_rate, tcstr, s); return ret; } @@ -6767,7 +6831,7 @@ static int mov_create_timecode_track(AVFormatContext *s, int index, int src_inde { MOVMuxContext *mov = s->priv_data; MOVTrack *track = &mov->tracks[index]; - AVStream *src_st = s->streams[src_index]; + AVStream *src_st = mov->tracks[src_index].st; uint8_t data[4]; AVPacket *pkt = mov->pkt; AVRational rate = src_st->avg_frame_rate; @@ -6827,8 +6891,8 @@ static void enable_tracks(AVFormatContext *s) first[i] = -1; } - for (i = 0; i < s->nb_streams; i++) { - AVStream *st = s->streams[i]; + for (i = 0; i < mov->nb_streams; i++) { + AVStream *st = mov->tracks[i].st; if (st->codecpar->codec_type <= AVMEDIA_TYPE_UNKNOWN || st->codecpar->codec_type >= AVMEDIA_TYPE_NB || @@ -6862,6 +6926,9 @@ static void mov_free(AVFormatContext *s) MOVMuxContext *mov = s->priv_data; int i; + for (i = 0; i < s->nb_streams; i++) + s->streams[i]->priv_data = NULL; + if (!mov->tracks) return; @@ -6892,6 +6959,7 @@ static void mov_free(AVFormatContext *s) ffio_free_dyn_buf(&track->mdat_buf); avpriv_packet_list_free(&track->squashed_packet_queue); + av_bsf_free(&track->bsf); } av_freep(&mov->tracks); @@ -6964,6 +7032,89 @@ static int mov_create_dvd_sub_decoder_specific_info(MOVTrack *track, return 0; } +static int mov_init_iamf_track(AVFormatContext *s) +{ + MOVMuxContext *mov = s->priv_data; + MOVTrack *track = &mov->tracks[0]; // IAMF if present is always the first track + const AVBitStreamFilter *filter; + AVBPrint bprint; + AVStream *first_st = NULL; + char *args; + int nb_audio_elements = 0, nb_mix_presentations = 0; + int ret; + + for (int i = 0; i < s->nb_stream_groups; i++) { + const AVStreamGroup *stg = s->stream_groups[i]; + + if (stg->type == 
AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT) + nb_audio_elements++; + if (stg->type == AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION) + nb_mix_presentations++; + } + + if (!nb_audio_elements && !nb_mix_presentations) + return 0; + + if (nb_audio_elements < 1 || nb_audio_elements > 2 || nb_mix_presentations < 1) { + av_log(s, AV_LOG_ERROR, "There must be >= 1 and <= 2 IAMF_AUDIO_ELEMENT and at least " + "one IAMF_MIX_PRESENTATION stream groups to write an IAMF track\n"); + return AVERROR(EINVAL); + } + + track->iamf = av_mallocz(sizeof(*track->iamf)); + if (!track->iamf) + return AVERROR(ENOMEM); + + av_bprint_init(&bprint, 0, AV_BPRINT_SIZE_UNLIMITED); + + for (int i = 0; i < s->nb_stream_groups; i++) { + const AVStreamGroup *stg = s->stream_groups[i]; + + switch(stg->type) { + case AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT: + if (!first_st) + first_st = stg->streams[0]; + + for (int j = 0; j < stg->nb_streams; j++) { + av_bprintf(&bprint, "%d=%d%s", s->streams[j]->index, s->streams[j]->id, + j < (stg->nb_streams - 1) ? 
":" : ""); + s->streams[j]->priv_data = track; + } + + ret = ff_iamf_add_audio_element(track->iamf, stg, s); + break; + case AV_STREAM_GROUP_PARAMS_IAMF_MIX_PRESENTATION: + ret = ff_iamf_add_mix_presentation(track->iamf, stg, s); + break; + default: + av_assert0(0); + } + if (ret < 0) + return ret; + } + + av_bprint_finalize(&bprint, &args); + + filter = av_bsf_get_by_name("iamf_stream_merge"); + if (!filter) + return AVERROR_BUG; + + ret = av_bsf_alloc(filter, &track->bsf); + if (ret < 0) + return ret; + + ret = avcodec_parameters_copy(track->bsf->par_in, first_st->codecpar); + if (ret < 0) + return ret; + + av_opt_set(track->bsf->priv_data, "index_mapping", args, 0); + av_opt_set_int(track->bsf->priv_data, "out_index", first_st->index, 0); + + track->tag = MKTAG('i','a','m','f'); + + return av_bsf_init(track->bsf); +} + static int mov_init(AVFormatContext *s) { MOVMuxContext *mov = s->priv_data; @@ -7101,7 +7252,37 @@ static int mov_init(AVFormatContext *s) s->streams[0]->disposition |= AV_DISPOSITION_DEFAULT; } - mov->nb_tracks = s->nb_streams; + for (i = 0; i < s->nb_stream_groups; i++) { + AVStreamGroup *stg = s->stream_groups[i]; + + if (stg->type != AV_STREAM_GROUP_PARAMS_IAMF_AUDIO_ELEMENT) + continue; + + for (int j = 0; j < stg->nb_streams; j++) { + AVStream *st = stg->streams[j]; + + if (st->priv_data) { + av_log(s, AV_LOG_ERROR, "Stream %d is present in more than one Stream Group of type " + "IAMF Audio Element\n", j); + return AVERROR(EINVAL); + } + st->priv_data = st; + } + + if (!mov->nb_tracks) // We support one track for the entire IAMF structure + mov->nb_tracks++; + } + + for (i = 0; i < s->nb_streams; i++) { + AVStream *st = s->streams[i]; + if (st->priv_data) + continue; + st->priv_data = st; + mov->nb_tracks++; + } + + mov->nb_streams = mov->nb_tracks; + if (mov->mode & (MODE_MP4|MODE_MOV|MODE_IPOD) && s->nb_chapters) mov->chapter_track = mov->nb_tracks++; @@ -7127,7 +7308,7 @@ static int mov_init(AVFormatContext *s) if 
(st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO && (t || (t=av_dict_get(st->metadata, "timecode", NULL, 0)))) { AVTimecode tc; - ret = mov_check_timecode_track(s, &tc, i, t->value); + ret = mov_check_timecode_track(s, &tc, st, t->value); if (ret >= 0) mov->nb_meta_tmcd++; } @@ -7176,18 +7357,33 @@ static int mov_init(AVFormatContext *s) } } + ret = mov_init_iamf_track(s); + if (ret < 0) + return ret; + + for (int j = 0, i = 0; j < s->nb_streams; j++) { + AVStream *st = s->streams[j]; + + if (st != st->priv_data) + continue; + st->priv_data = &mov->tracks[i++]; + } + for (i = 0; i < s->nb_streams; i++) { AVStream *st= s->streams[i]; - MOVTrack *track= &mov->tracks[i]; + MOVTrack *track = st->priv_data; AVDictionaryEntry *lang = av_dict_get(st->metadata, "language", NULL,0); - track->st = st; - track->par = st->codecpar; + if (!track->st) { + track->st = st; + track->par = st->codecpar; + } track->language = ff_mov_iso639_to_lang(lang?lang->value:"und", mov->mode!=MODE_MOV); if (track->language < 0) track->language = 32767; // Unspecified Macintosh language code track->mode = mov->mode; - track->tag = mov_find_codec_tag(s, track); + if (!track->tag) + track->tag = mov_find_codec_tag(s, track); if (!track->tag) { av_log(s, AV_LOG_ERROR, "Could not find tag for codec %s in stream #%d, " "codec not currently supported in container\n", @@ -7378,25 +7574,26 @@ static int mov_write_header(AVFormatContext *s) { AVIOContext *pb = s->pb; MOVMuxContext *mov = s->priv_data; - int i, ret, hint_track = 0, tmcd_track = 0, nb_tracks = s->nb_streams; + int i, ret, hint_track = 0, tmcd_track = 0, nb_tracks = mov->nb_streams; if (mov->mode & (MODE_MP4|MODE_MOV|MODE_IPOD) && s->nb_chapters) nb_tracks++; if (mov->flags & FF_MOV_FLAG_RTP_HINT) { hint_track = nb_tracks; - for (i = 0; i < s->nb_streams; i++) - if (rtp_hinting_needed(s->streams[i])) + for (i = 0; i < mov->nb_streams; i++) { + if (rtp_hinting_needed(mov->tracks[i].st)) nb_tracks++; + } } if (mov->nb_meta_tmcd) tmcd_track = 
nb_tracks; - for (i = 0; i < s->nb_streams; i++) { + for (i = 0; i < mov->nb_streams; i++) { int j; - AVStream *st= s->streams[i]; - MOVTrack *track= &mov->tracks[i]; + MOVTrack *track = &mov->tracks[i]; + AVStream *st = track->st; /* copy extradata if it exists */ if (st->codecpar->extradata_size) { @@ -7418,8 +7615,8 @@ static int mov_write_header(AVFormatContext *s) &(AVChannelLayout)AV_CHANNEL_LAYOUT_MONO)) continue; - for (j = 0; j < s->nb_streams; j++) { - AVStream *stj= s->streams[j]; + for (j = 0; j < mov->nb_streams; j++) { + AVStream *stj= mov->tracks[j].st; MOVTrack *trackj= &mov->tracks[j]; if (j == i) continue; @@ -7482,8 +7679,8 @@ static int mov_write_header(AVFormatContext *s) return ret; if (mov->flags & FF_MOV_FLAG_RTP_HINT) { - for (i = 0; i < s->nb_streams; i++) { - if (rtp_hinting_needed(s->streams[i])) { + for (i = 0; i < mov->nb_streams; i++) { + if (rtp_hinting_needed(mov->tracks[i].st)) { if ((ret = ff_mov_init_hinting(s, hint_track, i)) < 0) return ret; hint_track++; @@ -7495,8 +7692,8 @@ static int mov_write_header(AVFormatContext *s) const AVDictionaryEntry *t, *global_tcr = av_dict_get(s->metadata, "timecode", NULL, 0); /* Initialize the tmcd tracks */ - for (i = 0; i < s->nb_streams; i++) { - AVStream *st = s->streams[i]; + for (i = 0; i < mov->nb_streams; i++) { + AVStream *st = mov->tracks[i].st; t = global_tcr; if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) { @@ -7505,7 +7702,7 @@ static int mov_write_header(AVFormatContext *s) t = av_dict_get(st->metadata, "timecode", NULL, 0); if (!t) continue; - if (mov_check_timecode_track(s, &tc, i, t->value) < 0) + if (mov_check_timecode_track(s, &tc, st, t->value) < 0) continue; if ((ret = mov_create_timecode_track(s, tmcd_track, i, tc)) < 0) return ret; @@ -7626,7 +7823,7 @@ static int mov_write_trailer(AVFormatContext *s) int64_t moov_pos; if (mov->need_rewrite_extradata) { - for (i = 0; i < s->nb_streams; i++) { + for (i = 0; i < mov->nb_streams; i++) { MOVTrack *track = 
&mov->tracks[i]; AVCodecParameters *par = track->par; @@ -7766,7 +7963,7 @@ static int avif_write_trailer(AVFormatContext *s) if (mov->moov_written) return 0; mov->is_animated_avif = s->streams[0]->nb_frames > 1; - if (mov->is_animated_avif && s->nb_streams > 1) { + if (mov->is_animated_avif && mov->nb_streams > 1) { // For animated avif with alpha channel, we need to write a tref tag // with type "auxl". mov->tracks[1].tref_tag = MKTAG('a', 'u', 'x', 'l'); @@ -7776,7 +7973,7 @@ static int avif_write_trailer(AVFormatContext *s) mov_write_meta_tag(pb, mov, s); moov_size = get_moov_size(s); - for (i = 0; i < s->nb_streams; i++) + for (i = 0; i < mov->nb_tracks; i++) mov->tracks[i].data_offset = avio_tell(pb) + moov_size + 8; if (mov->is_animated_avif) { @@ -7798,7 +7995,7 @@ static int avif_write_trailer(AVFormatContext *s) // write extent offsets. pos_backup = avio_tell(pb); - for (i = 0; i < s->nb_streams; i++) { + for (i = 0; i < mov->nb_streams; i++) { if (extent_offsets[i] != (uint32_t)extent_offsets[i]) { av_log(s, AV_LOG_ERROR, "extent offset does not fit in 32 bits\n"); return AVERROR_INVALIDDATA; diff --git a/libavformat/movenc.h b/libavformat/movenc.h index 60363198c9..fee3e759e0 100644 --- a/libavformat/movenc.h +++ b/libavformat/movenc.h @@ -25,7 +25,9 @@ #define AVFORMAT_MOVENC_H #include "avformat.h" +#include "iamf.h" #include "movenccenc.h" +#include "libavcodec/bsf.h" #include "libavcodec/packet_internal.h" #define MOV_FRAG_INFO_ALLOC_INCREMENT 64 @@ -170,6 +172,10 @@ typedef struct MOVTrack { unsigned int squash_fragment_samples_to_one; //< flag to note formats where all samples for a fragment are to be squashed PacketList squashed_packet_queue; + + AVBSFContext *bsf; + + IAMFContext *iamf; } MOVTrack; typedef enum { @@ -188,6 +194,7 @@ typedef struct MOVMuxContext { const AVClass *av_class; int mode; int64_t time; + int nb_streams; int nb_tracks; int nb_meta_tmcd; ///< number of new created tmcd track based on metadata (aka not data copy) int 
chapter_track; ///< qt chapter track number