Message ID: 20211208010649.381-1-jamrial@gmail.com
Series: New channel layout API
On Wed, Dec 8, 2021 at 2:07 AM James Almer <jamrial@gmail.com> wrote:
> This is an updated and rebased version of the API that was sent to this
> mailing list about two years ago. It expands it with some new helpers,
> implements some changes that allow further extensibility for new features
> down the line, and finishes porting all missing modules and those
> introduced since 2019.
>
> I'm sending a reduced amount of patches to not spam the ML too much. In
> total it's 279 patches, the bulk being one per module ported, which is
> what I'm skipping.
> This reduced set will obviously not apply as is, so you can find the
> entire set in https://github.com/jamrial/FFmpeg/commits/channel_layout

How are custom channel layouts defined? I remember that if channels are
joined with '+', some other separator needs to be used to split multiple
layouts when they are used at once.
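To illustrate the separator question being asked: if '+' joins channels inside one custom layout, listing several layouts in a single option value needs a second separator. The sketch below uses '|' purely as an assumed placeholder — it is not the actual syntax of the patch set:

```c
#include <assert.h>

/* Count channels inside one layout spec, assuming '+' joins channels,
 * e.g. "FL+FR+LFE" -> 3. */
static int count_channels(const char *layout)
{
    int n = layout[0] ? 1 : 0;
    for (const char *p = layout; *p; p++)
        if (*p == '+')
            n++;
    return n;
}

/* Count layouts in a multi-layout option value, assuming a hypothetical
 * '|' separator between layouts, e.g. "FL+FR|stereo" -> 2. */
static int count_layouts(const char *spec)
{
    int n = spec[0] ? 1 : 0;
    for (const char *p = spec; *p; p++)
        if (*p == '|')
            n++;
    return n;
}
```

Whatever separator is chosen, it must be one that cannot appear inside a single layout spec, which is exactly the ambiguity the question raises.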
James Almer (12021-12-07):
> This is an updated and rebased version of the API that was sent to this
> mailing list about two years ago. It expands it with some new helpers,
> implements some changes that allow further extensibility for new features
> down the line, and finishes porting all missing modules and those
> introduced since 2019.

I see the concerns I raised last time have not been addressed:

(1) the ability to have a channel at a certain location several times;

(2) the ability to attach an arbitrary label to a channel or to a group
of channels;

(3) an API and syntax for the user to specify a particular channel.

(1) is necessary for the amerge filter: merge two stereo streams, you
get two FL and two FR.

(1) is also necessary for devices that can record several sources
simultaneously. IIRC, that is the case for the Blackmagic devices. If
your device records a band from two stereo and three mono microphones,
we need two FL, two FR and three FC.

(2) and (3) are necessary consequences of (1).

I also have concerns with the signature of av_channel_description():
this is level 0 of string manipulation, we need to get past this. I
should post again about this in a separate thread.

Regards,
On 12/8/2021 7:55 AM, Nicolas George wrote:
> James Almer (12021-12-07):
>> This is an updated and rebased version of the API that was sent to this
>> mailing list about two years ago. It expands it with some new helpers,
>> implements some changes that allow further extensibility for new
>> features down the line, and finishes porting all missing modules and
>> those introduced since 2019.
>
> I see the concerns I raised last time have not been addressed:
>
> (1) the ability to have a channel at a certain location several times;
>
> (2) the ability to attach an arbitrary label to a channel or to a group
> of channels;
>
> (3) an API and syntax for the user to specify a particular channel.
>
> (1) is necessary for the amerge filter: merge two stereo streams, you
> get two FL and two FR.
>
> (1) is also necessary for devices that can record several sources
> simultaneously. IIRC, that is the case for the Blackmagic devices. If
> your device records a band from two stereo and three mono microphones,
> we need two FL, two FR and three FC.
>
> (2) and (3) are necessary consequences of (1).
>
> I also have concerns with the signature of av_channel_description():
> this is level 0 of string manipulation, we need to get past this. I
> should post again about this in a separate thread.

What is wrong with it? All the functions returning a string in this API
use the same signature. You pass it a pre-allocated buffer and it's
filled with the string, truncating it if there's not enough space and
letting the user know about it.

I recall you were against dynamic allocation of the string, which the
user then had to free, so this was the alternative.

I could make it use a user-provided AVBPrint buffer instead of using one
internally, if you think that's better, but since you planned to replace
that with AVWriter I figured using AVBPrint in the signature would mean
an eventual deprecation and removal.

> Regards,
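The calling convention James describes — caller-supplied buffer, fill-and-truncate, with the return value letting the caller detect truncation — can be sketched as follows. This is an illustrative mock with invented names, not the actual av_channel_description() from the patch set; it only demonstrates the snprintf()-style contract:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mock of the fixed-buffer convention: fill the caller's
 * pre-allocated buffer, truncating if necessary, and return the number
 * of bytes the full description needs (mirroring snprintf() semantics),
 * so the caller can tell whether truncation occurred. */
static int describe_channel(char *buf, size_t buf_size, int channel)
{
    /* stand-in description table for the sketch */
    const char *desc = channel == 0 ? "front left" : "unknown channel";
    size_t needed = strlen(desc);

    if (buf_size > 0) {
        size_t n = needed < buf_size - 1 ? needed : buf_size - 1;
        memcpy(buf, desc, n);
        buf[n] = '\0';
    }
    return (int)needed; /* untruncated length, like snprintf() */
}
```

A caller then checks `ret >= (int)buf_size` to know the result was truncated and a bigger buffer is needed — which is precisely the usability cost Nicolas objects to below.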
Quoting Nicolas George (2021-12-08 11:55:40)
> James Almer (12021-12-07):
> > This is an updated and rebased version of the API that was sent to
> > this mailing list about two years ago. It expands it with some new
> > helpers, implements some changes that allow further extensibility for
> > new features down the line, and finishes porting all missing modules
> > and those introduced since 2019.
>
> I see the concerns I raised last time have not been addressed:
>
> (1) the ability to have a channel at a certain location several times;

I have no idea what you mean, the CUSTOM order can have any channel at
any location. This was true in all versions of this API back to 2013.

> (2) the ability to attach an arbitrary label to a channel or to a group
> of channels;
>
> (3) an API and syntax for the user to specify a particular channel.
>
> (1) is necessary for the amerge filter: merge two stereo streams, you
> get two FL and two FR.
>
> (1) is also necessary for devices that can record several sources
> simultaneously. IIRC, that is the case for the Blackmagic devices. If
> your device records a band from two stereo and three mono microphones,
> we need two FL, two FR and three FC.

Multiplexing multiple streams into a single AVFrame is not a valid use
case. Just use multiple streams.
James Almer (12021-12-08):
> What is wrong with it? All the functions returning a string in this API
> use the same signature. You pass it a pre-allocated buffer and it's
> filled with the string, truncating it if there's not enough space and
> letting the user know about it.
> I recall you were against dynamic allocation of the string, which the
> user then had to free, so this was the alternative.

There is nothing wrong with it, just as there is nothing wrong with
wearing pelts and a loincloth, but modern clothes are much more
comfortable.

In this case, there is a little usability issue: this description is
almost always very short but can be arbitrarily long, and handling that
properly in the caller is very annoying: start with a small buffer on
the stack, try the conversion, if it fails start allocating a bigger
buffer on the heap, etc.

> I could make it use a user-provided AVBPrint buffer instead of using
> one internally, if you think that's better, but since you planned to
> replace that with AVWriter I figured using AVBPrint in the signature
> would mean an eventual deprecation and removal.

We are on the same page on this. A memory allocation would be a big no
here, because it can happen once every frame. We have implemented pools
for frames and buffers, so we do consider once-per-frame to be frequent
enough to warrant code to avoid allocations.

My preferred outcome would be that we apply AVWriter before this series
and use it here. The idea would be to start using AVWriter everywhere we
return some kind of string: AVWriter in one or two places is crap, but
the more we use it, the more its benefits outweigh the costs.

I will start a new discussion on string APIs soon, and since we do not
really disagree here but the rest will take some time, we can continue
discussing this later.

Regards,
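The caller-side dance Nicolas calls annoying — small stack buffer first, heap fallback only on truncation — might look like this. `get_desc()` and `get_desc_alloc()` are hypothetical names standing in for any function with the fixed-buffer signature and its wrapper:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical getter with the fixed-buffer convention; returns the
 * length the full string needs, like snprintf(). */
static int get_desc(char *buf, size_t buf_size, const char *src)
{
    return snprintf(buf, buf_size, "%s", src);
}

/* The boilerplate every caller would repeat: try a small stack buffer,
 * and only if the result was truncated allocate a heap buffer of the
 * exact required size and retry. Caller frees the result. */
static char *get_desc_alloc(const char *src)
{
    char stackbuf[16];
    int needed = get_desc(stackbuf, sizeof(stackbuf), src);
    char *out;

    if (needed < 0)
        return NULL;
    out = malloc((size_t)needed + 1);
    if (!out)
        return NULL;
    if ((size_t)needed < sizeof(stackbuf))
        memcpy(out, stackbuf, (size_t)needed + 1); /* common short case */
    else
        get_desc(out, (size_t)needed + 1, src);    /* rare long case */
    return out;
}
```

An AVBPrint (or the proposed AVWriter) encapsulates exactly this pattern once, which is the argument for putting it in the signature instead.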
Anton Khirnov (12021-12-08):
> I have no idea what you mean, the CUSTOM order can have any channel at
> any location. This was true in all versions of this API back to 2013.

Ok, my memories were muddy, now I remember better: it is possible, but
without (2) and (3) it is unusable: there is no point in having two LFE
channels if the user cannot specify "the second one" or know which one
is which.

> Multiplexing multiple streams into a single AVFrame is not a valid use
> case. Just use multiple streams.

We get to decide what is a valid use case and what is not. And in this
case, since the devices and filter I quoted already behave like that, I
posit that it IS a valid use case. Therefore, this new API must be
capable of handling them.

Regards,
On Wed, Dec 8, 2021 at 4:09 PM Nicolas George <george@nsup.org> wrote:
> Anton Khirnov (12021-12-08):
> > I have no idea what you mean, the CUSTOM order can have any channel
> > at any location. This was true in all versions of this API back to
> > 2013.
>
> Ok, my memories were muddy, now I remember better: it is possible, but
> without (2) and (3) it is unusable: there is no point in having two LFE
> channels if the user cannot specify "the second one" or know which one
> is which.
>
> > Multiplexing multiple streams into a single AVFrame is not a valid
> > use case. Just use multiple streams.
>
> We get to decide what is a valid use case and what is not. And in this
> case, since the devices and filter I quoted already behave like that, I
> posit that it IS a valid use case. Therefore, this new API must be
> capable of handling them.

Flawed logic.

> Regards,
>
> --
> Nicolas George
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Quoting Nicolas George (2021-12-08 16:08:54)
> We get to decide what is a valid use case and what is not. And in this
> case, since the devices and filter I quoted already behave like that,

It is not possible for them to behave "like that", because our current
channel layout API does not support duplicated channels at all.

> I posit that it IS a valid use case. Therefore, this new API must be
> capable of handling them.

I disagree. Technical limitations that were overcome 10 years ago should
not guide new API design.

If you insist then we can have a TC vote about this.
Anton Khirnov (12021-12-09):
> It is not possible for them to behave "like that", because our current
> channel layout API does not support duplicated channels at all.

They behave like that in the sense that they output channels from
several sources in a single stream of AVFrames. They could not properly
label it with a layout. The new API should allow enhancing that, not
break it utterly.

> I disagree. Technical limitations that were overcome 10 years ago
> should not guide new API design.

In the case of amerge, it was not a technical limitation: merging
several streams into one so that they can be handled by single-stream
filters is 100% part of the design. I suspect devices that capture
several independent channels are designed that way intentionally too,
possibly to reduce the risk of desynchronization.

> If you insist then we can have a TC vote about this.

Please, the TC is for when discussion has failed. Let us see what
arguments other developers bring to the discussion first. I observe
that, regarding the ability to attach an arbitrary string label to any
channel, Lynne was with me for independent reasons.

And of course, if it comes to the TC, since you are a member yourself, I
expect you to recuse yourself from the proceedings on this question, as
you would be judge and party.

Regards,
Quoting Nicolas George (2021-12-09 11:31:54)
> Anton Khirnov (12021-12-09):
> > I disagree. Technical limitations that were overcome 10 years ago
> > should not guide new API design.
>
> In the case of amerge, it was not a technical limitation: merging
> several streams into one so that they can be handled by single-stream
> filters is 100% part of the design.

I fail to see how that is an advantage. You can just as well create
multiple instances of those single-stream filters instead of adding
hacks into core APIs.

> I suspect devices that capture several independent channels are
> designed that way intentionally too, possibly to reduce the risk of
> desynchronization.

"possibly" is not a strong enough argument. I'd like to hear at least
one clearly-defined use case that cannot just as well be handled by
using multiple streams.
Anton Khirnov (12021-12-09):
> I fail to see how that is an advantage. You can just as well create
> multiple instances of those single-stream filters instead of adding
> hacks into core APIs.

Please think a little further: multiple instances of the single-stream
filters would not have access to all the channels.

> "possibly" is not a strong enough argument. I'd like to hear at least
> one clearly-defined use case that cannot just as well be handled by
> using multiple streams.

This was a discussion for when the device was implemented. Now, it works
that way, and the new API has to accommodate it.

Anyway, these are just two examples. I am sure we could find other
examples easily if we tried. It would be pretty stupid of us to add a
new API that is barely better than the current one and that we know is
too limited for likely use cases.

Regards,
Quoting Nicolas George (2021-12-09 14:52:40)
> Please think a little further: multiple instances of the single-stream
> filters would not have access to all the channels.
>
> This was a discussion for when the device was implemented. Now, it
> works that way, and the new API has to accommodate it.
>
> Anyway, these are just two examples. I am sure we could find other
> examples easily if we tried. It would be pretty stupid of us to add a
> new API that is barely better than the current one and that we know is
> too limited for likely use cases.

I see you repeating the same two arguments:
- it was implemented like this in the past and therefore must keep
  working exactly the same
- it might be useful under some vaguely specified conditions

Neither of these strikes me as a good enough reason to make major
changes to the API design.

So again - can you describe a clearly-defined use case that cannot just
as well be handled by using multiple streams? Emphasis on "clearly
defined", so not "I am sure we can find examples". I would like to hear
some of those examples. So far you have not provided any.
Anton Khirnov (12021-12-09):
> I see you repeating the same two arguments:
> - it was implemented like this in the past and therefore must keep
>   working exactly the same
> - it might be useful under some vaguely specified conditions
>
> Neither of these strikes me as a good enough reason to make major
> changes to the API design.

I will turn this argument back on you: you have designed your API so
that it is too limited; that does not strike me as a good reason to make
major changes in existing filters and devices that have given
satisfaction to users for years.

Regards,
9 Dec 2021, 15:24 by george@nsup.org:
> I will turn this argument back on you: you have designed your API so
> that it is too limited; that does not strike me as a good reason to
> make major changes in existing filters and devices that have given
> satisfaction to users for years.

As a compromise, could we specify that while having multiple channels
with the same ID in a single frame can happen and can be generated by
decoders, we would also specify that they possibly won't be treated
correctly by encoders and filters, and could be outright dropped with a
warning if unsupported?

I can see why having multiple channels with the same ID can happen, and
in fact it will, for custom user layouts with more channels than there
are IDs. For example, an Opus stream containing a hundred or so channels
from multiple overlapping locations at a venue. Each of those channels
would have to have an ID of NONE, because the codec mapping family
doesn't carry such information for such a configuration.
On Thu, Dec 9, 2021 at 3:42 PM Lynne <dev@lynne.ee> wrote:
> As a compromise, could we specify that while having multiple channels
> with the same ID in a single frame can happen and can be generated by
> decoders, we would also specify that they possibly won't be treated
> correctly by encoders and filters, and could be outright dropped with
> a warning if unsupported?

Actually that's the worst part of it, and I would be happy to not have
to think about that as an API user. What kind of sense does a frame make
that contains the same channel twice? What am I ever supposed to do with
that?

It sounds to me like some kind of theoretical design flaw is trying to
be solved at the wrong point: instead of clearly separating streams,
they are supposed to live together but somehow still be separate? That
just sounds like a hack to me. Two streams are two streams, not one
stream with somehow duplicated channels.

- Hendrik
Hendrik Leppkes (12021-12-09):
> What kind of sense does a frame make that contains the same channel
> twice?

Imagine an orchestra recording:

Orchestra, front left
Orchestra, front center
Orchestra, front right
Orchestra, low freq
Orchestra, rear left
Orchestra, rear center
Winds, left
Winds, right
Percussions, left
Percussions, right
Strings, left
Strings, right

We cannot have enums for winds, percussions and strings, but we should
be able to label them properly left and right, and attach a string label
to each.

> What am I ever supposed to do with that?

Apply remixing matrices to channels from different parts of the
orchestra. Let a very smart codec detect correlations, and therefore
solutions for better compression, between one section and another.

Regards,
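A minimal sketch of what such a labeled layout could look like in code. The struct, enum and field names are illustrative assumptions, not the proposed API (which at this point in the thread has no label field); the point is only that spatial ids repeat while labels disambiguate:

```c
#include <assert.h>
#include <string.h>

/* spatial position; may legitimately repeat in the orchestra example */
enum chan_id { CHAN_FL, CHAN_FC, CHAN_FR, CHAN_LFE, CHAN_BL, CHAN_BC };

struct labeled_channel {
    enum chan_id id;    /* where the channel sits */
    const char  *label; /* which source it belongs to */
};

/* Nicolas's orchestra layout, with the section pairs mapped to FL/FR */
static const struct labeled_channel layout[] = {
    { CHAN_FL,  "Orchestra" },   { CHAN_FC, "Orchestra" },
    { CHAN_FR,  "Orchestra" },   { CHAN_LFE, "Orchestra" },
    { CHAN_BL,  "Orchestra" },   { CHAN_BC, "Orchestra" },
    { CHAN_FL,  "Winds" },       { CHAN_FR, "Winds" },
    { CHAN_FL,  "Percussions" }, { CHAN_FR, "Percussions" },
    { CHAN_FL,  "Strings" },     { CHAN_FR, "Strings" },
};

/* How many channels share a given spatial position? */
static int count_channel(enum chan_id id)
{
    int n = 0;
    for (size_t i = 0; i < sizeof(layout) / sizeof(layout[0]); i++)
        if (layout[i].id == id)
            n++;
    return n;
}
```

Here four channels carry the FL position, and only the label tells a remixing filter which FL belongs to which section.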
Quoting Lynne (2021-12-09 15:42:42)
> As a compromise, could we specify that while having multiple channels
> with the same ID in a single frame can happen and can be generated by
> decoders, we would also specify that they possibly won't be treated
> correctly by encoders and filters, and could be outright dropped with
> a warning if unsupported?

That is pretty much already the case. I know there are files in the wild
that have duplicated channels, and the proposed API supports exporting
this information.

What I do _not_ want is treating such streams as first-class citizens
that have to be fully supported by everything. They are a pathology and
should be treated as such -- that is, supported on input, but not
output.

> I can see why having multiple channels with the same ID can happen,
> and in fact it will, for custom user layouts with more channels than
> there are IDs. For example, an Opus stream containing a hundred or so
> channels from multiple overlapping locations at a venue. Each of those
> channels would have to have an ID of NONE, because the codec mapping
> family doesn't carry such information for such a configuration.

NONE is intended to be an invalid value, but we can add AV_CHAN_UNKNOWN
with a high id for such a case. Or we can reserve a range of ids for
application-specific usage.
9 Dec 2021, 15:57 by anton@khirnov.net:
> Quoting Lynne (2021-12-09 15:42:42)
>> As a compromise, could we specify that while having multiple channels
>> with the same ID in a single frame can happen and can be generated by
>> decoders, we would also specify that they possibly won't be treated
>> correctly by encoders and filters, and could be outright dropped with
>> a warning if unsupported?
>
> That is pretty much already the case. I know there are files in the
> wild that have duplicated channels, and the proposed API supports
> exporting this information.
>
> What I do _not_ want is treating such streams as first-class citizens
> that have to be fully supported by everything. They are a pathology and
> should be treated as such -- that is, supported on input, but not
> output.
>
>> I can see why having multiple channels with the same ID can happen,
>> and in fact it will, for custom user layouts with more channels than
>> there are IDs. For example, an Opus stream containing a hundred or so
>> channels from multiple overlapping locations at a venue. Each of those
>> channels would have to have an ID of NONE, because the codec mapping
>> family doesn't carry such information for such a configuration.
>
> NONE is intended to be an invalid value, but we can add AV_CHAN_UNKNOWN
> with a high id for such a case. Or we can reserve a range of ids for
> application-specific usage.

I'm fine with this. I think an AV_CHAN_UNKNOWN ID is needed pretty much
anyway, and as for user IDs, maybe a single AV_CHAN_CUSTOM or
AV_CHAN_USER_DEFINED. The user could then use the channel opaque field
to store some info, such as an index into their frame->opaque_ref data
with which they could store channel-specific offsets.

So I'm fine with your proposal to have a 16-bit enum for the channel ID
and a 16-bit opaque, though I'd like the opaque to be a uint16_t instead
of int opaque : 16. And 16 bits does sound like enough for many channels
and quite a few flags, though the silent flag should be moved to 1 << 15
instead of 64, and any new flags could be added beneath it so as to not
conflict with channels.
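The split Lynne agrees to — a 16-bit channel id alongside a 16-bit opaque, with the silent flag kept at the top of the flag range — can be sketched as a packed 32-bit value. The packing, names, and flag placement here are illustrative assumptions drawn from this exchange, not the final API:

```c
#include <assert.h>
#include <stdint.h>

/* Silent flag at 1 << 15, the top of a 16-bit flag field, so flags grow
 * downward and never collide with channel ids (per Lynne's suggestion). */
#define CHAN_FLAG_SILENT (1u << 15)

/* Pack a 16-bit id and a 16-bit user opaque into one 32-bit word. */
static uint32_t chan_pack(uint16_t id, uint16_t opaque)
{
    return (uint32_t)id << 16 | opaque;
}

static uint16_t chan_get_id(uint32_t packed)
{
    return (uint16_t)(packed >> 16);
}

static uint16_t chan_get_opaque(uint32_t packed)
{
    return (uint16_t)(packed & 0xFFFF);
}

static int chan_is_silent(uint16_t flags)
{
    return !!(flags & CHAN_FLAG_SILENT);
}
```

The opaque half could hold, as Lynne suggests, an index into the application's frame->opaque_ref data for per-channel metadata.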
Lynne (12021-12-09):
> So I'm fine with your proposal to have a 16-bit enum for the channel
> ID and a 16-bit opaque, though I'd like the opaque to be a uint16_t
> instead of int opaque : 16. And 16 bits does sound like enough for
> many channels and quite a few flags, though the silent flag should be
> moved to 1 << 15 instead of 64, and any new flags could be added
> beneath it so as to not conflict with channels.

I insist: a tiny field like that is not enough, let us make it a whole
string.

Regards,
On Thu, Dec 9, 2021 at 3:57 PM Nicolas George <george@nsup.org> wrote:
> Imagine an orchestra recording:
> [...]
> We cannot have enums for winds, percussions and strings, but we should
> be able to label them properly left and right, and attach a string
> label to each.

It sounds like that's object audio, and it should be driven separately,
not by the legacy channel identifiers.

- Hendrik
Hendrik Leppkes (12021-12-09):
> It sounds like that's object audio, and it should be driven separately,
> not by the legacy channel identifiers.

This is not exclusive. We have code that works with channel layouts
right now, and it is important that the extensions work gracefully.

For example, if I add a filter to just extract the pair of channels
related to the strings section, we want its output to be properly
marked as left and right.

Regards,
On Thu, Dec 9, 2021 at 5:26 PM Nicolas George <george@nsup.org> wrote:
> This is not exclusive. We have code that works with channel layouts
> right now, and it is important that the extensions work gracefully.
>
> For example, if I add a filter to just extract the pair of channels
> related to the strings section, we want its output to be properly
> marked as left and right.

Why would it stop working with the new API in place?
On Thu, 9 Dec 2021, Anton Khirnov wrote:
> Quoting Nicolas George (2021-12-09 11:31:54)
>> I suspect devices that capture several independent channels are
>> designed that way intentionally too, possibly to reduce the risk of
>> desynchronization.
>
> "possibly" is not a strong enough argument. I'd like to hear at least
> one clearly-defined use case that cannot just as well be handled by
> using multiple streams.

I recently worked on the MXF demuxer to recognize channel designations
in MXF files, and in MXF the designation and grouping of the channels is
completely separate from the track those channels are muxed in.

So if you have e.g. English stereo sound and German stereo sound, you
can mux it
- as a single 4-channel track,
- as two 2-channel tracks, or
- as four 1-channel tracks.
Some MXF flavors use the multichannel single-track approach, others the
mono-track-only approach. So the user may not be able to choose the
optimal muxed track assignment...

So ultimately, if you demux and decode a packet from a track, you will
have an AVFrame which might contain a single sound group on its own (the
optimal case), part of a sound group, or multiple sound groups.

To summarize, muxed tracks do not necessarily map 1:1 to sound groups.
And when processing/filtering audio, you typically want sound groups,
not tracks. And yes, it is very rare to have a sound group which has
channels with the same designation, but for a muxed track, it depends on
the format.

The goal of the end user is probably to be able to specify sound groups,
not to select muxed tracks. Preferably a demuxer should provide which
channel is part of which sound group, and you should be able to use a
filter or a combination of filters to select a specific sound group.
E.g. amerge all tracks, then keep only those of the merged channels
which are part of a specific sound group.

Regards,
Marton
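Marton's workflow — keep per-channel group tags from the demuxer, then select one sound group out of the merged channels — could be sketched like this. The struct and function are hypothetical illustrations, not an existing libavutil API:

```c
#include <assert.h>
#include <string.h>

/* per-channel tag as an MXF demuxer could export it,
 * e.g. "EN stereo" / "DE stereo" */
struct tagged_chan {
    const char *group;
};

/* Collect the indices of all channels belonging to one sound group
 * (what a channel-selection filter would do after amerge); returns the
 * number of matches written to out_idx. */
static int select_group(const struct tagged_chan *chans, int nb,
                        const char *group, int *out_idx)
{
    int n = 0;
    for (int i = 0; i < nb; i++)
        if (!strcmp(chans[i].group, group))
            out_idx[n++] = i;
    return n;
}
```

For the four-channel English/German example, selecting "DE" yields the last two channel indices regardless of how the muxer distributed the channels across tracks — which is the point of working on sound groups rather than tracks.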
On Thu, Dec 09, 2021 at 04:47:48PM +0100, Lynne wrote:
[...]
> So I'm fine with your proposal to have a 16-bit enum for the channel
> ID and a 16-bit opaque, though I'd like the opaque to be a uint16_t
> instead of int opaque : 16. And 16 bits does sound like enough for
> many channels and quite a few flags, though the silent flag should be
> moved to 1 << 15 instead of 64, and any new flags could be added
> beneath it so as to not conflict with channels.

In how many cases where we use flags have 16 bits been enough?

thx

[...]
Quoting Marton Balint (2021-12-10 01:04:57)
> I recently worked on the MXF demuxer to recognize channel designations
> in MXF files, and in MXF the designation and grouping of the channels
> is completely separate from the track those channels are muxed in.
>
> So if you have e.g. english stereo sound, and german stereo sound you
> can mux it
> - as a single 4 channel track
> - as two 2 channel tracks
> - as four 1 channel tracks.
> Some MXF flavors use the multichannel single track approach, others
> the mono-track-only approach. So the user may not be able to choose
> the optimal muxed track assignment...
>
> So ultimately, if you demux and decode a packet from a track, you will
> have an AVFrame, which might contain a single sound group on its own
> (optimal case), part of a sound group, or multiple sound groups.
>
> To summarize, muxed tracks are not necessarily a 1:1 mapping to sound
> groups. And when processing/filtering audio, you typically want sound
> groups, not tracks. And yes, it is very rare to have a sound group
> which has channels with the same designation, but for a muxed track,
> it depends on the format.
>
> The goal of the end user is probably to be able to specify sound
> groups, not select muxed tracks. Preferably a demuxer should provide
> which channel is part of which sound group, and you should be able to
> use a filter or a combination of filters to select a specific sound
> group. E.g. amerge all tracks, then keep only the channels from all
> the merged channels which are part of a specific sound group.

So what are you proposing? In my view, such higher-level information
should live at a higher level - e.g. in the side data. You can then have
a filter that reads this side data and gets you the group you want.
Anton Khirnov (12021-12-12):
> So what are you proposing? In my view, such higher-level information
> should live at a higher level - e.g. in the side data. You can then
> have a filter that reads this side data and gets you the group you
> want.

So, what is the point of this new API if anything of value needs to be
done with side data and yet another API?

It seems to me you are indulging in a sunk-cost fallacy: you wrote this
API and all the code but you neglected to poll your fellow developers
for needs that it should cover, and as a result got something much too
limited. But now, you are trying to force it through anyway.

What I propose is: (1) define the needs properly; (2) redesign the API;
(3) see how much code can still be used.

The needs, as far as I can see (please add to the list):

A. Allow the same channel to appear several times in the layout.
Hendrik agreed that it was useful for some kind of USER_SPECIFIED or
UNKNOWN channel specification, but allowing it for any channel
specification is actually simpler. It is not limited to just having the
same channel in the list, it requires API and user interface support:
the API must be able to tell the user "this USER_SPECIFIED channel is
the oboe, this USER_SPECIFIED channel is the piano", and the user must
be able to tell the API "the second USER_SPECIFIED channel" or "the
USER_SPECIFIED channel relating to the piano".

B. Possibly (I do not personally insist on it like A): groups of
channels.
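The addressing requirement in A — "the second USER_SPECIFIED channel" or "the USER_SPECIFIED channel relating to the piano" — amounts to two small lookups over the channel list. Everything below (the enum value, struct, function names) is an illustrative assumption, not the proposed API:

```c
#include <assert.h>
#include <string.h>

/* hypothetical user-defined channel id */
enum { CHAN_USER = 0x7fff };

struct chan {
    int         id;    /* channel id; may repeat */
    const char *label; /* optional arbitrary label, may be NULL */
};

/* Return the overall index of the n-th (1-based) channel with this id,
 * or -1 if there are fewer than n such channels
 * ("the second USER_SPECIFIED channel"). */
static int find_nth(const struct chan *c, int nb, int id, int n)
{
    for (int i = 0; i < nb; i++)
        if (c[i].id == id && --n == 0)
            return i;
    return -1;
}

/* Return the index of the channel carrying this label, or -1
 * ("the USER_SPECIFIED channel relating to the piano"). */
static int find_label(const struct chan *c, int nb, const char *label)
{
    for (int i = 0; i < nb; i++)
        if (c[i].label && !strcmp(c[i].label, label))
            return i;
    return -1;
}
```

The point of the sketch is that both forms of addressing are cheap once duplicates and labels are representable; the hard part is agreeing on the user-facing syntax, not the lookup.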
On Sun, 12 Dec 2021, Anton Khirnov wrote:
> Quoting Marton Balint (2021-12-10 01:04:57)
>> On Thu, 9 Dec 2021, Anton Khirnov wrote:
>>> Quoting Nicolas George (2021-12-09 11:31:54)
>>>> Anton Khirnov (12021-12-09):
>>>>> I disagree. Technical limitations that were overcome 10 years ago should not guide new API design.
>>>>
>>>> In the case of amerge, it was not a technical limitation; merging several streams into one so that they can be handled by single-stream filters is 100% part of the design.
>>>
>>> I fail to see how that is an advantage. You can just as well create multiple instances of those single-stream filters instead of adding hacks into core APIs.
>>>
>>>> I suspect devices that capture several independent channels are designed that way intentionally too, possibly to reduce the risk of desynchronization.
>>>
>>> "possibly" is not a strong enough argument. I'd like to hear at least one clearly-defined use case that cannot just as well be handled by using multiple streams.
>>
>> I recently worked on the MXF demuxer to recognize channel designations in MXF files, and in MXF the designation and grouping of the channels is completely separate from the track those channels are muxed in.
>>
>> So if you have e.g. English stereo sound and German stereo sound, you can mux it
>> - as a single 4-channel track
>> - as two 2-channel tracks
>> - as four 1-channel tracks.
>> Some MXF flavors use the multichannel single-track approach, others the mono-track-only approach. So the user may not be able to choose the optimal muxed track assignment...
>>
>> So ultimately, if you demux and decode a packet from a track, you will have an AVFrame, which might contain a single sound group on its own (the optimal case), part of a sound group, or multiple sound groups.
>>
>> To summarize, muxed tracks do not necessarily map 1:1 to sound groups. And when processing/filtering audio, you typically want sound groups, not tracks. And yes, it is very rare to have a sound group which has channels with the same designation, but for a muxed track, it depends on the format.
>>
>> The goal of the end user is probably to be able to specify sound groups, not select muxed tracks. Preferably a demuxer should provide which channel is part of which sound group, and you should be able to use a filter or a combination of filters to select a specific sound group. E.g. amerge all tracks, then keep only the channels from all the merged channels which are part of a specific sound group.
>
> So what are you proposing? In my view, such higher level information should live at a higher level - e.g. in the side data. You can then have a filter that reads this side data and gets you the group you want.

Does not look that simple to use side data for everything, because when you have a frame in a filter, you should already have configured the output channel layout... So unless you pass side data to filters somehow before initialization, it does not help you.

Regards,
Marton
On 12/12/2021 5:00 PM, Nicolas George wrote:
> Anton Khirnov (12021-12-12):
>> So what are you proposing? In my view, such higher level information should live at a higher level - e.g. in the side data. You can then have a filter that reads this side data and gets you the group you want.
>
> So, what is the point of this new API if anything of value needs to be done with side data and yet another API?
>
> It seems to me you are indulging in a sunk-cost fallacy: you wrote this API and all the code, but you neglected to poll your fellow developers for the needs it should cover, and as a result got something much too limited. But now you are trying to force it through anyway.
>
> What I propose is:
>
> (1) define the needs properly;
>
> (2) redesign the API;
>
> (3) see how much code can still be used.
>
> The needs as far as I can see them (please add to the list):
>
> A. Allow the same channel to appear several times in the layout. Hendrik agreed that it was useful for some kind of USER_SPECIFIED or UNKNOWN channel specification, but allowing it for any channel specification is actually simpler.
>
> It is not limited to just having the same channel in the list; it requires API and user interface support: the API must be able to tell the user "this USER_SPECIFIED channel is the oboe, this USER_SPECIFIED channel is the piano", and the user must be able to tell the API "the second USER_SPECIFIED channel" or "the USER_SPECIFIED channel relating to the piano".

To achieve this you don't need the same AVChannel value to appear several times in the same layout. You have INT_MAX values available, so just assign one to each of those you mentioned. No need for an abstract value "user defined" that would then show up several times in a layout. Oboe can be 65, piano can be 66.

Also, each channel is meant to map to a speaker in a different physical location. If your idea is to have oboe and piano play through the same speaker, then you're thinking filtering, and that sounds beyond the scope of a channel layout API.

The user interface part to query and tell non-standard AVChannel values apart in a human-readable way is a different thing. That would probably require giving each channel a user-defined name in some form.

> B. Possibly, though I do not insist on it as I do on A: groups of channels.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
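[Editor's sketch] James's counter-proposal (give the oboe and piano distinct ids past the standard range, e.g. 65 and 66, instead of repeating one USER_SPECIFIED value) can be modeled with a toy check in C. This is not the FFmpeg API; the enum values and function are invented for illustration. Under his scheme a well-formed layout never repeats an id:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not the FFmpeg API): under James's scheme every channel
 * in a layout carries a distinct id; non-standard sources (oboe,
 * piano) take ids past the standard range instead of repeating an
 * abstract USER_SPECIFIED value. Values are illustrative. */
enum {
    CH_FL = 0, CH_FR = 1,
    CH_OBOE = 65, CH_PIANO = 66,
};

/* A layout is well-formed in this model iff no id occurs twice. */
static bool layout_ids_unique(const int *ids, int nb)
{
    for (int i = 0; i < nb; i++)
        for (int j = i + 1; j < nb; j++)
            if (ids[i] == ids[j])
                return false;
    return true;
}
```

The contrast with Nicolas's requirement A is exactly this invariant: (FL, FR, OBOE, PIANO) passes, while a merged (FL, FR, FL, FR) would not.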
James Almer (12021-12-13):
> To achieve this you don't need the same AVChannel value to appear several times in the same layout. You have INT_MAX values available, so just assign one to each of those you mentioned. No need for an abstract value "user defined" that would then show up several times in a layout. Oboe can be 65, piano can be 66.
> Also, each channel is meant to map to a speaker in a different physical location. If your idea is to have oboe and piano play through the same

The idea is that if some filter takes the oboe and piano from a single stream and extracts them into two streams, then the left and right will automatically be correctly marked left and right in the extracted streams.

> speaker, then you're thinking filtering, and that sounds beyond the scope of a channel layout API.

Making sure existing filters and devices can flag the channel layout they need, which they could not because of the limitations of the current API, seems 100% in scope to me.

What is your definition of the scope of a channel layout API? I hope it is not "we did it that way, anything it cannot do is beyond the scope".

Regards,
On 12/13/2021 8:36 PM, Nicolas George wrote:
> James Almer (12021-12-13):
>> To achieve this you don't need the same AVChannel value to appear several times in the same layout. You have INT_MAX values available, so just assign one to each of those you mentioned. No need for an abstract value "user defined" that would then show up several times in a layout. Oboe can be 65, piano can be 66.
>> Also, each channel is meant to map to a speaker in a different physical location. If your idea is to have oboe and piano play through the same
>
> The idea is that if some filter takes the oboe and piano from a single stream and extracts them into two streams, then the left and right will automatically be correctly marked left and right in the extracted streams.

Can't user-defined names let you do that without the need to reuse the same AVChannel value in a given layout? And for that matter, can you not set said names within the filter itself and not in the layout?

>> speaker, then you're thinking filtering, and that sounds beyond the scope of a channel layout API.
>
> Making sure existing filters and devices can flag the channel layout they need, which they could not because of the limitations of the current API, seems 100% in scope to me.
>
> What is your definition of the scope of a channel layout API? I hope it is not "we did it that way, anything it cannot do is beyond the scope".

Mapping a channel (each of the data pointers in a frame) to a specific output, like a speaker.
Quoting Marton Balint (2021-12-13 23:47:22)
> On Sun, 12 Dec 2021, Anton Khirnov wrote:
>> So what are you proposing? In my view, such higher level information should live at a higher level - e.g. in the side data. You can then have a filter that reads this side data and gets you the group you want.
>
> Does not look that simple to use side data for everything, because when you have a frame in a filter, you should already have configured the output channel layout... So unless you pass side data to filters somehow before initialization, it does not help you.

I don't see a big problem in adding stream- (i.e. link-) level side data to avfilter. We already do this for hwcontexts (which btw need to be better integrated into format negotiation).

It also seems to me that video properties like stereo3D or spherical mapping are analogous to your MXF use case, and they exist in side data.
James Almer (12021-12-13):
> Can't user-defined names let you do that without the need to reuse the same AVChannel value in a given layout? And for that matter, can you not set said names within the filter itself and not in the layout?

I do not see how they could. But as always, "I do not see how" is not an argument, it only points to my limitations. So if you do know, please let me know.

But I suspect we are talking at cross-purposes here. You say “the filter”, singular, but the issue is not an isolated filter, the issue is about several components working together in harmony. For example a demuxer that knows about exotic layouts and labels them as such, but later a filter that does not and only knows about standard channel identifiers.

> Mapping a channel (each of the data pointers in a frame) to a specific output, like a speaker.

I think this definition is too narrow. Currently, channel layouts are also used to compute matrix coefficients for remixing, for example.

But even if we adopt this definition, mapping (oboe left, oboe right, piano left, piano right) onto stereo is perfectly in scope.

Regards,
Anton Khirnov (12021-12-14):
> I don't see a big problem in adding stream- (i.e. link-) level side data to avfilter.

It is not a problem. In fact, if we were to stay with “uint64_t channel_layout”, that is probably what we would do. But we are discussing a new API that could be very much exactly what we need here.

Again: if it does not help in these kinds of cases, what is the point of the proposed API? Let us either keep “uint64_t channel_layout” or move to something really significantly more powerful. But as it is, it is a waste of effort.

> We already do this for hwcontexts (which btw need to be better integrated into format negotiation).

Interesting remark. I do not know hwcontexts in lavfi enough. Please elaborate if you can.

But anyway, we are not currently discussing a rework of the whole hwcontext API. And if we were, and somebody suggested we leave two thirds of the use cases for something else in side data, I would object.
On 12/14/2021 10:03 AM, Nicolas George wrote:
> James Almer (12021-12-13):
>> Can't user-defined names let you do that without the need to reuse the same AVChannel value in a given layout? And for that matter, can you not set said names within the filter itself and not in the layout?
>
> I do not see how they could. But as always, "I do not see how" is not an argument, it only points to my limitations. So if you do know, please let me know.

You have a stream with four channels, you set up a custom order layout where you pick any four unused ids and call them OboeFL, OboeFR, PianoFL, PianoFR in u.map[], then split the streams (retaining strings and ids), pass them to channelmap using channelmap=map=OboeFL-FL|OboeFR-FR once support is added, and so on. Or am I missing something?

> But I suspect we are talking at cross-purposes here. You say “the filter”, singular, but the issue is not an isolated filter, the issue is about several components working together in harmony. For example a demuxer that knows about exotic layouts and labels them as such, but later a filter that does not and only knows about standard channel identifiers.
>
>> Mapping a channel (each of the data pointers in a frame) to a specific output, like a speaker.
>
> I think this definition is too narrow. Currently, channel layouts are also used to compute matrix coefficients for remixing, for example.
>
> But even if we adopt this definition, mapping (oboe left, oboe right, piano left, piano right) onto stereo is perfectly in scope.
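[Editor's sketch] The flow James describes, custom map entries that pair a standard designation with a free label so "PianoFL" can later be remapped to plain FL, can be sketched as a toy model in C. This is not the proposed FFmpeg API (in particular, ToyMapEntry and toy_remap_by_name are invented for illustration, standing in for u.map[] entries and a channelmap-style lookup):

```c
#include <assert.h>
#include <string.h>

/* Toy sketch of James's flow (not the proposed API): a custom-order
 * layout stores, per channel, a standard id plus a free label, so a
 * channelmap-like filter can resolve a label back to a source index. */
struct ToyMapEntry {
    int  id;       /* standard designation, e.g. 0 = FL, 1 = FR */
    char name[16]; /* user-defined label such as "OboeFL" */
};

/* channelmap-like lookup: return the source channel index whose label
 * matches `name`, or -1 if the label is unknown. */
static int toy_remap_by_name(const struct ToyMapEntry *map, int nb,
                             const char *name)
{
    for (int i = 0; i < nb; i++)
        if (!strcmp(map[i].name, name))
            return i;
    return -1;
}
```

With the four entries OboeFL, OboeFR, PianoFL, PianoFR, a mapping like OboeFL-FL is then a name lookup followed by copying that source channel to the FL slot of the output layout.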
James Almer (12021-12-14):
> You have a stream with four channels, you set up a custom order layout where you pick any four unused ids and call them OboeFL, OboeFR, PianoFL, PianoFR in u.map[], then split the streams (retaining strings and ids), pass them to channelmap using channelmap=map=OboeFL-FL|OboeFR-FR once support is added, and so on. Or am I missing something?

You are thinking too small. You are letting the users fend for themselves, while we could make sure that everything is automatic. If we can, then we should.

What I want is this: if the demuxer knows the label, then the user can write:

pan='stereo=0.3*.oboe+0.7*.piano'

and the system automatically picks the channels from the oboe and the piano and figures out the matrix (if the oboe is recorded in mono, for example).

Regards,
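[Editor's sketch] The pan='stereo=0.3*.oboe+0.7*.piano' syntax above is Nicolas's proposal, not an existing lavfi option. What such a resolver would have to do can be sketched as a toy C model (all names invented for illustration): for each ".label" term, find every channel carrying that label and apply the term's weight to it when building an output sample. A real implementation would additionally split the weight by channel position to build the full downmix matrix:

```c
#include <assert.h>
#include <string.h>

/* Toy model of label-based mixing (not the pan filter): one sample per
 * labeled channel, for illustration only. */
struct ToyLabeledChannel {
    const char *label;  /* e.g. "oboe", "piano", as set by the demuxer */
    double      sample; /* one sample of that channel */
};

/* Accumulate weight * sample over every channel whose label matches,
 * starting from `acc`; returns the updated accumulator. */
static double toy_mix(const struct ToyLabeledChannel *ch, int nb,
                      const char *label, double weight, double acc)
{
    for (int i = 0; i < nb; i++)
        if (!strcmp(ch[i].label, label))
            acc += weight * ch[i].sample;
    return acc;
}
```

Resolving 0.3*.oboe+0.7*.piano is then two calls to toy_mix, one per term, chained through the accumulator.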