diff mbox series

[FFmpeg-devel,1/2,v2] avutil: add a Tile Grid API

Message ID 20240120220407.64141-1-jamrial@gmail.com
State New
Headers show
Series [FFmpeg-devel,1/2,v2] avutil: add a Tile Grid API | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

James Almer Jan. 20, 2024, 10:04 p.m. UTC
This includes a struct and helpers. It will be used to support container level
cropping and tiled image formats, but should be generic enough for general
usage.

Signed-off-by: James Almer <jamrial@gmail.com>
---
Extended to include fields used for cropping. Should make the struct reusable
even for non tiled images, e.g. setting both rows and tiles to 1, in which case
tile width and height would become analogous to coded_{witdh,height}.

 libavutil/Makefile |   2 +
 libavutil/tile.c   |  81 +++++++++++++++++++
 libavutil/tile.h   | 188 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 271 insertions(+)
 create mode 100644 libavutil/tile.c
 create mode 100644 libavutil/tile.h

Comments

Anton Khirnov Jan. 21, 2024, 6:27 a.m. UTC | #1
Quoting James Almer (2024-01-20 23:04:06)
> This includes a struct and helpers. It will be used to support container level
> cropping and tiled image formats, but should be generic enough for general
> usage.
> 
> Signed-off-by: James Almer <jamrial@gmail.com>
> ---
> Extended to include fields used for cropping. Should make the struct reusable
> even for non tiled images, e.g. setting both rows and tiles to 1, in which case
> tile width and height would become analogous to coded_{witdh,height}.

But why? What does cropping have to do with tiling? What advantage is
there to handling them in one struct?
James Almer Jan. 21, 2024, 12:06 p.m. UTC | #2
On 1/21/2024 3:27 AM, Anton Khirnov wrote:
> Quoting James Almer (2024-01-20 23:04:06)
>> This includes a struct and helpers. It will be used to support container level
>> cropping and tiled image formats, but should be generic enough for general
>> usage.
>>
>> Signed-off-by: James Almer <jamrial@gmail.com>
>> ---
>> Extended to include fields used for cropping. Should make the struct reusable
>> even for non tiled images, e.g. setting both rows and tiles to 1, in which case
>> tile width and height would become analogous to coded_{witdh,height}.
> 
> But why? What does cropping have to do with tiling? What advantage is
> there to handling them in one struct?

The struct does not need to be used for non tiled image scenarios, but 
could if we decide we don't want to add another struct that would only 
contain a subset of the fields present here.

As to why said fields here present here, HEIF may use a clap box to 
define cropping for the final image, not for the tiles. This needs to be 
propagated, and the previous version of this API, which only defined 
cropping from right and bottom edges if output dimensions were smaller 
than the grid (standard case for tiled heif with no clap box), was not 
enough. Hence this change.

I can rename this struct to Image Grid or something else, which might 
make it feel less awkward if we decide to reuse it. We still need to 
propagate container cropping from clap boxes and from Matroska elements 
after all.
Anton Khirnov Jan. 21, 2024, 5:29 p.m. UTC | #3
Quoting James Almer (2024-01-21 13:06:28)
> On 1/21/2024 3:27 AM, Anton Khirnov wrote:
> > Quoting James Almer (2024-01-20 23:04:06)
> >> This includes a struct and helpers. It will be used to support container level
> >> cropping and tiled image formats, but should be generic enough for general
> >> usage.
> >>
> >> Signed-off-by: James Almer <jamrial@gmail.com>
> >> ---
> >> Extended to include fields used for cropping. Should make the struct reusable
> >> even for non tiled images, e.g. setting both rows and tiles to 1, in which case
> >> tile width and height would become analogous to coded_{witdh,height}.
> > 
> > But why? What does cropping have to do with tiling? What advantage is
> > there to handling them in one struct?
> 
> The struct does not need to be used for non tiled image scenarios, but 
> could if we decide we don't want to add another struct that would only 
> contain a subset of the fields present here.
> 
> As to why said fields here present here, HEIF may use a clap box to 
> define cropping for the final image, not for the tiles. This needs to be 
> propagated, and the previous version of this API, which only defined 
> cropping from right and bottom edges if output dimensions were smaller 
> than the grid (standard case for tiled heif with no clap box), was not 
> enough. Hence this change.
> 
> I can rename this struct to Image Grid or something else, which might 
> make it feel less awkward if we decide to reuse it. We still need to 
> propagate container cropping from clap boxes and from Matroska elements 
> after all.

Honestly this whole new API strikes me as massively overthinking it. All
you should need to describe an arbitrary partition of an image into
sub-rectangles is an array of (x, y, width, height). Instead you're
proposing a new public header, struct, three functions, multiple "tile
types", and if I'm not mistaken it still cannot describe an arbitrary
partitioning. Plus it's in libavutil for some reason, even though
libavformat seems to be the only intended user.

Is all this complexity really warranted?
James Almer Jan. 21, 2024, 5:47 p.m. UTC | #4
On 1/21/2024 2:29 PM, Anton Khirnov wrote:
> Quoting James Almer (2024-01-21 13:06:28)
>> On 1/21/2024 3:27 AM, Anton Khirnov wrote:
>>> Quoting James Almer (2024-01-20 23:04:06)
>>>> This includes a struct and helpers. It will be used to support container level
>>>> cropping and tiled image formats, but should be generic enough for general
>>>> usage.
>>>>
>>>> Signed-off-by: James Almer <jamrial@gmail.com>
>>>> ---
>>>> Extended to include fields used for cropping. Should make the struct reusable
>>>> even for non tiled images, e.g. setting both rows and tiles to 1, in which case
>>>> tile width and height would become analogous to coded_{witdh,height}.
>>>
>>> But why? What does cropping have to do with tiling? What advantage is
>>> there to handling them in one struct?
>>
>> The struct does not need to be used for non tiled image scenarios, but
>> could if we decide we don't want to add another struct that would only
>> contain a subset of the fields present here.
>>
>> As to why said fields here present here, HEIF may use a clap box to
>> define cropping for the final image, not for the tiles. This needs to be
>> propagated, and the previous version of this API, which only defined
>> cropping from right and bottom edges if output dimensions were smaller
>> than the grid (standard case for tiled heif with no clap box), was not
>> enough. Hence this change.
>>
>> I can rename this struct to Image Grid or something else, which might
>> make it feel less awkward if we decide to reuse it. We still need to
>> propagate container cropping from clap boxes and from Matroska elements
>> after all.
> 
> Honestly this whole new API strikes me as massively overthinking it. All
> you should need to describe an arbitrary partition of an image into
> sub-rectangles is an array of (x, y, width, height). Instead you're
> proposing a new public header, struct, three functions, multiple "tile
> types", and if I'm not mistaken it still cannot describe an arbitrary
> partitioning. Plus it's in libavutil for some reason, even though
> libavformat seems to be the only intended user.
> 
> Is all this complexity really warranted?

1. It needs to be usable as a Stream Group type, so a struct is 
required. Said struct needs an allocator unless we want to have its size 
be part of the ABI. I can remove the free function, but then the caller 
needs to manually free any internal data.
2. We need tile dimensions (Width and height) plus row and column count, 
which give you the final size of the grid, then offsets x and y to get 
the actual image within the grid meant for presentation.
3. I want to support uniform tiles as well as variable tile dimensions, 
hence multiple tile types. The latter currently has no use case, but 
eventually might. I can if you prefer not include said type at first, 
but i want to keep the union in place so it and other extensions can be 
added.
4. It's in lavu because its meant to be generic. It can also be used to 
transport tiling and cropping information as stream and packet side 
data, which can't depend on something defined in lavf.

And what do you mean with not supporting describing arbitrary 
partitioning? Isn't that what variable tile dimensions achieve?
Anton Khirnov Jan. 21, 2024, 6:29 p.m. UTC | #5
Quoting James Almer (2024-01-21 18:47:43)
> On 1/21/2024 2:29 PM, Anton Khirnov wrote:
> > Honestly this whole new API strikes me as massively overthinking it. All
> > you should need to describe an arbitrary partition of an image into
> > sub-rectangles is an array of (x, y, width, height). Instead you're
> > proposing a new public header, struct, three functions, multiple "tile
> > types", and if I'm not mistaken it still cannot describe an arbitrary
> > partitioning. Plus it's in libavutil for some reason, even though
> > libavformat seems to be the only intended user.
> > 
> > Is all this complexity really warranted?
> 
> 1. It needs to be usable as a Stream Group type, so a struct is 
> required. Said struct needs an allocator unless we want to have its size 
> be part of the ABI. I can remove the free function, but then the caller 
> needs to manually free any internal data.

If the struct lives in lavf and is always allocated as a part of
AVStreamGroup then you don't need a public constructor/destructor and
can still extend the struct.

> 2. We need tile dimensions (Width and height) plus row and column count, 
> which give you the final size of the grid, then offsets x and y to get 
> the actual image within the grid meant for presentation.
> 3. I want to support uniform tiles as well as variable tile dimensions, 
> hence multiple tile types. The latter currently has no use case, but 
> eventually might. I can if you prefer not include said type at first, 
> but i want to keep the union in place so it and other extensions can be 
> added.
> 4. It's in lavu because its meant to be generic. It can also be used to 
> transport tiling and cropping information as stream and packet side 
> data, which can't depend on something defined in lavf.

When would you have tiling information associated with a specific
stream?

> And what do you mean with not supporting describing arbitrary 
> partitioning? Isn't that what variable tile dimensions achieve?

IIUC your tiling scheme still assumes that the partitioning is by rows
and columns. A completely generic partitioning could be irregular.
James Almer Jan. 21, 2024, 6:38 p.m. UTC | #6
On 1/21/2024 3:29 PM, Anton Khirnov wrote:
> Quoting James Almer (2024-01-21 18:47:43)
>> On 1/21/2024 2:29 PM, Anton Khirnov wrote:
>>> Honestly this whole new API strikes me as massively overthinking it. All
>>> you should need to describe an arbitrary partition of an image into
>>> sub-rectangles is an array of (x, y, width, height). Instead you're
>>> proposing a new public header, struct, three functions, multiple "tile
>>> types", and if I'm not mistaken it still cannot describe an arbitrary
>>> partitioning. Plus it's in libavutil for some reason, even though
>>> libavformat seems to be the only intended user.
>>>
>>> Is all this complexity really warranted?
>>
>> 1. It needs to be usable as a Stream Group type, so a struct is
>> required. Said struct needs an allocator unless we want to have its size
>> be part of the ABI. I can remove the free function, but then the caller
>> needs to manually free any internal data.
> 
> If the struct lives in lavf and is always allocated as a part of
> AVStreamGroup then you don't need a public constructor/destructor and
> can still extend the struct.

Yes, but that would be the case if it's only meant to be allocated by 
AVStreamGroup and nothing else.

> 
>> 2. We need tile dimensions (Width and height) plus row and column count,
>> which give you the final size of the grid, then offsets x and y to get
>> the actual image within the grid meant for presentation.
>> 3. I want to support uniform tiles as well as variable tile dimensions,
>> hence multiple tile types. The latter currently has no use case, but
>> eventually might. I can if you prefer not include said type at first,
>> but i want to keep the union in place so it and other extensions can be
>> added.
>> 4. It's in lavu because its meant to be generic. It can also be used to
>> transport tiling and cropping information as stream and packet side
>> data, which can't depend on something defined in lavf.
> 
> When would you have tiling information associated with a specific
> stream?

Can't think of an example for tiling, but i can for cropping. If you 
insist on not reusing this for non-HEIF cropping usage in mp4/matroska, 
then ok, I'll move it to lavf.

> 
>> And what do you mean with not supporting describing arbitrary
>> partitioning? Isn't that what variable tile dimensions achieve?
> 
> IIUC your tiling scheme still assumes that the partitioning is by rows
> and columns. A completely generic partitioning could be irregular.

A new tile type that doesn't define rows and columns can be added if 
needed. But the current variable tile type can support things like grids 
of two rows and two columns where the second row is effectively a single 
tile, simply by setting the second tile in said row as having a width of 0.
Anton Khirnov Jan. 21, 2024, 7:02 p.m. UTC | #7
Quoting James Almer (2024-01-21 19:38:50)
> On 1/21/2024 3:29 PM, Anton Khirnov wrote:
> > Quoting James Almer (2024-01-21 18:47:43)
> >> On 1/21/2024 2:29 PM, Anton Khirnov wrote:
> >>> Honestly this whole new API strikes me as massively overthinking it. All
> >>> you should need to describe an arbitrary partition of an image into
> >>> sub-rectangles is an array of (x, y, width, height). Instead you're
> >>> proposing a new public header, struct, three functions, multiple "tile
> >>> types", and if I'm not mistaken it still cannot describe an arbitrary
> >>> partitioning. Plus it's in libavutil for some reason, even though
> >>> libavformat seems to be the only intended user.
> >>>
> >>> Is all this complexity really warranted?
> >>
> >> 1. It needs to be usable as a Stream Group type, so a struct is
> >> required. Said struct needs an allocator unless we want to have its size
> >> be part of the ABI. I can remove the free function, but then the caller
> >> needs to manually free any internal data.
> > 
> > If the struct lives in lavf and is always allocated as a part of
> > AVStreamGroup then you don't need a public constructor/destructor and
> > can still extend the struct.
> 
> Yes, but that would be the case if it's only meant to be allocated by 
> AVStreamGroup and nothing else.

That is the case right now, no?

If that ever changes then the constructor can be added.

> > 
> >> 2. We need tile dimensions (Width and height) plus row and column count,
> >> which give you the final size of the grid, then offsets x and y to get
> >> the actual image within the grid meant for presentation.
> >> 3. I want to support uniform tiles as well as variable tile dimensions,
> >> hence multiple tile types. The latter currently has no use case, but
> >> eventually might. I can if you prefer not include said type at first,
> >> but i want to keep the union in place so it and other extensions can be
> >> added.
> >> 4. It's in lavu because its meant to be generic. It can also be used to
> >> transport tiling and cropping information as stream and packet side
> >> data, which can't depend on something defined in lavf.
> > 
> > When would you have tiling information associated with a specific
> > stream?
> 
> Can't think of an example for tiling, but i can for cropping. If you 
> insist on not reusing this for non-HEIF cropping usage in mp4/matroska, 
> then ok, I'll move it to lavf.

I still don't see why should it be a good idea to use this struct for
generic container cropping. It feels very much like a hammer in search
of a nail.

> > 
> >> And what do you mean with not supporting describing arbitrary
> >> partitioning? Isn't that what variable tile dimensions achieve?
> > 
> > IIUC your tiling scheme still assumes that the partitioning is by rows
> > and columns. A completely generic partitioning could be irregular.
> 
> A new tile type that doesn't define rows and columns can be added if 
> needed. But the current variable tile type can support things like grids 
> of two rows and two columns where the second row is effectively a single 
> tile, simply by setting the second tile in said row as having a width of 0.

The problem I see here is that every consumer of this struct then has to
explicitly support every type, and adding a new type requires updating
all callers. This seems unnecessary when "list of N rectangles" covers
all possible partitionings.

That does not mean you actually have to store it that way - the struct
could be a list of N rectangles logically, while actually being
represented more efficiently (in the same way a channel layout is always
logically a list of channels, even though it's often represented by an
uint64 rather than a malloced array).
James Almer Jan. 21, 2024, 7:29 p.m. UTC | #8
On 1/21/2024 4:02 PM, Anton Khirnov wrote:
> Quoting James Almer (2024-01-21 19:38:50)
>> On 1/21/2024 3:29 PM, Anton Khirnov wrote:
>>> Quoting James Almer (2024-01-21 18:47:43)
>>>> On 1/21/2024 2:29 PM, Anton Khirnov wrote:
>>>>> Honestly this whole new API strikes me as massively overthinking it. All
>>>>> you should need to describe an arbitrary partition of an image into
>>>>> sub-rectangles is an array of (x, y, width, height). Instead you're
>>>>> proposing a new public header, struct, three functions, multiple "tile
>>>>> types", and if I'm not mistaken it still cannot describe an arbitrary
>>>>> partitioning. Plus it's in libavutil for some reason, even though
>>>>> libavformat seems to be the only intended user.
>>>>>
>>>>> Is all this complexity really warranted?
>>>>
>>>> 1. It needs to be usable as a Stream Group type, so a struct is
>>>> required. Said struct needs an allocator unless we want to have its size
>>>> be part of the ABI. I can remove the free function, but then the caller
>>>> needs to manually free any internal data.
>>>
>>> If the struct lives in lavf and is always allocated as a part of
>>> AVStreamGroup then you don't need a public constructor/destructor and
>>> can still extend the struct.
>>
>> Yes, but that would be the case if it's only meant to be allocated by
>> AVStreamGroup and nothing else.
> 
> That is the case right now, no?
> 
> If that ever changes then the constructor can be added.
> 
>>>
>>>> 2. We need tile dimensions (Width and height) plus row and column count,
>>>> which give you the final size of the grid, then offsets x and y to get
>>>> the actual image within the grid meant for presentation.
>>>> 3. I want to support uniform tiles as well as variable tile dimensions,
>>>> hence multiple tile types. The latter currently has no use case, but
>>>> eventually might. I can if you prefer not include said type at first,
>>>> but i want to keep the union in place so it and other extensions can be
>>>> added.
>>>> 4. It's in lavu because its meant to be generic. It can also be used to
>>>> transport tiling and cropping information as stream and packet side
>>>> data, which can't depend on something defined in lavf.
>>>
>>> When would you have tiling information associated with a specific
>>> stream?
>>
>> Can't think of an example for tiling, but i can for cropping. If you
>> insist on not reusing this for non-HEIF cropping usage in mp4/matroska,
>> then ok, I'll move it to lavf.
> 
> I still don't see why should it be a good idea to use this struct for
> generic container cropping. It feels very much like a hammer in search
> of a nail.

Because once we support container cropping, we will be defining a 
stream/packet side data type that will contain a subset of the fields 
from this struct.

If we reuse this struct, we can export a clap box as an AVTileGrid (Or i 
can rename it to AVImageGrid, and tile to subrectangle) either as the 
stream group tile grid specific parameters if HEIF, or as stream side 
data otherwise.

> 
>>>
>>>> And what do you mean with not supporting describing arbitrary
>>>> partitioning? Isn't that what variable tile dimensions achieve?
>>>
>>> IIUC your tiling scheme still assumes that the partitioning is by rows
>>> and columns. A completely generic partitioning could be irregular.
>>
>> A new tile type that doesn't define rows and columns can be added if
>> needed. But the current variable tile type can support things like grids
>> of two rows and two columns where the second row is effectively a single
>> tile, simply by setting the second tile in said row as having a width of 0.
> 
> The problem I see here is that every consumer of this struct then has to
> explicitly support every type, and adding a new type requires updating
> all callers. This seems unnecessary when "list of N rectangles" covers
> all possible partitionings.

Well, the variable type supports a list of N rectangles where each 
rectangle has arbitrary dimensions, and you can do things like having 
three tiles/rectangles that together still form a rectangle, while 
defining row and column count. So i don't personally see the need for a 
new type to begin with.

> 
> That does not mean you actually have to store it that way - the struct
> could be a list of N rectangles logically, while actually being
> represented more efficiently (in the same way a channel layout is always
> logically a list of channels, even though it's often represented by an
> uint64 rather than a malloced array).
>
James Almer Jan. 21, 2024, 9:03 p.m. UTC | #9
On 1/21/2024 4:29 PM, James Almer wrote:
> On 1/21/2024 4:02 PM, Anton Khirnov wrote:
>> Quoting James Almer (2024-01-21 19:38:50)
>>> On 1/21/2024 3:29 PM, Anton Khirnov wrote:
>>>> Quoting James Almer (2024-01-21 18:47:43)
>>>>> On 1/21/2024 2:29 PM, Anton Khirnov wrote:
>>>>>> Honestly this whole new API strikes me as massively overthinking 
>>>>>> it. All
>>>>>> you should need to describe an arbitrary partition of an image into
>>>>>> sub-rectangles is an array of (x, y, width, height). Instead you're
>>>>>> proposing a new public header, struct, three functions, multiple 
>>>>>> "tile
>>>>>> types", and if I'm not mistaken it still cannot describe an arbitrary
>>>>>> partitioning. Plus it's in libavutil for some reason, even though
>>>>>> libavformat seems to be the only intended user.
>>>>>>
>>>>>> Is all this complexity really warranted?
>>>>>
>>>>> 1. It needs to be usable as a Stream Group type, so a struct is
>>>>> required. Said struct needs an allocator unless we want to have its 
>>>>> size
>>>>> be part of the ABI. I can remove the free function, but then the 
>>>>> caller
>>>>> needs to manually free any internal data.
>>>>
>>>> If the struct lives in lavf and is always allocated as a part of
>>>> AVStreamGroup then you don't need a public constructor/destructor and
>>>> can still extend the struct.
>>>
>>> Yes, but that would be the case if it's only meant to be allocated by
>>> AVStreamGroup and nothing else.
>>
>> That is the case right now, no?
>>
>> If that ever changes then the constructor can be added.
>>
>>>>
>>>>> 2. We need tile dimensions (Width and height) plus row and column 
>>>>> count,
>>>>> which give you the final size of the grid, then offsets x and y to get
>>>>> the actual image within the grid meant for presentation.
>>>>> 3. I want to support uniform tiles as well as variable tile 
>>>>> dimensions,
>>>>> hence multiple tile types. The latter currently has no use case, but
>>>>> eventually might. I can if you prefer not include said type at first,
>>>>> but i want to keep the union in place so it and other extensions 
>>>>> can be
>>>>> added.
>>>>> 4. It's in lavu because its meant to be generic. It can also be 
>>>>> used to
>>>>> transport tiling and cropping information as stream and packet side
>>>>> data, which can't depend on something defined in lavf.
>>>>
>>>> When would you have tiling information associated with a specific
>>>> stream?
>>>
>>> Can't think of an example for tiling, but i can for cropping. If you
>>> insist on not reusing this for non-HEIF cropping usage in mp4/matroska,
>>> then ok, I'll move it to lavf.
>>
>> I still don't see why should it be a good idea to use this struct for
>> generic container cropping. It feels very much like a hammer in search
>> of a nail.
> 
> Because once we support container cropping, we will be defining a 
> stream/packet side data type that will contain a subset of the fields 
> from this struct.
> 
> If we reuse this struct, we can export a clap box as an AVTileGrid (Or i 
> can rename it to AVImageGrid, and tile to subrectangle) either as the 
> stream group tile grid specific parameters if HEIF, or as stream side 
> data otherwise.
> 
>>
>>>>
>>>>> And what do you mean with not supporting describing arbitrary
>>>>> partitioning? Isn't that what variable tile dimensions achieve?
>>>>
>>>> IIUC your tiling scheme still assumes that the partitioning is by rows
>>>> and columns. A completely generic partitioning could be irregular.
>>>
>>> A new tile type that doesn't define rows and columns can be added if
>>> needed. But the current variable tile type can support things like grids
>>> of two rows and two columns where the second row is effectively a single
>>> tile, simply by setting the second tile in said row as having a width 
>>> of 0.
>>
>> The problem I see here is that every consumer of this struct then has to
>> explicitly support every type, and adding a new type requires updating
>> all callers. This seems unnecessary when "list of N rectangles" covers
>> all possible partitionings.
> 
> Well, the variable type supports a list of N rectangles where each 
> rectangle has arbitrary dimensions, and you can do things like having 
> three tiles/rectangles that together still form a rectangle, while 
> defining row and column count. So i don't personally see the need for a 
> new type to begin with.

I could remove the types and the union altogether and leave only the 
array even for uniform tiles if you think that simplifies the API, but 
seems like a waste of memory to allocate a rows x cols array of ints 
just to have the same value written for every entry.
Anton Khirnov Jan. 22, 2024, 10:32 a.m. UTC | #10
Quoting James Almer (2024-01-21 20:29:58)
> On 1/21/2024 4:02 PM, Anton Khirnov wrote:
> > I still don't see why should it be a good idea to use this struct for
> > generic container cropping. It feels very much like a hammer in search
> > of a nail.
> 
> Because once we support container cropping, we will be defining a 
> stream/packet side data type that will contain a subset of the fields 
> from this struct.

AVCodecParameters is a subset of AVCodecContext. By this logic we should
use AVCodecContext everywhere instead of AVCodecParameters.

There are also issues with storing it in packet side data since it
can contain pointers to malloced data.

> If we reuse this struct, we can export a clap box as an AVTileGrid (Or i 
> can rename it to AVImageGrid, and tile to subrectangle) either as the 
> stream group tile grid specific parameters if HEIF, or as stream side 
> data otherwise.

How does that make the API better? It seems highly counterintuitive to
me to export container cropping information in a struct designed for
tiling.

> > 
> >>>
> >>>> And what do you mean with not supporting describing arbitrary
> >>>> partitioning? Isn't that what variable tile dimensions achieve?
> >>>
> >>> IIUC your tiling scheme still assumes that the partitioning is by rows
> >>> and columns. A completely generic partitioning could be irregular.
> >>
> >> A new tile type that doesn't define rows and columns can be added if
> >> needed. But the current variable tile type can support things like grids
> >> of two rows and two columns where the second row is effectively a single
> >> tile, simply by setting the second tile in said row as having a width of 0.
> > 
> > The problem I see here is that every consumer of this struct then has to
> > explicitly support every type, and adding a new type requires updating
> > all callers. This seems unnecessary when "list of N rectangles" covers
> > all possible partitionings.
> 
> Well, the variable type supports a list of N rectangles where each 
> rectangle has arbitrary dimensions, and you can do things like having 
> three tiles/rectangles that together still form a rectangle, while 
> defining row and column count. So i don't personally see the need for a 
> new type to begin with.

I don't see how is that supposed to work. E.g. consider the following
partitioning:
┌─┬────┬─┐
│ │    ├─┤
├─┤    │ │
│ ├────┤ │
└─┴────┴─┘

How would you represent it in this API?
Anton Khirnov Jan. 22, 2024, 10:38 a.m. UTC | #11
Quoting James Almer (2024-01-21 22:03:10)
> I could remove the types and the union altogether and leave only the
> array even for uniform tiles if you think that simplifies the API, but
> seems like a waste of memory to allocate a rows x cols array of ints
> just to have the same value written for every entry.

My point is that the API can abstract away the details of how the data
is stored. You could have it always behave like a list of N rectangles,
while only storing as much information as is needed.
James Almer Jan. 22, 2024, 11:59 a.m. UTC | #12
On 1/22/2024 7:32 AM, Anton Khirnov wrote:
> Quoting James Almer (2024-01-21 20:29:58)
>> On 1/21/2024 4:02 PM, Anton Khirnov wrote:
>>> I still don't see why should it be a good idea to use this struct for
>>> generic container cropping. It feels very much like a hammer in search
>>> of a nail.
>>
>> Because once we support container cropping, we will be defining a
>> stream/packet side data type that will contain a subset of the fields
>> from this struct.
> 
> AVCodecParameters is a subset of AVCodecContext. By this logic we should
> use AVCodecContext everywhere instead of AVCodecParameters.
> 
> There are also issues with storing it in packet side data since it
> can contain pointers to malloced data.
> 
>> If we reuse this struct, we can export a clap box as an AVTileGrid (Or i
>> can rename it to AVImageGrid, and tile to subrectangle) either as the
>> stream group tile grid specific parameters if HEIF, or as stream side
>> data otherwise.
> 
> How does that make the API better? It seems highly counterintuitive to
> me to export container cropping information in a struct designed for
> tiling.
> 
>>>
>>>>>
>>>>>> And what do you mean with not supporting describing arbitrary
>>>>>> partitioning? Isn't that what variable tile dimensions achieve?
>>>>>
>>>>> IIUC your tiling scheme still assumes that the partitioning is by rows
>>>>> and columns. A completely generic partitioning could be irregular.
>>>>
>>>> A new tile type that doesn't define rows and columns can be added if
>>>> needed. But the current variable tile type can support things like grids
>>>> of two rows and two columns where the second row is effectively a single
>>>> tile, simply by setting the second tile in said row as having a width of 0.
>>>
>>> The problem I see here is that every consumer of this struct then has to
>>> explicitly support every type, and adding a new type requires updating
>>> all callers. This seems unnecessary when "list of N rectangles" covers
>>> all possible partitionings.
>>
>> Well, the variable type supports a list of N rectangles where each
>> rectangle has arbitrary dimensions, and you can do things like having
>> three tiles/rectangles that together still form a rectangle, while
>> defining row and column count. So i don't personally see the need for a
>> new type to begin with.
> 
> I don't see how is that supposed to work. E.g. consider the following
> partitioning:
> ┌─┬────┬─┐
> │ │    ├─┤
> ├─┤    │ │
> │ ├────┤ │
> └─┴────┴─┘
> 
> How would you represent it in this API?

That's two rows and three columns. Lets assume the smallest rectangle 
there is 1x1:

1x2 2x3 1x1
1x2 2x1 1x3

Meaning

tile_width[] = { 1, 2, 1, 1, 2, 1 };
tile_height[] = { 2, 3, 1, 2, 1, 3 };

As long as the sum of widths on every row and the sum of heights on 
every column is the same (To ensure you get a rectangle), it can be 
represented.

If what you're trying to say is "What about offsets?", they can be 
inferred based on dimensions and position of previous tiles within the grid.
I don't think adding yet another array to store offsets is worth it. But 
maybe a helper function that will build it?
James Almer Jan. 22, 2024, 12:08 p.m. UTC | #13
On 1/22/2024 7:32 AM, Anton Khirnov wrote:
> Quoting James Almer (2024-01-21 20:29:58)
>> On 1/21/2024 4:02 PM, Anton Khirnov wrote:
>>> I still don't see why should it be a good idea to use this struct for
>>> generic container cropping. It feels very much like a hammer in search
>>> of a nail.
>>
>> Because once we support container cropping, we will be defining a
>> stream/packet side data type that will contain a subset of the fields
>> from this struct.
> 
> AVCodecParameters is a subset of AVCodecContext. By this logic we should
> use AVCodecContext everywhere instead of AVCodecParameters.
> 
> There are also issues with storing it in packet side data since it
> can contain pointers to malloced data.

Ah, that's true. At least if i remove the union.

> 
>> If we reuse this struct, we can export a clap box as an AVTileGrid (Or i
>> can rename it to AVImageGrid, and tile to subrectangle) either as the
>> stream group tile grid specific parameters if HEIF, or as stream side
>> data otherwise.
> 
> How does that make the API better? It seems highly counterintuitive to
> me to export container cropping information in a struct designed for
> tiling.

Or you could consider it a struct designed to represent a group of 
images combined in a grid, for the purpose of extracting a subrectangle 
for presentation. How many images there are is defined by the rows and 
cols fields, and it can be 1.

> 
>>>
>>>>>
>>>>>> And what do you mean with not supporting describing arbitrary
>>>>>> partitioning? Isn't that what variable tile dimensions achieve?
>>>>>
>>>>> IIUC your tiling scheme still assumes that the partitioning is by rows
>>>>> and columns. A completely generic partitioning could be irregular.
>>>>
>>>> A new tile type that doesn't define rows and columns can be added if
>>>> needed. But the current variable tile type can support things like grids
>>>> of two rows and two columns where the second row is effectively a single
>>>> tile, simply by setting the second tile in said row as having a width of 0.
>>>
>>> The problem I see here is that every consumer of this struct then has to
>>> explicitly support every type, and adding a new type requires updating
>>> all callers. This seems unnecessary when "list of N rectangles" covers
>>> all possible partitionings.
>>
>> Well, the variable type supports a list of N rectangles where each
>> rectangle has arbitrary dimensions, and you can do things like having
>> three tiles/rectangles that together still form a rectangle, while
>> defining row and column count. So i don't personally see the need for a
>> new type to begin with.
> 
> I don't see how is that supposed to work. E.g. consider the following
> partitioning:
> ┌─┬────┬─┐
> │ │    ├─┤
> ├─┤    │ │
> │ ├────┤ │
> └─┴────┴─┘
> 
> How would you represent it in this API?
>
James Almer Jan. 22, 2024, 12:12 p.m. UTC | #14
On 1/22/2024 7:38 AM, Anton Khirnov wrote:
> Quoting James Almer (2024-01-21 22:03:10)
>> I could remove the types and the union altogether and leave only the
>> array even for uniform tiles if you think that simplifies the API, but
>> seems like a waste of memory to allocate a rows x cols array of ints
>> just to have the same value written for every entry.
> 
> My point is that the API can abstract away the details of how the data
> is stored. You could have it always behave like a list of N rectangles,
> while only storing as much information as is needed.

The struct needs tile rows, tile columns, tile dimensions, output 
dimensions, and cropping information. I don't see how it could get smaller.
Anton Khirnov Jan. 25, 2024, 5:13 p.m. UTC | #15
Quoting James Almer (2024-01-22 12:59:52)
> > 
> > I don't see how is that supposed to work. E.g. consider the following
> > partitioning:
> > ┌─┬────┬─┐
> > │ │    ├─┤
> > ├─┤    │ │
> > │ ├────┤ │
> > └─┴────┴─┘
> > 
> > How would you represent it in this API?
> 
> That's two rows and three columns. Lets assume the smallest rectangle 
> there is 1x1:
> 
> 1x2 2x3 1x1
> 1x2 2x1 1x3
> 
> Meaning
> 
> tile_width[] = { 1, 2, 1, 1, 2, 1 };
> tile_height[] = { 2, 3, 1, 2, 1, 3 };
> 
> As long as the sum of widths on every row and the sum of heights on 
> every column is the same (To ensure you get a rectangle), it can be 
> represented.
> 
> If what you're trying to say is "What about offsets?", they can be 
> inferred based on dimensions and position of previous tiles within the grid.
> I don't think adding yet another array to store offsets is worth it. But 
> maybe a helper function that will build it?

This seems horribly obfuscated to me. Why do the users of this API have
to deal with all this? Why are the notions of "tile rows", "tile
columns", and "tiles" necessary?

-- 
Anton Khirnov
Anton Khirnov Jan. 25, 2024, 5:17 p.m. UTC | #16
Quoting James Almer (2024-01-22 13:08:43)
> Or you could consider it a struct designed to represent a group of 
> images combined in a grid, for the purpose of extracting a subrectangle 
> for presentation. How many images there are is defined by the rows and 
> cols fields, and it can be 1.

It still very much sounds like you wrote a hammer and now everything is
starting to look like a nail. What is the advantage of using such a
compelex struct to export 4 unsigned integers?
James Almer Jan. 25, 2024, 5:26 p.m. UTC | #17
On 1/25/2024 2:17 PM, Anton Khirnov wrote:
> Quoting James Almer (2024-01-22 13:08:43)
>> Or you could consider it a struct designed to represent a group of
>> images combined in a grid, for the purpose of extracting a subrectangle
>> for presentation. How many images there are is defined by the rows and
>> cols fields, and it can be 1.
> 
> It still very much sounds like you wrote a hammer and now everything is
> starting to look like a nail. What is the advantage of using such a
> compelex struct to export 4 unsigned integers?

None other than not adding another struct. But i already dropped this 
idea and moved everything to lavf, like you asked.
James Almer Jan. 25, 2024, 5:31 p.m. UTC | #18
On 1/25/2024 2:13 PM, Anton Khirnov wrote:
> Quoting James Almer (2024-01-22 12:59:52)
>>>
>>> I don't see how is that supposed to work. E.g. consider the following
>>> partitioning:
>>> ┌─┬────┬─┐
>>> │ │    ├─┤
>>> ├─┤    │ │
>>> │ ├────┤ │
>>> └─┴────┴─┘
>>>
>>> How would you represent it in this API?
>>
>> That's two rows and three columns. Lets assume the smallest rectangle
>> there is 1x1:
>>
>> 1x2 2x3 1x1
>> 1x2 2x1 1x3
>>
>> Meaning
>>
>> tile_width[] = { 1, 2, 1, 1, 2, 1 };
>> tile_height[] = { 2, 3, 1, 2, 1, 3 };
>>
>> As long as the sum of widths on every row and the sum of heights on
>> every column is the same (To ensure you get a rectangle), it can be
>> represented.
>>
>> If what you're trying to say is "What about offsets?", they can be
>> inferred based on dimensions and position of previous tiles within the grid.
>> I don't think adding yet another array to store offsets is worth it. But
>> maybe a helper function that will build it?
> 
> This seems horribly obfuscated to me. Why do the users of this API have
> to deal with all this? Why are the notions of "tile rows", "tile
> columns", and "tiles" necessary?

It aligns with how HEIF (and even bitstream codecs like AV1 internally) 
define a grid of images put together. Since this will be mainly used for 
the former, it seems like the best way to propagate such information.

Strictly speaking, the "tile" dimensions are available as part of each 
stream in the group, so now that i made the struct not be generic for 
lavu, i can remove the arrays. But how to place the images in a grid 
still requires the concept of rows and columns.
diff mbox series

Patch

diff --git a/libavutil/Makefile b/libavutil/Makefile
index e7709b97d0..380d706cfe 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -82,6 +82,7 @@  HEADERS = adler32.h                                                     \
           spherical.h                                                   \
           stereo3d.h                                                    \
           threadmessage.h                                               \
+          tile.h                                                        \
           time.h                                                        \
           timecode.h                                                    \
           timestamp.h                                                   \
@@ -172,6 +173,7 @@  OBJS = adler32.o                                                        \
        spherical.o                                                      \
        stereo3d.o                                                       \
        threadmessage.o                                                  \
+       tile.o                                                           \
        time.o                                                           \
        timecode.o                                                       \
        tree.o                                                           \
diff --git a/libavutil/tile.c b/libavutil/tile.c
new file mode 100644
index 0000000000..de59ba4bab
--- /dev/null
+++ b/libavutil/tile.c
@@ -0,0 +1,81 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdint.h>
+#include <limits.h>
+
+#include "log.h"
+#include "mem.h"
+#include "opt.h"
+#include "tile.h"
+
+#define FLAGS AV_OPT_FLAG_ENCODING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
+#define OFFSET(x) offsetof(AVTileGrid, x)
+static const AVOption tile_grid_options[] = {
+    { "type", NULL, OFFSET(type), AV_OPT_TYPE_INT,
+            { .i64 = AV_TILE_DIMENSION_TYPE_UNIFORM },
+            AV_TILE_DIMENSION_TYPE_UNIFORM, AV_TILE_DIMENSION_TYPE_VARIABLE, FLAGS, "type" },
+        { "uniform",  NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_TILE_DIMENSION_TYPE_UNIFORM },  .unit = "type" },
+        { "variable", NULL, 0, AV_OPT_TYPE_CONST,
+                   { .i64 = AV_TILE_DIMENSION_TYPE_VARIABLE }, .unit = "type" },
+    { "tile_rows", NULL, OFFSET(tile_rows), AV_OPT_TYPE_INT, { .i64 = 1 }, 1, INT_MAX, FLAGS },
+    { "tile_cols", NULL, OFFSET(tile_cols), AV_OPT_TYPE_INT, { .i64 = 1 }, 1, INT_MAX, FLAGS },
+    { "horizontal_offset", NULL, OFFSET(horizontal_offset), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, FLAGS },
+    { "vertical_offset", NULL, OFFSET(vertical_offset),     AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, FLAGS },
+    { "output_size", "size of the output image", OFFSET(output_width), AV_OPT_TYPE_IMAGE_SIZE,
+        { .str = NULL }, 0, INT_MAX, FLAGS },
+    { NULL },
+};
+
+static const AVClass tile_grid_class = {
+    .class_name          = "AVTileGrid",
+    .version             = LIBAVUTIL_VERSION_INT,
+    .option              = tile_grid_options,
+};
+
+const AVClass *av_tile_grid_get_class(void)
+{
+    return &tile_grid_class;
+}
+
+AVTileGrid *av_tile_grid_alloc(void)
+{
+    AVTileGrid *tile_grid = av_mallocz(sizeof(AVTileGrid));
+
+    if (tile_grid) {
+        tile_grid->av_class = &tile_grid_class;
+        av_opt_set_defaults(tile_grid);
+    }
+
+    return tile_grid;
+}
+
+void av_tile_grid_free(AVTileGrid **ptile_grid)
+{
+    AVTileGrid *tile_grid = *ptile_grid;
+
+    if (!tile_grid)
+        return;
+
+    if (tile_grid->type == AV_TILE_DIMENSION_TYPE_VARIABLE) {
+        av_freep(&tile_grid->w.tile_widths);
+        av_freep(&tile_grid->h.tile_heights);
+    }
+    av_freep(ptile_grid);
+}
diff --git a/libavutil/tile.h b/libavutil/tile.h
new file mode 100644
index 0000000000..7e68dc7bd5
--- /dev/null
+++ b/libavutil/tile.h
@@ -0,0 +1,188 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_TILE_H
+#define AVUTIL_TILE_H
+
+#include <stdint.h>
+
+#include "log.h"
+
+/**
+ * @defgroup lavu_tile_grid Tile grid paremeters
+ * @{
+ */
+
+enum AVTileGridType {
+    AV_TILE_DIMENSION_TYPE_UNIFORM,  ///< All tiles have the same size
+    AV_TILE_DIMENSION_TYPE_VARIABLE, ///< Tiles may have variable size
+};
+
+/**
+ * AVTileGrid holds information on how to combine several independent images in a
+ * single grid for presentation.
+ *
+ * It must be allocated by av_tile_grid_alloc(), and its size is not a part of the
+ * ABI. No new fields may be added to this struct without a major version bump,
+ * except for new elements of the unions w and h fitting in sizeof(intptr_t).
+ */
+typedef struct AVTileGrid {
+    const AVClass *av_class;
+
+    /**
+     * Amount of rows in the grid.
+     *
+     * Must be > 0.
+     */
+    int tile_rows;
+    /**
+     * Amount of columns in the grid.
+     *
+     * Must be > 0.
+     */
+    int tile_cols;
+
+    enum AVTileGridType type;
+
+    /**
+     * Tile witdh paremeters
+     */
+    union {
+        /**
+         * Width of every tile.
+         * This member must be used when the @ref type is AV_TILE_DIMENSION_TYPE_UNIFORM.
+         */
+        intptr_t tile_width;
+        /**
+         * A @ref tile_rows * @ref tile_cols sized array of width values for each
+         * tile in the grid, in row major order.
+         * This member must be used when the @ref type is AV_TILE_DIMENSION_TYPE_VARIABLE.
+         *
+         * Must be allocated with the av_malloc() family of functions, and will be
+         * freed by av_tile_grid_free().
+         */
+        int *tile_widths;
+    } w;
+
+    /**
+     * Tile height paremeters
+     */
+    union {
+        /**
+         * Height of every tile.
+         *
+         * This member must be used when the @ref type is AV_TILE_DIMENSION_TYPE_UNIFORM.
+         */
+        intptr_t tile_height;
+        /**
+         * A @ref tile_rows * @ref tile_cols sized array of height values for each
+         * tile in the grid, in row major order.
+         *
+         * This member must be used when the @ref type is AV_TILE_DIMENSION_TYPE_VARIABLE.
+         * Must be allocated with the av_malloc() family of functions, and will be
+         * freed by av_tile_grid_free().
+         */
+        int *tile_heights;
+    } h;
+
+    /**
+     * Offset in pixels from the left edge of the grid where the actual image
+     * meant for presentation starts.
+     *
+     * When @ref type is AV_TILE_DIMENSION_TYPE_UNIFORM, this field must be >= 0
+     * and <= ((@ref tile_width * @ref tile_cols) - @ref output_width).
+     *
+     * When @ref type is AV_TILE_DIMENSION_TYPE_VARIABLE, this field must be >= 0
+     * and <= the sum of all values in the tile_widths array minus
+     * @ref output_width.
+     */
+    int horizontal_offset;
+    /**
+     * Offset in pixels from the top edge of the grid where the actual image meant
+     * for presentation starts.
+     *
+     * When @ref type is AV_TILE_DIMENSION_TYPE_UNIFORM, this field must be >= 0
+     * and <= ((@ref tile_height * @ref tile_rows) - @ref output_height).
+     *
+     * When @ref type is AV_TILE_DIMENSION_TYPE_VARIABLE, this field must be >= 0
+     * and <= the sum of all values in the tile_heights array minus
+     * @ref output_height.
+     */
+    int vertical_offset;
+
+    /**
+     * Width of the final image for presentation.
+     *
+     * When @ref type is AV_TILE_DIMENSION_TYPE_UNIFORM, this field must be > 0
+     * and <= (@ref tile_width * @ref tile_cols) - @ref horizontal_offset.
+     * When it's not equal to tile_width * tile_cols, the result of
+     * (@ref tile_width * @ref tile_cols) - output_width - @ref horizontal_offset
+     * is the amount of pixels to be cropped from the right edge of the final
+     * image before presentation.
+     *
+     * When @ref type is AV_TILE_DIMENSION_TYPE_VARIABLE, this field must be > 0
+     * and <= the sum of all values in the tile_widths array minus
+     * @ref horizontal_offset.
+     * When it's not equal to the sum of all values in the tile_widths array,
+     * the result of said sum minus output_width minus @ref horizontal_offset is
+     * the amount of pixels to be cropped from the right edge of the final image
+     * before presentation.
+     */
+    int output_width;
+    /**
+     * Height of the final image for presentation.
+     *
+     * When @ref type is AV_TILE_DIMENSION_TYPE_UNIFORM, this field must be > 0
+     * and <= (@ref tile_height * @ref tile_rows) - @ref vertical_offset.
+     * When it's not equal to tile_height * tile_rows, the result of
+     * (@ref tile_height * @ref tile_rows) - output_height - @ref vertical_offset
+     * is the amount of pixels to be cropped from the bottom edge of the final
+     * image before presentation.
+     *
+     * When @ref type is AV_TILE_DIMENSION_TYPE_VARIABLE, this field must be > 0
+     * and <= the sum of all values in the tile_heights array minus
+     * @ref vertical_offset.
+     * When it's not equal to the sum of all values in the tile_heights array,
+     * the result of said sum minus output_height minus @ref vertical_offset is
+     * the amount of pixels to be cropped from the bottom edge of the final image
+     * before presentation.
+     */
+    int output_height;
+} AVTileGrid;
+
+const AVClass *av_tile_grid_get_class(void);
+
+/**
+ * Allocates a AVTileGrid, and initializes its fields with default values.
+ * Must be freed with av_tile_grid_free().
+ */
+AVTileGrid *av_tile_grid_alloc(void);
+
+/**
+ * Free an AVTileGrid and all its contents.
+ *
+ * @param tile_grid pointer to pointer to an allocated AVTileGrid.
+ *                  upon return, *tile_grid will be set to NULL.
+ */
+void av_tile_grid_free(AVTileGrid **tile_grid);
+
+/**
+ * @}
+ */
+
+#endif /* AVUTIL_TILE_H */