[FFmpeg-devel,V5,1/2] avutil: add ROI (Region Of Interest) data struct and bump version

Submitted by Guo, Yejun on Jan. 3, 2019, 4:33 p.m.

Details

Message ID 1546533223-30566-1-git-send-email-yejun.guo@intel.com
State New
Headers show

Commit Message

Guo, Yejun Jan. 3, 2019, 4:33 p.m.
The encoders such as libx264 support different QPs offset for different MBs,
it makes possible for ROI-based encoding. It makes sense to add support
within ffmpeg to generate/accept ROI infos and pass into encoders.

Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
generates ROI info for that frame, and the encoder finally does the
ROI-based encoding.

The ROI info is maintained as side data of AVFrame.

Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
---
 libavutil/frame.c   |  1 +
 libavutil/frame.h   | 29 +++++++++++++++++++++++++++++
 libavutil/version.h |  2 +-
 3 files changed, 31 insertions(+), 1 deletion(-)

Comments

Rostislav Pehlivanov Jan. 4, 2019, 11:13 a.m.
On Thu, 3 Jan 2019 at 08:41, Guo, Yejun <yejun.guo@intel.com> wrote:

> The encoders such as libx264 support different QPs offset for different
> MBs,
> it makes possible for ROI-based encoding. It makes sense to add support
> within ffmpeg to generate/accept ROI infos and pass into encoders.
>
> Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
> generates ROI info for that frame, and the encoder finally does the
> ROI-based encoding.
>
> The ROI info is maintained as side data of AVFrame.
>
> Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
> ---
>  libavutil/frame.c   |  1 +
>  libavutil/frame.h   | 29 +++++++++++++++++++++++++++++
>  libavutil/version.h |  2 +-
>  3 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/libavutil/frame.c b/libavutil/frame.c
> index 34a6210..dcf1fc3 100644
> --- a/libavutil/frame.c
> +++ b/libavutil/frame.c
> @@ -841,6 +841,7 @@ const char *av_frame_side_data_name(enum
> AVFrameSideDataType type)
>      case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table
> data";
>  #endif
>      case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic Metadata
> SMPTE2094-40 (HDR10+)";
> +    case AV_FRAME_DATA_REGIONS_OF_INTEREST: return "Regions Of Interest";
>      }
>      return NULL;
>  }
> diff --git a/libavutil/frame.h b/libavutil/frame.h
> index 582ac47..7887391 100644
> --- a/libavutil/frame.h
> +++ b/libavutil/frame.h
> @@ -173,6 +173,12 @@ enum AVFrameSideDataType {
>       * volume transform - application 4 of SMPTE 2094-40:2016 standard.
>       */
>      AV_FRAME_DATA_DYNAMIC_HDR_PLUS,
> +
> +    /**
> +     * Regions Of Interest, the data is an array of AVRegionOfInterest
> type, the number of
> +     * array element is implied by AVFrameSideData.size /
> AVRegionOfInterest.self_size.
> +     */
> +    AV_FRAME_DATA_REGIONS_OF_INTEREST,
>  };
>
>  enum AVActiveFormatDescription {
> @@ -201,6 +207,29 @@ typedef struct AVFrameSideData {
>  } AVFrameSideData;
>
>  /**
> + * Structure to hold Region Of Interest.
> + *
> + * self_size specifies the size of this data structure. This value
> + * should be set to sizeof(AVRegionOfInterest).
> + *
> + * Number of pixels to discard from the top/bottom/left/right border of
> + * the frame to obtain the region of interest of the frame.
> + * They are encoder dependent and will be extended internally
> + * if the codec requires an alignment.
> + * If the regions overlap, the last value in the list will be used.
> + *
> + * qoffset is quant offset, it is encoder dependent.
> + */
> +typedef struct AVRegionOfInterest {
> +    size_t self_size;
> +    size_t top;
> +    size_t bottom;
> +    size_t left;
> +    size_t right;
>

I'd still much rather have uints with fixed sizes than these platform
dependent types.


+    float qoffset;
>

Use an AVRational with a denum set to the max quantizer, that'll make it
encoder independent and give you better precision.
We also generally don't have floats in the API.
Nicolas George Jan. 4, 2019, 11:22 a.m.
Rostislav Pehlivanov (12019-01-04):
> > +typedef struct AVRegionOfInterest {
> > +    size_t self_size;
> > +    size_t top;
> > +    size_t bottom;
> > +    size_t left;
> > +    size_t right;
> I'd still much rather have uints with fixed sizes than these platform
> dependent types.

Guo, Yejun said:

>> I usually choose 'size_t' for the meanings with length/size.

But that is a mistake. size_t is for length/size of objects in memory,
not any length/size.

These numbers, unless I am mistaken, are coordinates within an AVFrame.
In that case, the only correct type is the same as AVFrame.width and
AVFrame.height.

> Use an AVRational with a denum set to the max quantizer

Can you explain "set to the max quantizer"? For decoders it makes sense,
but what should the encoders do? Return EINVAL? Scale?

Regards,
Rostislav Pehlivanov Jan. 4, 2019, 12:28 p.m.
On Fri, 4 Jan 2019 at 11:22, Nicolas George <george@nsup.org> wrote:

> Rostislav Pehlivanov (12019-01-04):
> > > +typedef struct AVRegionOfInterest {
> > > +    size_t self_size;
> > > +    size_t top;
> > > +    size_t bottom;
> > > +    size_t left;
> > > +    size_t right;
> > I'd still much rather have uints with fixed sizes than these platform
> > dependent types.
>
> Guo, Yejun said:
>
> >> I usually choose 'size_t' for the meanings with length/size.
>
> But that is a mistake. size_t is for length/size of objects in memory,
> not any length/size.
>
> These numbers, unless I am mistaken, are coordinates within an AVFrame.
> In that case, the only correct type is the same as AVFrame.width and
> AVFrame.height.
>

Which are in pixels, not bytes.



>
> > Use an AVRational with a denum set to the max quantizer
>
> Can you explain "set to the max quantizer"? For decoders it makes sense,
> but what should the encoders do? Return EINVAL? Scale?
>

This is an encoder-only interface for now. For vp9 the denum would be 255
for example. They would warn on out of range qdelta.
I still don't get why this can't be an int.
Vittorio Giovara Jan. 4, 2019, 1:08 p.m.
On Fri, Jan 4, 2019 at 12:22 PM Nicolas George <george@nsup.org> wrote:

> Rostislav Pehlivanov (12019-01-04):
> > > +typedef struct AVRegionOfInterest {
> > > +    size_t self_size;
> > > +    size_t top;
> > > +    size_t bottom;
> > > +    size_t left;
> > > +    size_t right;
> > I'd still much rather have uints with fixed sizes than these platform
> > dependent types.
>
> Guo, Yejun said:
>
> >> I usually choose 'size_t' for the meanings with length/size.
>
> But that is a mistake. size_t is for length/size of objects in memory,
> not any length/size.
>
> These numbers, unless I am mistaken, are coordinates within an AVFrame.
> In that case, the only correct type is the same as AVFrame.width and
> AVFrame.height.
>

I personally disagree, what are coordinates within an AVFrame if not the
length/size of an object in memory?
A buffer containing video data is still an object in memory after all, so
IMHO using size_t makes a lot of sense here.
Nicolas George Jan. 4, 2019, 1:18 p.m.
Vittorio Giovara (12019-01-04):
> I personally disagree, what are coordinates within an AVFrame if not the
> length/size of an object in memory?

That would be an argument for making AVFrame.width and AVFrame.height
size_t. But they are not, and therefore these ROI values have no reason
to be either. There is no point in being able to express ROI coordinates
in the quadrillion when the size of the frame is bounded by much less.

Regards,
Nicolas George Jan. 4, 2019, 1:22 p.m.
Rostislav Pehlivanov (12019-01-04):
> Which are in pixels, not bytes.

And plain int, not uint32_t.

> This is an encoder-only interface for now. For vp9 the denum would be 255
> for example. They would warn on out of range qdelta.
> I still don't get why this can't be an int.

"For now" is the key. This is a new API, we should try to think of the
likely future uses, not just what we have in mind right now.

So: very likely use of ROI: -vf facedetect.

Since the filter will not know the specifics of the encoder, it needs to
be able to express things (and have default value) in an
encoder-agnostic way.

Regards,
Rostislav Pehlivanov Jan. 4, 2019, 2:14 p.m.
On Fri, 4 Jan 2019 at 14:02, Nicolas George <george@nsup.org> wrote:

> Rostislav Pehlivanov (12019-01-04):
> > Which are in pixels, not bytes.
>
> And plain int, not uint32_t.
>
> > This is an encoder-only interface for now. For vp9 the denum would be 255
> > for example. They would warn on out of range qdelta.
> > I still don't get why this can't be an int.
>
> "For now" is the key. This is a new API, we should try to think of the
> likely future uses, not just what we have in mind right now.
>
> So: very likely use of ROI: -vf facedetect.
>
> Since the filter will not know the specifics of the encoder, it needs to
> be able to express things (and have default value) in an
> encoder-agnostic way.
>

Hence an AVRational is appropriate as you can have fractions between 0 and
1, the encoder can adjust it and it'll be agnostic.
Nicolas George Jan. 4, 2019, 2:15 p.m.
Rostislav Pehlivanov (12019-01-04):
> Hence an AVRational is appropriate as you can have fractions between 0 and
> 1, the encoder can adjust it and it'll be agnostic.

Yes, AVRational is fine. Producing warnings for an unexpected
denominator would be a bad idea.

Regards,
Derek Buitenhuis Jan. 4, 2019, 4:26 p.m.
On 04/01/2019 14:15, Nicolas George wrote:
> Rostislav Pehlivanov (12019-01-04):
>> Hence an AVRational is appropriate as you can have fractions between 0 and
>> 1, the encoder can adjust it and it'll be agnostic.
> 
> Yes, AVRational is fine. Producing warnings for an unexpected
> denominator would be a bad idea.

Agree... I don't really understand how such a requirement could even be feasibly
used in an agnostic way... you'd need some function to return the quantizer
offset range for any given encoder, or you risk it just failing, or having a
huge load of code in any user application (filter, mpv, whatever) that has
to special case every single encoder. And also a flag saying it can consume
this data at all.

As a side note, I'd like to point out, that this is well into bikeshed territory;
Yejun sent v1 of this set a month ago and every revision has different people wanting
different things, after being silent for weeks. I can imagine it's a bit frustrating.

Cheers,
- Derek
Vittorio Giovara Jan. 4, 2019, 4:28 p.m.
On Fri, Jan 4, 2019 at 2:37 PM Nicolas George <george@nsup.org> wrote:

> Vittorio Giovara (12019-01-04):
> > I personally disagree, what are coordinates within an AVFrame if not the
> > length/size of an object in memory?
>
> That would be an argument for making AVFrame.width and AVFrame.height
> size_t. But they are not, and therefore these ROI values have no reason
> to be either. There is no point in being able to express ROI coordinates
> in the quadrillion when the size of the frame is bounded by much less.
>

That seems a poor argument since the code base is so old that there are a
plethora of bad design decisions that should not dictate what choices are
made now.
size_t really seems the right choice here, and since it's the one chosen by
the author of the patch (revision 5) I think we should respect that.
Using AVRational instead of float for qoffset has the added benefit of
making the field encoder independent, so it's a sensible change, while
using uint32 instead of sizet does not bring anything to the table, as far
as I can tell, so let's not ask Yejun to change it.
Steinar H. Gunderson Jan. 4, 2019, 4:41 p.m.
On Fri, Jan 04, 2019 at 05:28:03PM +0100, Vittorio Giovara wrote:
> size_t really seems the right choice here

Out of curiosity; if it's important to support as large resolutions
as you can address in memory, what would you do with a 1bpp format
on a 32-bit system, where one could easily store more than size_t pixels in
memory?

/* Steinar */
Nicolas George Jan. 4, 2019, 4:44 p.m.
Steinar H. Gunderson (12019-01-04):
> Out of curiosity; if it's important to support as large resolutions
> as you can address in memory, what would you do with a 1bpp format
> on a 32-bit system, where one could easily store more than size_t pixels in
> memory?

It is not that important, and anyway it would require changing other
parts of the code, since they already use ints.

Regards,
Michael Niedermayer Jan. 4, 2019, 5:23 p.m.
On Fri, Jan 04, 2019 at 04:26:51PM +0000, Derek Buitenhuis wrote:
> On 04/01/2019 14:15, Nicolas George wrote:
> > Rostislav Pehlivanov (12019-01-04):
> >> Hence an AVRational is appropriate as you can have fractions between 0 and
> >> 1, the encoder can adjust it and it'll be agnostic.
> > 
> > Yes, AVRational is fine. Producing warnings for an unexpected
> > denominator would be a bad idea.
> 
> Agree... I don't really understand how such a requirement could even be feasibly
> used in an agnostic way... you'd need some function to return the quantizer
> offset range for any given encoder, or you risk it just failing, or having a
> huge load of code in any user application (filter, mpv, whatever) that has
> to special case every single encoder. And also a flag saying it can consume
> this data at all.
> 

> As a side note, I'd like to point out, that this is well into bikeshed territory;
> Yejun sent v1 of this set a month ago and every revision has different people wanting
> different things, after being silent for weeks. I can imagine it's a bit frustrating.

yes
i would suggest that if a disagreement on types remain that people move
the discussion to a thread dedicated to the type of what the disagrement is
abouzt like width/height.
Its not specific to this patch. This should avoid the frustration, as
only once the disagreement is resolved would then someone reply here with
what type it should be.

Thanks

[...]
Rostislav Pehlivanov Jan. 4, 2019, 6:57 p.m.
On Fri, 4 Jan 2019 at 16:28, Vittorio Giovara <vittorio.giovara@gmail.com>
wrote:

> On Fri, Jan 4, 2019 at 2:37 PM Nicolas George <george@nsup.org> wrote:
>
> > Vittorio Giovara (12019-01-04):
> > > I personally disagree, what are coordinates within an AVFrame if not
> the
> > > length/size of an object in memory?
> >
> > That would be an argument for making AVFrame.width and AVFrame.height
> > size_t. But they are not, and therefore these ROI values have no reason
> > to be either. There is no point in being able to express ROI coordinates
> > in the quadrillion when the size of the frame is bounded by much less.
> >
>
> That seems a poor argument since the code base is so old that there are a
> plethora of bad design decisions that should not dictate what choices are
> made now.
>

Hence we should avoid making a new one now. Pixels shouldn't have the
size_t type, it just makes things confusing.
I fail to see how personal choice and silence for weeks constitute a good
argument to keep things as they are. If merged, things will have to be kept
the way they are for at least 2 years.
Vittorio Giovara Jan. 4, 2019, 8:37 p.m.
On Fri, Jan 4, 2019 at 7:57 PM Rostislav Pehlivanov <atomnuker@gmail.com>
wrote:

> On Fri, 4 Jan 2019 at 16:28, Vittorio Giovara <vittorio.giovara@gmail.com>
> wrote:
>
> > On Fri, Jan 4, 2019 at 2:37 PM Nicolas George <george@nsup.org> wrote:
> >
> > > Vittorio Giovara (12019-01-04):
> > > > I personally disagree, what are coordinates within an AVFrame if not
> > the
> > > > length/size of an object in memory?
> > >
> > > That would be an argument for making AVFrame.width and AVFrame.height
> > > size_t. But they are not, and therefore these ROI values have no reason
> > > to be either. There is no point in being able to express ROI
> coordinates
> > > in the quadrillion when the size of the frame is bounded by much less.
> > >
> >
> > That seems a poor argument since the code base is so old that there are a
> > plethora of bad design decisions that should not dictate what choices are
> > made now.
> >
>
> Hence we should avoid making a new one now.


Right, we should do things that make sense and argument them with something
more than saying "it's confusing".

Pixels shouldn't have the size_t type, it just makes things confusing.
>
Rostislav Pehlivanov Jan. 4, 2019, 11:04 p.m.
On Fri, 4 Jan 2019 at 20:44, Vittorio Giovara <vittorio.giovara@gmail.com>
wrote:

> On Fri, Jan 4, 2019 at 7:57 PM Rostislav Pehlivanov <atomnuker@gmail.com>
> wrote:
>
> > On Fri, 4 Jan 2019 at 16:28, Vittorio Giovara <
> vittorio.giovara@gmail.com>
> > wrote:
> >
> > > On Fri, Jan 4, 2019 at 2:37 PM Nicolas George <george@nsup.org> wrote:
> > >
> > > > Vittorio Giovara (12019-01-04):
> > > > > I personally disagree, what are coordinates within an AVFrame if
> not
> > > the
> > > > > length/size of an object in memory?
> > > >
> > > > That would be an argument for making AVFrame.width and AVFrame.height
> > > > size_t. But they are not, and therefore these ROI values have no
> reason
> > > > to be either. There is no point in being able to express ROI
> > coordinates
> > > > in the quadrillion when the size of the frame is bounded by much
> less.
> > > >
> > >
> > > That seems a poor argument since the code base is so old that there
> are a
> > > plethora of bad design decisions that should not dictate what choices
> are
> > > made now.
> > >
> >
> > Hence we should avoid making a new one now.
>
>
> Right, we should do things that make sense and argument them with something
> more than saying "it's confusing".
>
> Pixels shouldn't have the size_t type, it just makes things confusing.
> >
> --
> Vittorio
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Its illogical to have pixels be typed with a size_t, its not something
we've done before either (which makes sense), and its different size across
platforms doesn't help nor does it keep things consistent when there's no
reason for them to be different.
Its a noop type change on 32 bit arches anyway as uint32_t == size_t. Won't
break anything. And the author's having to change another field in the
struct anyway so its not like its more work.
All you've argued thus far for is that since no one said anything for a
month, and that its the author's choice, it should remain that way, and
that coordinates within an avframe are memory sizes (which they can be, but
width and height are not, they're in pixels, and I'd argue they should
remain such). I don't see the reasoning here.
James Almer Jan. 5, 2019, 1:15 a.m.
On 1/4/2019 10:08 AM, Vittorio Giovara wrote:
> On Fri, Jan 4, 2019 at 12:22 PM Nicolas George <george@nsup.org> wrote:
> 
>> Rostislav Pehlivanov (12019-01-04):
>>>> +typedef struct AVRegionOfInterest {
>>>> +    size_t self_size;
>>>> +    size_t top;
>>>> +    size_t bottom;
>>>> +    size_t left;
>>>> +    size_t right;
>>> I'd still much rather have uints with fixed sizes than these platform
>>> dependent types.
>>
>> Guo, Yejun said:
>>
>>>> I usually choose 'size_t' for the meanings with length/size.
>>
>> But that is a mistake. size_t is for length/size of objects in memory,
>> not any length/size.
>>
>> These numbers, unless I am mistaken, are coordinates within an AVFrame.
>> In that case, the only correct type is the same as AVFrame.width and
>> AVFrame.height.
>>
> 
> I personally disagree, what are coordinates within an AVFrame if not the
> length/size of an object in memory?
> A buffer containing video data is still an object in memory after all, so
> IMHO using size_t makes a lot of sense here.

We already did size_t for AVSphericalMapping and had to change them to
uint32_t.
size_t is not the correct type at all, as video dimensions are not
system dependent, and it will generate the same issues we already
experienced with AVSphericalMapping in fate tests.
Nicolas George Jan. 5, 2019, 4:51 a.m.
James Almer (12019-01-04):
> We already did size_t for AVSphericalMapping and had to change them to
> uint32_t.
> size_t is not the correct type at all, as video dimensions are not
> system dependent,

I fully agree with that.

>		    and it will generate the same issues we already
> experienced with AVSphericalMapping in fate tests.

I disagree with that: for some things, size_t would be the correct type,
and in that case, the FATE test should work. A misdesigned FATE test
should not direct the choice of type. Also, IIUC, somebody said the
issue was fixed.

For me, the definite argument is: All the rest of the code uses int for
frame size and coordinate, so new code should use int too unless there
is a very good and specific reason.

Note that while we disagree on the reasons, I think we agree on the
conclusion, which is what matter.

Regards,
Guo, Yejun Jan. 7, 2019, 6:55 a.m.
thanks a lot for all your valuable comments.

> -----Original Message-----

> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf

> Of Nicolas George

> Sent: Saturday, January 05, 2019 12:51 PM

> To: FFmpeg development discussions and patches <ffmpeg-

> devel@ffmpeg.org>

> Subject: Re: [FFmpeg-devel] [PATCH V5 1/2] avutil: add ROI (Region Of

> Interest) data struct and bump version

> 

> James Almer (12019-01-04):

> > We already did size_t for AVSphericalMapping and had to change them to

> > uint32_t.

> > size_t is not the correct type at all, as video dimensions are not

> > system dependent,

> 

> I fully agree with that.

> 

> >		    and it will generate the same issues we already

> experienced with

> >AVSphericalMapping in fate tests.

> 

> I disagree with that: for some things, size_t would be the correct type, and in

> that case, the FATE test should work. A misdesigned FATE test should not

> direct the choice of type. Also, IIUC, somebody said the issue was fixed.

> 

> For me, the definite argument is: All the rest of the code uses int for frame

> size and coordinate, so new code should use int too unless there is a very

> good and specific reason.

> 

> Note that while we disagree on the reasons, I think we agree on the

> conclusion, which is what matter.

> 

> Regards,

> 

> --

>   Nicolas George

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


For the type of ROI coordinate, in struct AVFrame, there are ' int width, height;'  and  'size_t crop_top;' as reference, I'll choose int.

For the type of qoffset, I'll change to AVRational.
Vittorio Giovara Jan. 7, 2019, 12:35 p.m.
On Sat, Jan 5, 2019 at 2:15 AM James Almer <jamrial@gmail.com> wrote:

> On 1/4/2019 10:08 AM, Vittorio Giovara wrote:
> > On Fri, Jan 4, 2019 at 12:22 PM Nicolas George <george@nsup.org> wrote:
> >
> >> Rostislav Pehlivanov (12019-01-04):
> >>>> +typedef struct AVRegionOfInterest {
> >>>> +    size_t self_size;
> >>>> +    size_t top;
> >>>> +    size_t bottom;
> >>>> +    size_t left;
> >>>> +    size_t right;
> >>> I'd still much rather have uints with fixed sizes than these platform
> >>> dependent types.
> >>
> >> Guo, Yejun said:
> >>
> >>>> I usually choose 'size_t' for the meanings with length/size.
> >>
> >> But that is a mistake. size_t is for length/size of objects in memory,
> >> not any length/size.
> >>
> >> These numbers, unless I am mistaken, are coordinates within an AVFrame.
> >> In that case, the only correct type is the same as AVFrame.width and
> >> AVFrame.height.
> >>
> >
> > I personally disagree, what are coordinates within an AVFrame if not the
> > length/size of an object in memory?
> > A buffer containing video data is still an object in memory after all, so
> > IMHO using size_t makes a lot of sense here.
>
> We already did size_t for AVSphericalMapping and had to change them to
> uint32_t.
> size_t is not the correct type at all, as video dimensions are not
> system dependent, and it will generate the same issues we already
> experienced with AVSphericalMapping in fate tests.
>
>
Just for the sake of information, the issue with the spherical fate tests
is that some mov tests included the side data size which was thankfully
removed because it did not made any sense. Even then, the
AVSphericalMapping structure was changed only to make it more similar to
the specification -- this is not the case here: size_t represents a length
so using it for video dimensions makes perfect sense. That being said, use
whatever type you want, as long as it's correctly documented and argumented.
Regards
Nicolas George Jan. 7, 2019, 12:40 p.m.
Vittorio Giovara (12019-01-07):
>						  size_t represents a length
> so using it for video dimensions makes perfect sense.

size_t represents a system-dependent size of object in memory, not any
"length". There is no reason to have different resolution limits on 32
and 64 bits systems, as long as the data fits in memory.

Regards,

Patch hide | download patch | download mbox

diff --git a/libavutil/frame.c b/libavutil/frame.c
index 34a6210..dcf1fc3 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -841,6 +841,7 @@  const char *av_frame_side_data_name(enum AVFrameSideDataType type)
     case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table data";
 #endif
     case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic Metadata SMPTE2094-40 (HDR10+)";
+    case AV_FRAME_DATA_REGIONS_OF_INTEREST: return "Regions Of Interest";
     }
     return NULL;
 }
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 582ac47..7887391 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -173,6 +173,12 @@  enum AVFrameSideDataType {
      * volume transform - application 4 of SMPTE 2094-40:2016 standard.
      */
     AV_FRAME_DATA_DYNAMIC_HDR_PLUS,
+
+    /**
+     * Regions Of Interest, the data is an array of AVRegionOfInterest type, the number of
+     * array element is implied by AVFrameSideData.size / AVRegionOfInterest.self_size.
+     */
+    AV_FRAME_DATA_REGIONS_OF_INTEREST,
 };
 
 enum AVActiveFormatDescription {
@@ -201,6 +207,29 @@  typedef struct AVFrameSideData {
 } AVFrameSideData;
 
 /**
+ * Structure to hold Region Of Interest.
+ *
+ * self_size specifies the size of this data structure. This value
+ * should be set to sizeof(AVRegionOfInterest).
+ *
+ * Number of pixels to discard from the top/bottom/left/right border of
+ * the frame to obtain the region of interest of the frame.
+ * They are encoder dependent and will be extended internally
+ * if the codec requires an alignment.
+ * If the regions overlap, the last value in the list will be used.
+ *
+ * qoffset is quant offset, it is encoder dependent.
+ */
+typedef struct AVRegionOfInterest {
+    size_t self_size;
+    size_t top;
+    size_t bottom;
+    size_t left;
+    size_t right;
+    float qoffset;
+} AVRegionOfInterest;
+
+/**
  * This structure describes decoded (raw) audio or video data.
  *
  * AVFrame must be allocated using av_frame_alloc(). Note that this only
diff --git a/libavutil/version.h b/libavutil/version.h
index f997615..1fcdea9 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -79,7 +79,7 @@ 
  */
 
 #define LIBAVUTIL_VERSION_MAJOR  56
-#define LIBAVUTIL_VERSION_MINOR  25
+#define LIBAVUTIL_VERSION_MINOR  26
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \