diff mbox

[FFmpeg-devel,V4,1/2] avutil: add ROI data struct and bump version

Message ID 1545991785-5434-1-git-send-email-yejun.guo@intel.com
State New
Headers show

Commit Message

Guo, Yejun Dec. 28, 2018, 10:09 a.m. UTC
The encoders such as libx264 support different QPs offset for different MBs,
it makes possible for ROI-based encoding. It makes sense to add support
within ffmpeg to generate/accept ROI infos and pass into encoders.

Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
generates ROI info for that frame, and the encoder finally does the
ROI-based encoding.

The ROI info is maintained as side data of AVFrame.

Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
---
 libavutil/frame.c   |  1 +
 libavutil/frame.h   | 23 +++++++++++++++++++++++
 libavutil/version.h |  2 +-
 3 files changed, 25 insertions(+), 1 deletion(-)

Comments

Derek Buitenhuis Dec. 28, 2018, 4:05 p.m. UTC | #1
On 28/12/2018 10:09, Guo, Yejun wrote:
> The encoders such as libx264 support different QPs offset for different MBs,
> it makes possible for ROI-based encoding. It makes sense to add support
> within ffmpeg to generate/accept ROI infos and pass into encoders.
> 
> Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
> generates ROI info for that frame, and the encoder finally does the
> ROI-based encoding.
> 
> The ROI info is maintained as side data of AVFrame.
> 
> Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
> ---
>  libavutil/frame.c   |  1 +
>  libavutil/frame.h   | 23 +++++++++++++++++++++++
>  libavutil/version.h |  2 +-
>  3 files changed, 25 insertions(+), 1 deletion(-)

Seems OK, I think.

- Derek
Vittorio Giovara Jan. 2, 2019, 3:13 p.m. UTC | #2
On Fri, Dec 28, 2018 at 3:17 AM Guo, Yejun <yejun.guo@intel.com> wrote:

> The encoders such as libx264 support different QPs offset for different
> MBs,
> it makes possible for ROI-based encoding. It makes sense to add support
> within ffmpeg to generate/accept ROI infos and pass into encoders.
>
> Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
> generates ROI info for that frame, and the encoder finally does the
> ROI-based encoding.
>
> The ROI info is maintained as side data of AVFrame.
>
> Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
> ---
>  libavutil/frame.c   |  1 +
>  libavutil/frame.h   | 23 +++++++++++++++++++++++
>  libavutil/version.h |  2 +-
>  3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/libavutil/frame.c b/libavutil/frame.c
> index 34a6210..bebc50e 100644
> --- a/libavutil/frame.c
> +++ b/libavutil/frame.c
> @@ -841,6 +841,7 @@ const char *av_frame_side_data_name(enum
> AVFrameSideDataType type)
>      case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table
> data";
>  #endif
>      case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic Metadata
> SMPTE2094-40 (HDR10+)";
> +    case AV_FRAME_DATA_ROIS: return "Regions Of Interest";
>      }
>      return NULL;
>  }
> diff --git a/libavutil/frame.h b/libavutil/frame.h
> index 582ac47..3460b01 100644
> --- a/libavutil/frame.h
> +++ b/libavutil/frame.h
> @@ -173,6 +173,12 @@ enum AVFrameSideDataType {
>       * volume transform - application 4 of SMPTE 2094-40:2016 standard.
>       */
>      AV_FRAME_DATA_DYNAMIC_HDR_PLUS,
> +
> +    /**
> +     * Regions Of Interest, the data is an array of AVROI type, the array
> size
> +     * is implied by AVFrameSideData::size / sizeof(AVROI).
> +     */
> +    AV_FRAME_DATA_ROIS,
>

Any chance i could convince you of unfolding this acronym into
AV_FRAME_REGIONS_OF_INTEREST (and AVRegionOfInterest)?
When I first read it I thought you were talking about Return of Investments
or Request of Invention, which were mildly confusing.

The "AVFrameSideData::size" is a C++-ism, could you please use
"AVFrameSideData.size" like elsewhere in the header?

You should probably document that sizeof() of this struct is part of the
public ABI.


>  };
>
>  enum AVActiveFormatDescription {
> @@ -201,6 +207,23 @@ typedef struct AVFrameSideData {
>  } AVFrameSideData;
>
>  /**
> + * Structure to hold Region Of Interest.
> + *
> + * top/bottom/left/right are coordinates at frame pixel level.
>

what does  "pixel level" mean? May I suggest better wording?

    Number of pixels to discard from the the top/bottom/left/right border
of the frame to obtain the region of interest of the frame.

+ * They will be extended internally if the codec requires an alignment.
> + * If the regions overlap, the last value in the list will be used.
>

Isn't this encoder dependent too?


> + *
> + * qoffset is quant offset, it is encoder dependent.

+ */
> +typedef struct AVROI {
> +    size_t top;
> +    size_t bottom;
> +    size_t left;
> +    size_t right;
> +    int qoffset;
>

so int instead of float?

+} AVROI;
> +
> +/**
>   * This structure describes decoded (raw) audio or video data.
>   *
>   * AVFrame must be allocated using av_frame_alloc(). Note that this only
> diff --git a/libavutil/version.h b/libavutil/version.h
> index f997615..1fcdea9 100644
> --- a/libavutil/version.h
> +++ b/libavutil/version.h
> @@ -79,7 +79,7 @@
>   */
>
Vittorio Giovara Jan. 2, 2019, 5:18 p.m. UTC | #3
On Wed, Jan 2, 2019 at 4:13 PM Vittorio Giovara <vittorio.giovara@gmail.com>
wrote:

>
>
> On Fri, Dec 28, 2018 at 3:17 AM Guo, Yejun <yejun.guo@intel.com> wrote:
>
>> The encoders such as libx264 support different QPs offset for different
>> MBs,
>> it makes possible for ROI-based encoding. It makes sense to add support
>> within ffmpeg to generate/accept ROI infos and pass into encoders.
>>
>> Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
>> generates ROI info for that frame, and the encoder finally does the
>> ROI-based encoding.
>>
>> The ROI info is maintained as side data of AVFrame.
>>
>> Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
>> ---
>>  libavutil/frame.c   |  1 +
>>  libavutil/frame.h   | 23 +++++++++++++++++++++++
>>  libavutil/version.h |  2 +-
>>  3 files changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavutil/frame.c b/libavutil/frame.c
>> index 34a6210..bebc50e 100644
>> --- a/libavutil/frame.c
>> +++ b/libavutil/frame.c
>> @@ -841,6 +841,7 @@ const char *av_frame_side_data_name(enum
>> AVFrameSideDataType type)
>>      case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table
>> data";
>>  #endif
>>      case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic Metadata
>> SMPTE2094-40 (HDR10+)";
>> +    case AV_FRAME_DATA_ROIS: return "Regions Of Interest";
>>      }
>>      return NULL;
>>  }
>> diff --git a/libavutil/frame.h b/libavutil/frame.h
>> index 582ac47..3460b01 100644
>> --- a/libavutil/frame.h
>> +++ b/libavutil/frame.h
>> @@ -173,6 +173,12 @@ enum AVFrameSideDataType {
>>       * volume transform - application 4 of SMPTE 2094-40:2016 standard.
>>       */
>>      AV_FRAME_DATA_DYNAMIC_HDR_PLUS,
>> +
>> +    /**
>> +     * Regions Of Interest, the data is an array of AVROI type, the
>> array size
>> +     * is implied by AVFrameSideData::size / sizeof(AVROI).
>> +     */
>> +    AV_FRAME_DATA_ROIS,
>>
>
> Any chance i could convince you of unfolding this acronym into
> AV_FRAME_REGIONS_OF_INTEREST (and AVRegionOfInterest)?
> When I first read it I thought you were talking about Return of
> Investments or Request of Invention, which were mildly confusing.
>
> The "AVFrameSideData::size" is a C++-ism, could you please use
> "AVFrameSideData.size" like elsewhere in the header?
>
> You should probably document that sizeof() of this struct is part of the
> public ABI.
>

After thinking some more about this, it's probably a bad idea to embed an
array in a side data. Not only it constrains the structure to only change
at major bumps, but it seems very reminiscent of binary side data which
is/should be not used for newer side data. As a matter of fact side data
normally hosts either structs or simple types like ints or enums only.
Instead of this, why not creating a structure hosting the number of
elements and a pointer? Something like

AVRegionOfInterest {
 size_t top/left/right/bottom
}

AVRegionOfInterestSet {
 int rois_nb;
 AVRegionOfInterest *rois;
}
James Almer Jan. 2, 2019, 5:41 p.m. UTC | #4
On 12/28/2018 7:09 AM, Guo, Yejun wrote:
> The encoders such as libx264 support different QPs offset for different MBs,
> it makes possible for ROI-based encoding. It makes sense to add support
> within ffmpeg to generate/accept ROI infos and pass into encoders.
> 
> Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
> generates ROI info for that frame, and the encoder finally does the
> ROI-based encoding.
> 
> The ROI info is maintained as side data of AVFrame.
> 
> Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
> ---
>  libavutil/frame.c   |  1 +
>  libavutil/frame.h   | 23 +++++++++++++++++++++++
>  libavutil/version.h |  2 +-
>  3 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/libavutil/frame.c b/libavutil/frame.c
> index 34a6210..bebc50e 100644
> --- a/libavutil/frame.c
> +++ b/libavutil/frame.c
> @@ -841,6 +841,7 @@ const char *av_frame_side_data_name(enum AVFrameSideDataType type)
>      case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table data";
>  #endif
>      case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic Metadata SMPTE2094-40 (HDR10+)";
> +    case AV_FRAME_DATA_ROIS: return "Regions Of Interest";
>      }
>      return NULL;
>  }
> diff --git a/libavutil/frame.h b/libavutil/frame.h
> index 582ac47..3460b01 100644
> --- a/libavutil/frame.h
> +++ b/libavutil/frame.h
> @@ -173,6 +173,12 @@ enum AVFrameSideDataType {
>       * volume transform - application 4 of SMPTE 2094-40:2016 standard.
>       */
>      AV_FRAME_DATA_DYNAMIC_HDR_PLUS,
> +
> +    /**
> +     * Regions Of Interest, the data is an array of AVROI type, the array size
> +     * is implied by AVFrameSideData::size / sizeof(AVROI).
> +     */
> +    AV_FRAME_DATA_ROIS,
>  };
>  
>  enum AVActiveFormatDescription {
> @@ -201,6 +207,23 @@ typedef struct AVFrameSideData {
>  } AVFrameSideData;
>  
>  /**
> + * Structure to hold Region Of Interest.
> + *
> + * top/bottom/left/right are coordinates at frame pixel level.
> + * They will be extended internally if the codec requires an alignment.
> + * If the regions overlap, the last value in the list will be used.
> + *
> + * qoffset is quant offset, it is encoder dependent.
> + */
> +typedef struct AVROI {
> +    size_t top;
> +    size_t bottom;
> +    size_t left;
> +    size_t right;

uint32_t, please. Make the struct have a fixed size so we don't repeat
the same issues we had with fate tests and AVSphericalMapping.

> +    int qoffset;
> +} AVROI;
> +
> +/**
>   * This structure describes decoded (raw) audio or video data.
>   *
>   * AVFrame must be allocated using av_frame_alloc(). Note that this only
> diff --git a/libavutil/version.h b/libavutil/version.h
> index f997615..1fcdea9 100644
> --- a/libavutil/version.h
> +++ b/libavutil/version.h
> @@ -79,7 +79,7 @@
>   */
>  
>  #define LIBAVUTIL_VERSION_MAJOR  56
> -#define LIBAVUTIL_VERSION_MINOR  25
> +#define LIBAVUTIL_VERSION_MINOR  26
>  #define LIBAVUTIL_VERSION_MICRO 100
>  
>  #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
>
James Almer Jan. 2, 2019, 5:44 p.m. UTC | #5
On 1/2/2019 2:18 PM, Vittorio Giovara wrote:
> On Wed, Jan 2, 2019 at 4:13 PM Vittorio Giovara <vittorio.giovara@gmail.com>
> wrote:
> 
>>
>>
>> On Fri, Dec 28, 2018 at 3:17 AM Guo, Yejun <yejun.guo@intel.com> wrote:
>>
>>> The encoders such as libx264 support different QPs offset for different
>>> MBs,
>>> it makes possible for ROI-based encoding. It makes sense to add support
>>> within ffmpeg to generate/accept ROI infos and pass into encoders.
>>>
>>> Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
>>> generates ROI info for that frame, and the encoder finally does the
>>> ROI-based encoding.
>>>
>>> The ROI info is maintained as side data of AVFrame.
>>>
>>> Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
>>> ---
>>>  libavutil/frame.c   |  1 +
>>>  libavutil/frame.h   | 23 +++++++++++++++++++++++
>>>  libavutil/version.h |  2 +-
>>>  3 files changed, 25 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/libavutil/frame.c b/libavutil/frame.c
>>> index 34a6210..bebc50e 100644
>>> --- a/libavutil/frame.c
>>> +++ b/libavutil/frame.c
>>> @@ -841,6 +841,7 @@ const char *av_frame_side_data_name(enum
>>> AVFrameSideDataType type)
>>>      case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table
>>> data";
>>>  #endif
>>>      case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic Metadata
>>> SMPTE2094-40 (HDR10+)";
>>> +    case AV_FRAME_DATA_ROIS: return "Regions Of Interest";
>>>      }
>>>      return NULL;
>>>  }
>>> diff --git a/libavutil/frame.h b/libavutil/frame.h
>>> index 582ac47..3460b01 100644
>>> --- a/libavutil/frame.h
>>> +++ b/libavutil/frame.h
>>> @@ -173,6 +173,12 @@ enum AVFrameSideDataType {
>>>       * volume transform - application 4 of SMPTE 2094-40:2016 standard.
>>>       */
>>>      AV_FRAME_DATA_DYNAMIC_HDR_PLUS,
>>> +
>>> +    /**
>>> +     * Regions Of Interest, the data is an array of AVROI type, the
>>> array size
>>> +     * is implied by AVFrameSideData::size / sizeof(AVROI).
>>> +     */
>>> +    AV_FRAME_DATA_ROIS,
>>>
>>
>> Any chance i could convince you of unfolding this acronym into
>> AV_FRAME_REGIONS_OF_INTEREST (and AVRegionOfInterest)?
>> When I first read it I thought you were talking about Return of
>> Investments or Request of Invention, which were mildly confusing.
>>
>> The "AVFrameSideData::size" is a C++-ism, could you please use
>> "AVFrameSideData.size" like elsewhere in the header?
>>
>> You should probably document that sizeof() of this struct is part of the
>> public ABI.
>>
> 
> After thinking some more about this, it's probably a bad idea to embed an
> array in a side data. Not only it constrains the structure to only change
> at major bumps, but it seems very reminiscent of binary side data which
> is/should be not used for newer side data. As a matter of fact side data
> normally hosts either structs or simple types like ints or enums only.
> Instead of this, why not creating a structure hosting the number of
> elements and a pointer? Something like
> 
> AVRegionOfInterest {
>  size_t top/left/right/bottom
> }
> 
> AVRegionOfInterestSet {
>  int rois_nb;
>  AVRegionOfInterest *rois;

This will go south as soon as you start copying, referencing and freeing
frames and/or frame side data.

All side data types need to be a contiguous array of bytes in memory
with no pointers.

> }
>
Vittorio Giovara Jan. 2, 2019, 8:50 p.m. UTC | #6
On Wed, Jan 2, 2019 at 6:45 PM James Almer <jamrial@gmail.com> wrote:

> On 1/2/2019 2:18 PM, Vittorio Giovara wrote:
> > On Wed, Jan 2, 2019 at 4:13 PM Vittorio Giovara <
> vittorio.giovara@gmail.com>
> > wrote:
> >
> >>
> >>
> >> On Fri, Dec 28, 2018 at 3:17 AM Guo, Yejun <yejun.guo@intel.com> wrote:
> >>
> >
> > AVRegionOfInterest {
> >  size_t top/left/right/bottom
> > }
> >
> > AVRegionOfInterestSet {
> >  int rois_nb;
> >  AVRegionOfInterest *rois;
>
> This will go south as soon as you start copying, referencing and freeing
> frames and/or frame side data.
>
> All side data types need to be a contiguous array of bytes in memory
> with no pointers.
>

Hmm you're right, but embedding an entire array is pretty bad too,
especially because it locks the struct size...
Do you have any alternative ideas for this?
Vittorio Giovara Jan. 2, 2019, 8:55 p.m. UTC | #7
On Wed, Jan 2, 2019 at 6:48 PM James Almer <jamrial@gmail.com> wrote:

> On 12/28/2018 7:09 AM, Guo, Yejun wrote:
> > The encoders such as libx264 support different QPs offset for different
> MBs,
> > it makes possible for ROI-based encoding. It makes sense to add support
> > within ffmpeg to generate/accept ROI infos and pass into encoders.
> >
> > Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
> > generates ROI info for that frame, and the encoder finally does the
> > ROI-based encoding.
> >
> > The ROI info is maintained as side data of AVFrame.
> >
> > Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
> > ---
> >  libavutil/frame.c   |  1 +
> >  libavutil/frame.h   | 23 +++++++++++++++++++++++
> >  libavutil/version.h |  2 +-
> >  3 files changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavutil/frame.c b/libavutil/frame.c
> > index 34a6210..bebc50e 100644
> > --- a/libavutil/frame.c
> > +++ b/libavutil/frame.c
> > @@ -841,6 +841,7 @@ const char *av_frame_side_data_name(enum
> AVFrameSideDataType type)
> >      case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table
> data";
> >  #endif
> >      case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic Metadata
> SMPTE2094-40 (HDR10+)";
> > +    case AV_FRAME_DATA_ROIS: return "Regions Of Interest";
> >      }
> >      return NULL;
> >  }
> > diff --git a/libavutil/frame.h b/libavutil/frame.h
> > index 582ac47..3460b01 100644
> > --- a/libavutil/frame.h
> > +++ b/libavutil/frame.h
> > @@ -173,6 +173,12 @@ enum AVFrameSideDataType {
> >       * volume transform - application 4 of SMPTE 2094-40:2016 standard.
> >       */
> >      AV_FRAME_DATA_DYNAMIC_HDR_PLUS,
> > +
> > +    /**
> > +     * Regions Of Interest, the data is an array of AVROI type, the
> array size
> > +     * is implied by AVFrameSideData::size / sizeof(AVROI).
> > +     */
> > +    AV_FRAME_DATA_ROIS,
> >  };
> >
> >  enum AVActiveFormatDescription {
> > @@ -201,6 +207,23 @@ typedef struct AVFrameSideData {
> >  } AVFrameSideData;
> >
> >  /**
> > + * Structure to hold Region Of Interest.
> > + *
> > + * top/bottom/left/right are coordinates at frame pixel level.
> > + * They will be extended internally if the codec requires an alignment.
> > + * If the regions overlap, the last value in the list will be used.
> > + *
> > + * qoffset is quant offset, it is encoder dependent.
> > + */
> > +typedef struct AVROI {
> > +    size_t top;
> > +    size_t bottom;
> > +    size_t left;
> > +    size_t right;
>
> uint32_t, please. Make the struct have a fixed size so we don't repeat
> the same issues we had with fate tests and AVSphericalMapping.
>

I thought we dropped the side data size from fate tests in
21a8e751ad6abb2d423afa3041da92f8f7741997.
If so, size_t seems the better choice here.
Michael Niedermayer Jan. 2, 2019, 10:07 p.m. UTC | #8
On Wed, Jan 02, 2019 at 09:50:24PM +0100, Vittorio Giovara wrote:
> On Wed, Jan 2, 2019 at 6:45 PM James Almer <jamrial@gmail.com> wrote:
> 
> > On 1/2/2019 2:18 PM, Vittorio Giovara wrote:
> > > On Wed, Jan 2, 2019 at 4:13 PM Vittorio Giovara <
> > vittorio.giovara@gmail.com>
> > > wrote:
> > >
> > >>
> > >>
> > >> On Fri, Dec 28, 2018 at 3:17 AM Guo, Yejun <yejun.guo@intel.com> wrote:
> > >>
> > >
> > > AVRegionOfInterest {
> > >  size_t top/left/right/bottom
> > > }
> > >
> > > AVRegionOfInterestSet {
> > >  int rois_nb;
> > >  AVRegionOfInterest *rois;
> >
> > This will go south as soon as you start copying, referencing and freeing
> > frames and/or frame side data.
> >
> > All side data types need to be a contiguous array of bytes in memory
> > with no pointers.
> >
> 
> Hmm you're right, but embedding an entire array is pretty bad too,
> especially because it locks the struct size...
> Do you have any alternative ideas for this?

the array element size could be added in addition to the number of
entries. that way elements can be added after the array and also
the individual elements can be extended

int rois_nb;
int roi_size;
AVRegionOfInterest rois[rois_nb];

Alternativly some "int version" could be used but we dont use that style
elsewhere


[...]
Guo, Yejun Jan. 3, 2019, 5:32 a.m. UTC | #9
> -----Original Message-----

> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf

> Of Vittorio Giovara

> Sent: Wednesday, January 02, 2019 11:13 PM

> To: FFmpeg development discussions and patches <ffmpeg-

> devel@ffmpeg.org>

> Subject: Re: [FFmpeg-devel] [PATCH V4 1/2] avutil: add ROI data struct and

> bump version

> 

> On Fri, Dec 28, 2018 at 3:17 AM Guo, Yejun <yejun.guo@intel.com> wrote:

> 

> > The encoders such as libx264 support different QPs offset for

> > different MBs, it makes possible for ROI-based encoding. It makes

> > sense to add support within ffmpeg to generate/accept ROI infos and

> > pass into encoders.

> >

> > Typical usage: After AVFrame is decoded, a ffmpeg filter or user's

> > code generates ROI info for that frame, and the encoder finally does

> > the ROI-based encoding.

> >

> > The ROI info is maintained as side data of AVFrame.

> >

> > Signed-off-by: Guo, Yejun <yejun.guo@intel.com>

> > ---

> >  libavutil/frame.c   |  1 +

> >  libavutil/frame.h   | 23 +++++++++++++++++++++++

> >  libavutil/version.h |  2 +-

> >  3 files changed, 25 insertions(+), 1 deletion(-)

> >

> > diff --git a/libavutil/frame.c b/libavutil/frame.c index

> > 34a6210..bebc50e 100644

> > --- a/libavutil/frame.c

> > +++ b/libavutil/frame.c

> > @@ -841,6 +841,7 @@ const char *av_frame_side_data_name(enum

> > AVFrameSideDataType type)

> >      case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table

> > data";

> >  #endif

> >      case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic

> Metadata

> > SMPTE2094-40 (HDR10+)";

> > +    case AV_FRAME_DATA_ROIS: return "Regions Of Interest";

> >      }

> >      return NULL;

> >  }

> > diff --git a/libavutil/frame.h b/libavutil/frame.h index

> > 582ac47..3460b01 100644

> > --- a/libavutil/frame.h

> > +++ b/libavutil/frame.h

> > @@ -173,6 +173,12 @@ enum AVFrameSideDataType {

> >       * volume transform - application 4 of SMPTE 2094-40:2016 standard.

> >       */

> >      AV_FRAME_DATA_DYNAMIC_HDR_PLUS,

> > +

> > +    /**

> > +     * Regions Of Interest, the data is an array of AVROI type, the

> > + array

> > size

> > +     * is implied by AVFrameSideData::size / sizeof(AVROI).

> > +     */

> > +    AV_FRAME_DATA_ROIS,

> >

> 

> Any chance i could convince you of unfolding this acronym into

> AV_FRAME_REGIONS_OF_INTEREST (and AVRegionOfInterest)?

> When I first read it I thought you were talking about Return of Investments

> or Request of Invention, which were mildly confusing.


thanks, sure, will change to AV_FRAME_DATA_REGIONS_OF_INTEREST and AVRegionOfInterest

> 

> The "AVFrameSideData::size" is a C++-ism, could you please use

> "AVFrameSideData.size" like elsewhere in the header?


ok, will change it.

> 

> You should probably document that sizeof() of this struct is part of the public

> ABI.


thanks for catching this issue, it is not ABI compatible if a new data member is later added into
struct AVRegionOfInterest.  will response in following emails.

> 

> 

> >  };

> >

> >  enum AVActiveFormatDescription {

> > @@ -201,6 +207,23 @@ typedef struct AVFrameSideData {  }

> > AVFrameSideData;

> >

> >  /**

> > + * Structure to hold Region Of Interest.

> > + *

> > + * top/bottom/left/right are coordinates at frame pixel level.

> >

> 

> what does  "pixel level" mean? May I suggest better wording?

> 

>     Number of pixels to discard from the the top/bottom/left/right border of

> the frame to obtain the region of interest of the frame.


thanks., much better than mine, will use it.

> 

> + * They will be extended internally if the codec requires an alignment.

> > + * If the regions overlap, the last value in the list will be used.

> >

> 

> Isn't this encoder dependent too?


yes, and will add to notes explicitly.

> 

> 

> > + *

> > + * qoffset is quant offset, it is encoder dependent.

> 

> + */

> > +typedef struct AVROI {

> > +    size_t top;

> > +    size_t bottom;

> > +    size_t left;

> > +    size_t right;

> > +    int qoffset;

> >

> 

> so int instead of float?


I used float in very early version and got feedback to use int. I'm open to both, I'll change to float since two comments till now ask for float.

> 

> +} AVROI;

> > +

> > +/**

> >   * This structure describes decoded (raw) audio or video data.

> >   *

> >   * AVFrame must be allocated using av_frame_alloc(). Note that this

> > only diff --git a/libavutil/version.h b/libavutil/version.h index

> > f997615..1fcdea9 100644

> > --- a/libavutil/version.h

> > +++ b/libavutil/version.h

> > @@ -79,7 +79,7 @@

> >   */

> >

> --

> Vittorio

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Guo, Yejun Jan. 3, 2019, 5:42 a.m. UTC | #10
> -----Original Message-----

> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf

> Of Vittorio Giovara

> Sent: Thursday, January 03, 2019 4:55 AM

> To: FFmpeg development discussions and patches <ffmpeg-

> devel@ffmpeg.org>

> Subject: Re: [FFmpeg-devel] [PATCH V4 1/2] avutil: add ROI data struct and

> bump version

> 

> On Wed, Jan 2, 2019 at 6:48 PM James Almer <jamrial@gmail.com> wrote:

> 

> > On 12/28/2018 7:09 AM, Guo, Yejun wrote:

> > > The encoders such as libx264 support different QPs offset for

> > > different

> > MBs,

> > > it makes possible for ROI-based encoding. It makes sense to add

> > > support within ffmpeg to generate/accept ROI infos and pass into

> encoders.

> > >

> > > Typical usage: After AVFrame is decoded, a ffmpeg filter or user's

> > > code generates ROI info for that frame, and the encoder finally does

> > > the ROI-based encoding.

> > >

> > > The ROI info is maintained as side data of AVFrame.

> > >

> > > Signed-off-by: Guo, Yejun <yejun.guo@intel.com>

> > > ---

> > >  libavutil/frame.c   |  1 +

> > >  libavutil/frame.h   | 23 +++++++++++++++++++++++

> > >  libavutil/version.h |  2 +-

> > >  3 files changed, 25 insertions(+), 1 deletion(-)

> > >

> > > diff --git a/libavutil/frame.c b/libavutil/frame.c index

> > > 34a6210..bebc50e 100644

> > > --- a/libavutil/frame.c

> > > +++ b/libavutil/frame.c

> > > @@ -841,6 +841,7 @@ const char *av_frame_side_data_name(enum

> > AVFrameSideDataType type)

> > >      case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table

> > data";

> > >  #endif

> > >      case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic

> > > Metadata

> > SMPTE2094-40 (HDR10+)";

> > > +    case AV_FRAME_DATA_ROIS: return "Regions Of Interest";

> > >      }

> > >      return NULL;

> > >  }

> > > diff --git a/libavutil/frame.h b/libavutil/frame.h index

> > > 582ac47..3460b01 100644

> > > --- a/libavutil/frame.h

> > > +++ b/libavutil/frame.h

> > > @@ -173,6 +173,12 @@ enum AVFrameSideDataType {

> > >       * volume transform - application 4 of SMPTE 2094-40:2016 standard.

> > >       */

> > >      AV_FRAME_DATA_DYNAMIC_HDR_PLUS,

> > > +

> > > +    /**

> > > +     * Regions Of Interest, the data is an array of AVROI type, the

> > array size

> > > +     * is implied by AVFrameSideData::size / sizeof(AVROI).

> > > +     */

> > > +    AV_FRAME_DATA_ROIS,

> > >  };

> > >

> > >  enum AVActiveFormatDescription {

> > > @@ -201,6 +207,23 @@ typedef struct AVFrameSideData {  }

> > > AVFrameSideData;

> > >

> > >  /**

> > > + * Structure to hold Region Of Interest.

> > > + *

> > > + * top/bottom/left/right are coordinates at frame pixel level.

> > > + * They will be extended internally if the codec requires an alignment.

> > > + * If the regions overlap, the last value in the list will be used.

> > > + *

> > > + * qoffset is quant offset, it is encoder dependent.

> > > + */

> > > +typedef struct AVROI {

> > > +    size_t top;

> > > +    size_t bottom;

> > > +    size_t left;

> > > +    size_t right;

> >

> > uint32_t, please. Make the struct have a fixed size so we don't repeat

> > the same issues we had with fate tests and AVSphericalMapping.

> >

> 

> I thought we dropped the side data size from fate tests in

> 21a8e751ad6abb2d423afa3041da92f8f7741997.

> If so, size_t seems the better choice here.


I usually choose 'size_t' for the meanings with length/size. Will keep this since 21a8e751ad6abb2d423afa3041da92f8f7741997 was there.

> --

> Vittorio

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Guo, Yejun Jan. 3, 2019, 8:10 a.m. UTC | #11
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf
> Of Michael Niedermayer
> Sent: Thursday, January 03, 2019 6:07 AM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH V4 1/2] avutil: add ROI data struct and
> bump version
> 
> On Wed, Jan 02, 2019 at 09:50:24PM +0100, Vittorio Giovara wrote:
> > On Wed, Jan 2, 2019 at 6:45 PM James Almer <jamrial@gmail.com> wrote:
> >
> > > On 1/2/2019 2:18 PM, Vittorio Giovara wrote:
> > > > On Wed, Jan 2, 2019 at 4:13 PM Vittorio Giovara <
> > > vittorio.giovara@gmail.com>
> > > > wrote:
> > > >
> > > >>
> > > >>
> > > >> On Fri, Dec 28, 2018 at 3:17 AM Guo, Yejun <yejun.guo@intel.com>
> wrote:
> > > >>
> > > >
> > > > AVRegionOfInterest {
> > > >  size_t top/left/right/bottom
> > > > }
> > > >
> > > > AVRegionOfInterestSet {
> > > >  int rois_nb;
> > > >  AVRegionOfInterest *rois;
> > >
> > > This will go south as soon as you start copying, referencing and
> > > freeing frames and/or frame side data.
> > >
> > > All side data types need to be a contiguous array of bytes in memory
> > > with no pointers.
> > >
> >
> > Hmm you're right, but embedding an entire array is pretty bad too,
> > especially because it locks the struct size...
> > Do you have any alternative ideas for this?
> 
> the array element size could be added in addition to the number of entries.
> that way elements can be added after the array and also the individual
> elements can be extended
> 
> int rois_nb;
> int roi_size;
> AVRegionOfInterest rois[rois_nb];

nice direction, and I just have a little adjustment.

From user's perspective, the buffer size in this method is not straightforward when it calls
av_frame_new_side_data(frame, AV_FRAME_DATA_REGIONS_OF_INTEREST, size) to prepare for ROI buffer.

the size calculation looks like:
size =  sizeof(rois_nb) + sizeof(roi_size) + rois_nb * sizeof(AVRegionOfInterest).

and, the ROI array starts from:  AVFrameSideData.data + sizeof(rois_nb) + sizeof(roi_size).


My change is to move roi_size into struct AVRegionOfInterest as self_size, 
and remove rois_nb (can be calculated with AVFrameSideData.size / AVRegionOfInterest.self_size).

From user's perspective, the code looks like:
    size_t nb_rois = 2;
    AVFrameSideData *sd = av_frame_new_side_data(in, AV_FRAME_DATA_REGIONS_OF_INTEREST, nb_rois * sizeof(AVRegionOfInterest));
    if (!sd) {
        av_frame_free(&in);
        return AVERROR(ENOMEM);
    }
    AVRegionOfInterest* rois = (AVRegionOfInterest*)sd->data;
    rois[0].self_size = sizeof(*rois);
    rois[0].top = 0;
    rois[0].left = 0;
    rois[0].bottom = in->height;
    rois[0].right = in->width/4;
    rois[0].qoffset = -15;
    rois[1].self_size = sizeof(*rois);
    rois[1].top = 0;
    rois[1].left = in->width*3/4;
    rois[1].bottom = in->height;
    rois[1].right = in->width;
    rois[1].qoffset = -15;

The advantage is the code is straightforward.
The disadvantage is that we have to always do 'rois[i].self_size = sizeof(*rois);' and there is a little memory waste. 

I personally prefer this method, will send out the new patch.

> 
> Alternativly some "int version" could be used but we dont use that style
> elsewhere
> 
> 
> [...]
> --
> Michael     GnuPG fingerprint:
> 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> Republics decline into democracies and democracies degenerate into
> despotisms. -- Aristotle
diff mbox

Patch

diff --git a/libavutil/frame.c b/libavutil/frame.c
index 34a6210..bebc50e 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -841,6 +841,7 @@  const char *av_frame_side_data_name(enum AVFrameSideDataType type)
     case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table data";
 #endif
     case AV_FRAME_DATA_DYNAMIC_HDR_PLUS: return "HDR Dynamic Metadata SMPTE2094-40 (HDR10+)";
+    case AV_FRAME_DATA_ROIS: return "Regions Of Interest";
     }
     return NULL;
 }
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 582ac47..3460b01 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -173,6 +173,12 @@  enum AVFrameSideDataType {
      * volume transform - application 4 of SMPTE 2094-40:2016 standard.
      */
     AV_FRAME_DATA_DYNAMIC_HDR_PLUS,
+
+    /**
+     * Regions Of Interest, the data is an array of AVROI type, the array size
+     * is implied by AVFrameSideData::size / sizeof(AVROI).
+     */
+    AV_FRAME_DATA_ROIS,
 };
 
 enum AVActiveFormatDescription {
@@ -201,6 +207,23 @@  typedef struct AVFrameSideData {
 } AVFrameSideData;
 
 /**
+ * Structure to hold Region Of Interest.
+ *
+ * top/bottom/left/right are coordinates at frame pixel level.
+ * They will be extended internally if the codec requires an alignment.
+ * If the regions overlap, the last value in the list will be used.
+ *
+ * qoffset is quant offset, it is encoder dependent.
+ */
+typedef struct AVROI {
+    size_t top;
+    size_t bottom;
+    size_t left;
+    size_t right;
+    int qoffset;
+} AVROI;
+
+/**
  * This structure describes decoded (raw) audio or video data.
  *
  * AVFrame must be allocated using av_frame_alloc(). Note that this only
diff --git a/libavutil/version.h b/libavutil/version.h
index f997615..1fcdea9 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -79,7 +79,7 @@ 
  */
 
 #define LIBAVUTIL_VERSION_MAJOR  56
-#define LIBAVUTIL_VERSION_MINOR  25
+#define LIBAVUTIL_VERSION_MINOR  26
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \