diff mbox

[FFmpeg-devel] add support for ROI-based encoding

Message ID 1544003938-2219-1-git-send-email-yejun.guo@intel.com
State Superseded
Headers show

Commit Message

Guo, Yejun Dec. 5, 2018, 9:58 a.m. UTC
this patch is not ask for merge, it is more to get a feature feedback.

The encoders such as libx264 support different QPs offset for different MBs,
it makes possible for ROI-based encoding. It makes sense to add support
within ffmpeg to generate/accept ROI infos and pass into encoders.

Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
generates ROI info for that frame, and the encoder finally does the
ROI-based encoding. And so I choose to maintain the ROI info within
AVFrame struct.

TODO:
- remove code in vf_scale.c, it is just an example to generate ROI info
- use AVBufferRef instead of current implementation within AVFrame struct.
- add other encoders support

Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
---
 libavcodec/libx264.c   | 35 +++++++++++++++++++++++++++++++++++
 libavfilter/vf_scale.c |  8 ++++++++
 libavutil/frame.c      |  9 +++++++++
 libavutil/frame.h      | 14 ++++++++++++++
 4 files changed, 66 insertions(+)

Comments

Michael Niedermayer Dec. 5, 2018, 8:41 p.m. UTC | #1
On Wed, Dec 05, 2018 at 05:58:58PM +0800, Guo, Yejun wrote:
> this patch is not ask for merge, it is more to get a feature feedback.
> 
> The encoders such as libx264 support different QPs offset for different MBs,
> it makes possible for ROI-based encoding. It makes sense to add support
> within ffmpeg to generate/accept ROI infos and pass into encoders.
> 
> Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
> generates ROI info for that frame, and the encoder finally does the
> ROI-based encoding. And so I choose to maintain the ROI info within
> AVFrame struct.
> 
> TODO:
> - remove code in vf_scale.c, it is just an example to generate ROI info
> - use AVBufferRef instead of current implementation within AVFrame struct.
> - add other encoders support
> 
> Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
> ---
>  libavcodec/libx264.c   | 35 +++++++++++++++++++++++++++++++++++
>  libavfilter/vf_scale.c |  8 ++++++++
>  libavutil/frame.c      |  9 +++++++++
>  libavutil/frame.h      | 14 ++++++++++++++
>  4 files changed, 66 insertions(+)

this causes asserion failures
./ffmpeg -i matrixbench_mpeg2.mpg  this.mkv

...

ooooops, frame 0x0x3c03cc0 with rois 0
Assertion !"should not reach here" failed at libavcodec/libx264.c:381
Aborted (core dumped)


[...]
Guo, Yejun Dec. 6, 2018, 3:03 a.m. UTC | #2
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf
> Of Michael Niedermayer
> Sent: Thursday, December 06, 2018 4:41 AM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH] add support for ROI-based encoding
> 
> On Wed, Dec 05, 2018 at 05:58:58PM +0800, Guo, Yejun wrote:
> > this patch is not ask for merge, it is more to get a feature feedback.
> >
> > The encoders such as libx264 support different QPs offset for
> > different MBs, it makes possible for ROI-based encoding. It makes
> > sense to add support within ffmpeg to generate/accept ROI infos and pass
> into encoders.
> >
> > Typical usage: After AVFrame is decoded, a ffmpeg filter or user's
> > code generates ROI info for that frame, and the encoder finally does
> > the ROI-based encoding. And so I choose to maintain the ROI info
> > within AVFrame struct.
> >
> > TODO:
> > - remove code in vf_scale.c, it is just an example to generate ROI
> > info
> > - use AVBufferRef instead of current implementation within AVFrame
> struct.
> > - add other encoders support
> >
> > Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
> > ---
> >  libavcodec/libx264.c   | 35 +++++++++++++++++++++++++++++++++++
> >  libavfilter/vf_scale.c |  8 ++++++++
> >  libavutil/frame.c      |  9 +++++++++
> >  libavutil/frame.h      | 14 ++++++++++++++
> >  4 files changed, 66 insertions(+)
> 
> this causes asserion failures
> ./ffmpeg -i matrixbench_mpeg2.mpg  this.mkv
> 
> ...
> 
> ooooops, frame 0x0x3c03cc0 with rois 0
> Assertion !"should not reach here" failed at libavcodec/libx264.c:381 Aborted
> (core dumped)
> 

The reason is that no one generates the ROI info with this command line.

In this patch, I just borrowed vf_scale to mock the behavior to generate ROI info, the qp offset
of the left half video is forced to be 15, and so the left part will be more rough with a given bitrate.

And only libx264 encoder is supported now, so the command line looks like:
./ffmpeg  -i  .../path_to_1920x1080_video_file  -vf scale=1920:1080 -c:v libx264  -b:v 2000k -y tmp.264

btw, the impact between the left and the right of the video will be more obvious if the video is about
a sky with proper bitrates.

> 
> [...]
> --
> Michael     GnuPG fingerprint:
> 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> While the State exists there can be no freedom; when there is freedom
> there will be no State. -- Vladimir Lenin
Zhong Li Dec. 6, 2018, 5:57 a.m. UTC | #3
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf

> Of Guo, Yejun

> Sent: Wednesday, December 5, 2018 5:59 PM

> To: ffmpeg-devel@ffmpeg.org

> Subject: [FFmpeg-devel] [PATCH] add support for ROI-based encoding

> 

> this patch is not ask for merge, it is more to get a feature feedback.

> 

> The encoders such as libx264 support different QPs offset for different MBs,

> it makes possible for ROI-based encoding. It makes sense to add support

> within ffmpeg to generate/accept ROI infos and pass into encoders.

> 

> Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code

> generates ROI info for that frame, and the encoder finally does the

> ROI-based encoding. And so I choose to maintain the ROI info within

> AVFrame struct.

> 

> TODO:

> - remove code in vf_scale.c, it is just an example to generate ROI info

> - use AVBufferRef instead of current implementation within AVFrame struct.

> - add other encoders support

> 

> Signed-off-by: Guo, Yejun <yejun.guo@intel.com>

> ---

>  libavcodec/libx264.c   | 35 +++++++++++++++++++++++++++++++++++

>  libavfilter/vf_scale.c |  8 ++++++++

>  libavutil/frame.c      |  9 +++++++++

>  libavutil/frame.h      | 14 ++++++++++++++

>  4 files changed, 66 insertions(+)

> 

> diff --git a/libavcodec/libx264.c b/libavcodec/libx264.c index

> a68d0a7..d8cc327 100644

> --- a/libavcodec/libx264.c

> +++ b/libavcodec/libx264.c

> @@ -26,6 +26,7 @@

>  #include "libavutil/pixdesc.h"

>  #include "libavutil/stereo3d.h"

>  #include "libavutil/intreadwrite.h"

> +#include "libavutil/avassert.h"

>  #include "avcodec.h"

>  #include "internal.h"

> 

> @@ -345,6 +346,40 @@ static int X264_frame(AVCodecContext *ctx,

> AVPacket *pkt, const AVFrame *frame,

>                  }

>              }

>          }

> +

> +        if (frame->nb_rois > 0) {

> +            if (x4->params.rc.i_aq_mode == X264_AQ_NONE) {

> +                av_log(ctx, AV_LOG_ERROR, "Adaptive quantization

> must be enabled to use ROI encoding, skipping ROI.\n");

> +            }

> +            if (frame->interlaced_frame == 0) {

> +                const static int MBSIZE = 16;

> +                size_t mbx = (frame->width + MBSIZE - 1) / MBSIZE;

> +                size_t mby = (frame->height + MBSIZE - 1) / MBSIZE;


> +                float* qoffsets = (float*)av_malloc(sizeof(float) * mbx *

> mby);

> +                memset(qoffsets, 0, sizeof(float) * mbx * mby);

> +

> +                for (size_t roi = 0; roi < frame->nb_rois; ++roi) {

> +                    int starty = FFMIN(mby, frame->rois[roi].top /

> MBSIZE);

> +                    int endy = FFMIN(mby, (frame->rois[roi].bottom +

> MBSIZE - 1)/ MBSIZE);

> +                    int startx = FFMIN(mbx, frame->rois[roi].left /

> MBSIZE);

> +                    int endx = FFMIN(mbx, (frame->rois[roi].right +

> MBSIZE - 1)/ MBSIZE);

> +                    for (int y = starty; y < endy; ++y) {

> +                        for (int x = startx; x < endx; ++x) {

> +                            qoffsets[x + y*mbx] =

> frame->rois[roi].qoffset;

> +                        }

> +                    }

> +                }

> +

> +                x4->pic.prop.quant_offsets = qoffsets;

> +                x4->pic.prop.quant_offsets_free = av_free;

> +            } else {

> +                av_log(ctx, AV_LOG_ERROR, "interlaced_frame not

> supported for ROI encoding, skipping ROI.\n");

> +            }

> +        } else {

> +            //to be removed in the final code, it is just for debug usage

> now.

> +            printf("ooooops, frame 0x%p with rois %ld\n", frame,

> frame->nb_rois);

> +            av_assert0(!"should not reach here");

> +        }

>      }

> 

>      do {

> diff --git a/libavfilter/vf_scale.c b/libavfilter/vf_scale.c index

> f741419..71def72 100644

> --- a/libavfilter/vf_scale.c

> +++ b/libavfilter/vf_scale.c

> @@ -437,6 +437,14 @@ static int filter_frame(AVFilterLink *link, AVFrame

> *in)

>              return ret;

>      }

> 

> +    // to be removed, just for debug usage temporarily

> +    in->nb_rois = 1;

> +    in->rois[0].top = 0;

> +    in->rois[0].left = 0;

> +    in->rois[0].bottom = in->height;

> +    in->rois[0].right = in->width/2;

> +    in->rois[0].qoffset = 15.0f; // 15.0f, +-5.0f, +-25.0f

> +

>      if (!scale->sws)

>          return ff_filter_frame(outlink, in);

> 

> diff --git a/libavutil/frame.c b/libavutil/frame.c index 9b3fb13..9c38bdd

> 100644

> --- a/libavutil/frame.c

> +++ b/libavutil/frame.c

> @@ -425,6 +425,15 @@ FF_DISABLE_DEPRECATION_WARNINGS

> FF_ENABLE_DEPRECATION_WARNINGS  #endif

> 

> +    dst->nb_rois = src->nb_rois;

> +    for (int i = 0; i < dst->nb_rois; ++i) {

> +        dst->rois[i].top = src->rois[i].top;

> +        dst->rois[i].bottom = src->rois[i].bottom;

> +        dst->rois[i].left = src->rois[i].left;

> +        dst->rois[i].right = src->rois[i].right;

> +        dst->rois[i].qoffset = src->rois[i].qoffset;

> +    }

> +

>      av_buffer_unref(&dst->opaque_ref);

>      av_buffer_unref(&dst->private_ref);

>      if (src->opaque_ref) {

> diff --git a/libavutil/frame.h b/libavutil/frame.h index 66f27f4..b245a90

> 100644

> --- a/libavutil/frame.h

> +++ b/libavutil/frame.h

> @@ -193,6 +193,15 @@ typedef struct AVFrameSideData {

>      AVBufferRef *buf;

>  } AVFrameSideData;

> 

> +

> +typedef struct AVFrameROI {

> +    size_t top;

> +    size_t bottom;

> +    size_t left;

> +    size_t right;

> +    float qoffset;

> +} AVFrameROI;

> +

>  /**

>   * This structure describes decoded (raw) audio or video data.

>   *

> @@ -556,6 +565,11 @@ typedef struct AVFrame {

>      attribute_deprecated

>      AVBufferRef *qp_table_buf;

>  #endif

> +

> +    //TODO: AVBufferRef*

> +    AVFrameROI rois[2];


It means the maxium roi number is 2, thus making nb_rois is useless more or less.
(And will cause segment fault since I have seen any check of nb_rois<= 2). 

> +    size_t nb_rois;

> +

>      /**

>       * For hwaccel-format frames, this should be a reference to the

>       * AVHWFramesContext describing the frame.

> --

> 2.7.4
Guo, Yejun Dec. 6, 2018, 7:11 a.m. UTC | #4
> -----Original Message-----

> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf

> Of Li, Zhong

> Sent: Thursday, December 06, 2018 1:57 PM

> To: FFmpeg development discussions and patches <ffmpeg-

> devel@ffmpeg.org>

> Subject: Re: [FFmpeg-devel] [PATCH] add support for ROI-based encoding

> 

> > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On

> Behalf

> > Of Guo, Yejun

> > Sent: Wednesday, December 5, 2018 5:59 PM

> > To: ffmpeg-devel@ffmpeg.org

> > Subject: [FFmpeg-devel] [PATCH] add support for ROI-based encoding

> >

> > this patch is not ask for merge, it is more to get a feature feedback.

> >

> > The encoders such as libx264 support different QPs offset for

> > different MBs, it makes possible for ROI-based encoding. It makes

> > sense to add support within ffmpeg to generate/accept ROI infos and pass

> into encoders.

> >

> > Typical usage: After AVFrame is decoded, a ffmpeg filter or user's

> > code generates ROI info for that frame, and the encoder finally does

> > the ROI-based encoding. And so I choose to maintain the ROI info

> > within AVFrame struct.

> >

> > TODO:

> > - remove code in vf_scale.c, it is just an example to generate ROI

> > info

> > - use AVBufferRef instead of current implementation within AVFrame

> struct.

> > - add other encoders support

> >

> > Signed-off-by: Guo, Yejun <yejun.guo@intel.com>

> > ---

> >  libavcodec/libx264.c   | 35 +++++++++++++++++++++++++++++++++++

> >  libavfilter/vf_scale.c |  8 ++++++++

> >  libavutil/frame.c      |  9 +++++++++

> >  libavutil/frame.h      | 14 ++++++++++++++

> >  4 files changed, 66 insertions(+)

> >

> > diff --git a/libavcodec/libx264.c b/libavcodec/libx264.c index

> > a68d0a7..d8cc327 100644

> > --- a/libavcodec/libx264.c

> > +++ b/libavcodec/libx264.c

> > @@ -26,6 +26,7 @@

> >  #include "libavutil/pixdesc.h"

> >  #include "libavutil/stereo3d.h"

> >  #include "libavutil/intreadwrite.h"

> > +#include "libavutil/avassert.h"

> >  #include "avcodec.h"

> >  #include "internal.h"

> >

> > @@ -345,6 +346,40 @@ static int X264_frame(AVCodecContext *ctx,

> > AVPacket *pkt, const AVFrame *frame,

> >                  }

> >              }

> >          }

> > +

> > +        if (frame->nb_rois > 0) {

> > +            if (x4->params.rc.i_aq_mode == X264_AQ_NONE) {

> > +                av_log(ctx, AV_LOG_ERROR, "Adaptive quantization

> > must be enabled to use ROI encoding, skipping ROI.\n");

> > +            }

> > +            if (frame->interlaced_frame == 0) {

> > +                const static int MBSIZE = 16;

> > +                size_t mbx = (frame->width + MBSIZE - 1) / MBSIZE;

> > +                size_t mby = (frame->height + MBSIZE - 1) / MBSIZE;

> 

> > +                float* qoffsets = (float*)av_malloc(sizeof(float) *

> > + mbx *

> > mby);

> > +                memset(qoffsets, 0, sizeof(float) * mbx * mby);

> > +

> > +                for (size_t roi = 0; roi < frame->nb_rois; ++roi) {

> > +                    int starty = FFMIN(mby, frame->rois[roi].top /

> > MBSIZE);

> > +                    int endy = FFMIN(mby, (frame->rois[roi].bottom +

> > MBSIZE - 1)/ MBSIZE);

> > +                    int startx = FFMIN(mbx, frame->rois[roi].left /

> > MBSIZE);

> > +                    int endx = FFMIN(mbx, (frame->rois[roi].right +

> > MBSIZE - 1)/ MBSIZE);

> > +                    for (int y = starty; y < endy; ++y) {

> > +                        for (int x = startx; x < endx; ++x) {

> > +                            qoffsets[x + y*mbx] =

> > frame->rois[roi].qoffset;

> > +                        }

> > +                    }

> > +                }

> > +

> > +                x4->pic.prop.quant_offsets = qoffsets;

> > +                x4->pic.prop.quant_offsets_free = av_free;

> > +            } else {

> > +                av_log(ctx, AV_LOG_ERROR, "interlaced_frame not

> > supported for ROI encoding, skipping ROI.\n");

> > +            }

> > +        } else {

> > +            //to be removed in the final code, it is just for debug

> > + usage

> > now.

> > +            printf("ooooops, frame 0x%p with rois %ld\n", frame,

> > frame->nb_rois);

> > +            av_assert0(!"should not reach here");

> > +        }

> >      }

> >

> >      do {

> > diff --git a/libavfilter/vf_scale.c b/libavfilter/vf_scale.c index

> > f741419..71def72 100644

> > --- a/libavfilter/vf_scale.c

> > +++ b/libavfilter/vf_scale.c

> > @@ -437,6 +437,14 @@ static int filter_frame(AVFilterLink *link,

> > AVFrame

> > *in)

> >              return ret;

> >      }

> >

> > +    // to be removed, just for debug usage temporarily

> > +    in->nb_rois = 1;

> > +    in->rois[0].top = 0;

> > +    in->rois[0].left = 0;

> > +    in->rois[0].bottom = in->height;

> > +    in->rois[0].right = in->width/2;

> > +    in->rois[0].qoffset = 15.0f; // 15.0f, +-5.0f, +-25.0f

> > +

> >      if (!scale->sws)

> >          return ff_filter_frame(outlink, in);

> >

> > diff --git a/libavutil/frame.c b/libavutil/frame.c index

> > 9b3fb13..9c38bdd

> > 100644

> > --- a/libavutil/frame.c

> > +++ b/libavutil/frame.c

> > @@ -425,6 +425,15 @@ FF_DISABLE_DEPRECATION_WARNINGS

> > FF_ENABLE_DEPRECATION_WARNINGS  #endif

> >

> > +    dst->nb_rois = src->nb_rois;

> > +    for (int i = 0; i < dst->nb_rois; ++i) {

> > +        dst->rois[i].top = src->rois[i].top;

> > +        dst->rois[i].bottom = src->rois[i].bottom;

> > +        dst->rois[i].left = src->rois[i].left;

> > +        dst->rois[i].right = src->rois[i].right;

> > +        dst->rois[i].qoffset = src->rois[i].qoffset;

> > +    }

> > +

> >      av_buffer_unref(&dst->opaque_ref);

> >      av_buffer_unref(&dst->private_ref);

> >      if (src->opaque_ref) {

> > diff --git a/libavutil/frame.h b/libavutil/frame.h index

> > 66f27f4..b245a90

> > 100644

> > --- a/libavutil/frame.h

> > +++ b/libavutil/frame.h

> > @@ -193,6 +193,15 @@ typedef struct AVFrameSideData {

> >      AVBufferRef *buf;

> >  } AVFrameSideData;

> >

> > +

> > +typedef struct AVFrameROI {

> > +    size_t top;

> > +    size_t bottom;

> > +    size_t left;

> > +    size_t right;

> > +    float qoffset;

> > +} AVFrameROI;

> > +

> >  /**

> >   * This structure describes decoded (raw) audio or video data.

> >   *

> > @@ -556,6 +565,11 @@ typedef struct AVFrame {

> >      attribute_deprecated

> >      AVBufferRef *qp_table_buf;

> >  #endif

> > +

> > +    //TODO: AVBufferRef*

> > +    AVFrameROI rois[2];

> 

> It means the maxium roi number is 2, thus making nb_rois is useless more or

> less.

> (And will cause segment fault since I have seen any check of nb_rois<= 2).

> 


this is quick and dirty code to get feature feedback, this array will be replaced
with AVBufferRef* in my TODO.

> > +    size_t nb_rois;

> > +

> >      /**

> >       * For hwaccel-format frames, this should be a reference to the

> >       * AVHWFramesContext describing the frame.

> > --

> > 2.7.4

> 

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Mark Thompson Dec. 9, 2018, 7:21 p.m. UTC | #5
On 05/12/2018 09:58, Guo, Yejun wrote:
> this patch is not ask for merge, it is more to get a feature feedback.
> 
> The encoders such as libx264 support different QPs offset for different MBs,
> it makes possible for ROI-based encoding. It makes sense to add support
> within ffmpeg to generate/accept ROI infos and pass into encoders.
> 
> Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
> generates ROI info for that frame, and the encoder finally does the
> ROI-based encoding. And so I choose to maintain the ROI info within
> AVFrame struct.
> 
> TODO:
> - remove code in vf_scale.c, it is just an example to generate ROI info
> - use AVBufferRef instead of current implementation within AVFrame struct.
> - add other encoders support
> 
> Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
> ---
>  libavcodec/libx264.c   | 35 +++++++++++++++++++++++++++++++++++
>  libavfilter/vf_scale.c |  8 ++++++++
>  libavutil/frame.c      |  9 +++++++++
>  libavutil/frame.h      | 14 ++++++++++++++
>  4 files changed, 66 insertions(+)

This approach seems reasonable to me; some thoughts below.

> ...
> diff --git a/libavutil/frame.c b/libavutil/frame.c
> index 9b3fb13..9c38bdd 100644
> --- a/libavutil/frame.c
> +++ b/libavutil/frame.c
> @@ -425,6 +425,15 @@ FF_DISABLE_DEPRECATION_WARNINGS
>  FF_ENABLE_DEPRECATION_WARNINGS
>  #endif
>  
> +    dst->nb_rois = src->nb_rois;
> +    for (int i = 0; i < dst->nb_rois; ++i) {
> +        dst->rois[i].top = src->rois[i].top;
> +        dst->rois[i].bottom = src->rois[i].bottom;
> +        dst->rois[i].left = src->rois[i].left;
> +        dst->rois[i].right = src->rois[i].right;
> +        dst->rois[i].qoffset = src->rois[i].qoffset;
> +    }
> +
>      av_buffer_unref(&dst->opaque_ref);
>      av_buffer_unref(&dst->private_ref);
>      if (src->opaque_ref) {
> diff --git a/libavutil/frame.h b/libavutil/frame.h
> index 66f27f4..b245a90 100644
> --- a/libavutil/frame.h
> +++ b/libavutil/frame.h
> @@ -193,6 +193,15 @@ typedef struct AVFrameSideData {
>      AVBufferRef *buf;
>  } AVFrameSideData;
>  
> +
> +typedef struct AVFrameROI {
> +    size_t top;
> +    size_t bottom;
> +    size_t left;
> +    size_t right;

Do you want some additional constraints on this?  For example, must be integers in every plane (so even values only for 4:2:0), or maybe even must match some codec-specific block size.

> +    float qoffset;

Everything else uses integers here, so this should be an integer.  The meaning of it will probably be entirely codec-dependent.

> +} AVFrameROI;
> +
>  /**
>   * This structure describes decoded (raw) audio or video data.
>   *
> @@ -556,6 +565,11 @@ typedef struct AVFrame {
>      attribute_deprecated
>      AVBufferRef *qp_table_buf;
>  #endif
> +
> +    //TODO: AVBufferRef*
> +    AVFrameROI rois[2];
> +    size_t nb_rois;

This should be side-data (which is automatically refcounted) rather than included directly in the AVFrame.

> +
>      /**
>       * For hwaccel-format frames, this should be a reference to the
>       * AVHWFramesContext describing the frame.
> 

Thanks,

- Mark
Guo, Yejun Dec. 10, 2018, 3:09 a.m. UTC | #6
> -----Original Message-----

> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf

> Of Mark Thompson

> Sent: Monday, December 10, 2018 3:21 AM

> To: ffmpeg-devel@ffmpeg.org

> Subject: Re: [FFmpeg-devel] [PATCH] add support for ROI-based encoding

> 

> On 05/12/2018 09:58, Guo, Yejun wrote:

> > this patch is not ask for merge, it is more to get a feature feedback.

> >

> > The encoders such as libx264 support different QPs offset for

> > different MBs, it makes possible for ROI-based encoding. It makes

> > sense to add support within ffmpeg to generate/accept ROI infos and pass

> into encoders.

> >

> > Typical usage: After AVFrame is decoded, a ffmpeg filter or user's

> > code generates ROI info for that frame, and the encoder finally does

> > the ROI-based encoding. And so I choose to maintain the ROI info

> > within AVFrame struct.

> >

> > TODO:

> > - remove code in vf_scale.c, it is just an example to generate ROI

> > info

> > - use AVBufferRef instead of current implementation within AVFrame

> struct.

> > - add other encoders support

> >

> > Signed-off-by: Guo, Yejun <yejun.guo@intel.com>

> > ---

> >  libavcodec/libx264.c   | 35 +++++++++++++++++++++++++++++++++++

> >  libavfilter/vf_scale.c |  8 ++++++++

> >  libavutil/frame.c      |  9 +++++++++

> >  libavutil/frame.h      | 14 ++++++++++++++

> >  4 files changed, 66 insertions(+)

> 

> This approach seems reasonable to me; some thoughts below.


yeah, :)

> 

> > ...

> > diff --git a/libavutil/frame.c b/libavutil/frame.c index

> > 9b3fb13..9c38bdd 100644

> > --- a/libavutil/frame.c

> > +++ b/libavutil/frame.c

> > @@ -425,6 +425,15 @@ FF_DISABLE_DEPRECATION_WARNINGS

> > FF_ENABLE_DEPRECATION_WARNINGS  #endif

> >

> > +    dst->nb_rois = src->nb_rois;

> > +    for (int i = 0; i < dst->nb_rois; ++i) {

> > +        dst->rois[i].top = src->rois[i].top;

> > +        dst->rois[i].bottom = src->rois[i].bottom;

> > +        dst->rois[i].left = src->rois[i].left;

> > +        dst->rois[i].right = src->rois[i].right;

> > +        dst->rois[i].qoffset = src->rois[i].qoffset;

> > +    }

> > +

> >      av_buffer_unref(&dst->opaque_ref);

> >      av_buffer_unref(&dst->private_ref);

> >      if (src->opaque_ref) {

> > diff --git a/libavutil/frame.h b/libavutil/frame.h index

> > 66f27f4..b245a90 100644

> > --- a/libavutil/frame.h

> > +++ b/libavutil/frame.h

> > @@ -193,6 +193,15 @@ typedef struct AVFrameSideData {

> >      AVBufferRef *buf;

> >  } AVFrameSideData;

> >

> > +

> > +typedef struct AVFrameROI {

> > +    size_t top;

> > +    size_t bottom;

> > +    size_t left;

> > +    size_t right;

> 

> Do you want some additional constraints on this?  For example, must be

> integers in every plane (so even values only for 4:2:0), or maybe even must

> match some codec-specific block size.


my thought is they are the coordinates at frame pixel level no matter what format is.
And the ROI info generator can focus on their logic, do not care the underlying codec.

If the ROI area does not exactly match with the codec-specific block size, my thought
is to just internally extend the ROI area to match. This will added into the comments.

> 

> > +    float qoffset;

> 

> Everything else uses integers here, so this should be an integer.  The

> meaning of it will probably be entirely codec-dependent.


I chose float because the interface provided by libx264 is float, see https://github.com/mirror/x264/blob/master/x264.h#L760.
And I now think we might use a more general concept (enum AVRoiQuality), to make the codec features transparent to users.

enum AVRoiQuality {
    AV_RQ_NONE = 0,
    AV_RQ_BETTER = 1,
    AV_RQ_BEST = 2,
};


Actually, my initial idea is to export more control here, but it might
give the ROI generator burden to decide this value for different codecs.
For example, the ROI might be generated by a neural network (NN), and the NN
developers mainly focus on network model, not on the codec detail. 


It is still open for me to choose which direction, I now prefer to 'enum AVRoiQuality'


> 

> > +} AVFrameROI;

> > +

> >  /**

> >   * This structure describes decoded (raw) audio or video data.

> >   *

> > @@ -556,6 +565,11 @@ typedef struct AVFrame {

> >      attribute_deprecated

> >      AVBufferRef *qp_table_buf;

> >  #endif

> > +

> > +    //TODO: AVBufferRef*

> > +    AVFrameROI rois[2];

> > +    size_t nb_rois;

> 

> This should be side-data (which is automatically refcounted) rather than

> included directly in the AVFrame.


I guess it is AVBufferRef*, I will switch to it in the final patch. I'm not familiar with it
and so chose the array method for a quick implementation.

> 

> > +

> >      /**

> >       * For hwaccel-format frames, this should be a reference to the

> >       * AVHWFramesContext describing the frame.

> >

> 

> Thanks,

> 

> - Mark

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
diff mbox

Patch

diff --git a/libavcodec/libx264.c b/libavcodec/libx264.c
index a68d0a7..d8cc327 100644
--- a/libavcodec/libx264.c
+++ b/libavcodec/libx264.c
@@ -26,6 +26,7 @@ 
 #include "libavutil/pixdesc.h"
 #include "libavutil/stereo3d.h"
 #include "libavutil/intreadwrite.h"
+#include "libavutil/avassert.h"
 #include "avcodec.h"
 #include "internal.h"
 
@@ -345,6 +346,40 @@  static int X264_frame(AVCodecContext *ctx, AVPacket *pkt, const AVFrame *frame,
                 }
             }
         }
+
+        if (frame->nb_rois > 0) {
+            if (x4->params.rc.i_aq_mode == X264_AQ_NONE) {
+                av_log(ctx, AV_LOG_ERROR, "Adaptive quantization must be enabled to use ROI encoding, skipping ROI.\n");
+            }
+            if (frame->interlaced_frame == 0) {
+                const static int MBSIZE = 16;
+                size_t mbx = (frame->width + MBSIZE - 1) / MBSIZE;
+                size_t mby = (frame->height + MBSIZE - 1) / MBSIZE;
+                float* qoffsets = (float*)av_malloc(sizeof(float) * mbx * mby);
+                memset(qoffsets, 0, sizeof(float) * mbx * mby);
+
+                for (size_t roi = 0; roi < frame->nb_rois; ++roi) {
+                    int starty = FFMIN(mby, frame->rois[roi].top / MBSIZE);
+                    int endy = FFMIN(mby, (frame->rois[roi].bottom + MBSIZE - 1)/ MBSIZE);
+                    int startx = FFMIN(mbx, frame->rois[roi].left / MBSIZE);
+                    int endx = FFMIN(mbx, (frame->rois[roi].right + MBSIZE - 1)/ MBSIZE);
+                    for (int y = starty; y < endy; ++y) {
+                        for (int x = startx; x < endx; ++x) {
+                            qoffsets[x + y*mbx] = frame->rois[roi].qoffset;
+                        }
+                    }
+                }
+
+                x4->pic.prop.quant_offsets = qoffsets;
+                x4->pic.prop.quant_offsets_free = av_free;
+            } else {
+                av_log(ctx, AV_LOG_ERROR, "interlaced_frame not supported for ROI encoding, skipping ROI.\n");
+            }
+        } else {
+            //to be removed in the final code, it is just for debug usage now.
+            printf("ooooops, frame 0x%p with rois %ld\n", frame, frame->nb_rois);
+            av_assert0(!"should not reach here");
+        }
     }
 
     do {
diff --git a/libavfilter/vf_scale.c b/libavfilter/vf_scale.c
index f741419..71def72 100644
--- a/libavfilter/vf_scale.c
+++ b/libavfilter/vf_scale.c
@@ -437,6 +437,14 @@  static int filter_frame(AVFilterLink *link, AVFrame *in)
             return ret;
     }
 
+    // to be removed, just for debug usage temporarily
+    in->nb_rois = 1;
+    in->rois[0].top = 0;
+    in->rois[0].left = 0;
+    in->rois[0].bottom = in->height;
+    in->rois[0].right = in->width/2;
+    in->rois[0].qoffset = 15.0f; // 15.0f, +-5.0f, +-25.0f
+
     if (!scale->sws)
         return ff_filter_frame(outlink, in);
 
diff --git a/libavutil/frame.c b/libavutil/frame.c
index 9b3fb13..9c38bdd 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -425,6 +425,15 @@  FF_DISABLE_DEPRECATION_WARNINGS
 FF_ENABLE_DEPRECATION_WARNINGS
 #endif
 
+    dst->nb_rois = src->nb_rois;
+    for (int i = 0; i < dst->nb_rois; ++i) {
+        dst->rois[i].top = src->rois[i].top;
+        dst->rois[i].bottom = src->rois[i].bottom;
+        dst->rois[i].left = src->rois[i].left;
+        dst->rois[i].right = src->rois[i].right;
+        dst->rois[i].qoffset = src->rois[i].qoffset;
+    }
+
     av_buffer_unref(&dst->opaque_ref);
     av_buffer_unref(&dst->private_ref);
     if (src->opaque_ref) {
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 66f27f4..b245a90 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -193,6 +193,15 @@  typedef struct AVFrameSideData {
     AVBufferRef *buf;
 } AVFrameSideData;
 
+
+typedef struct AVFrameROI {
+    size_t top;
+    size_t bottom;
+    size_t left;
+    size_t right;
+    float qoffset;
+} AVFrameROI;
+
 /**
  * This structure describes decoded (raw) audio or video data.
  *
@@ -556,6 +565,11 @@  typedef struct AVFrame {
     attribute_deprecated
     AVBufferRef *qp_table_buf;
 #endif
+
+    //TODO: AVBufferRef*
+    AVFrameROI rois[2];
+    size_t nb_rois;
+
     /**
      * For hwaccel-format frames, this should be a reference to the
      * AVHWFramesContext describing the frame.