[FFmpeg-devel,3/3] lavfi/motion_estimation: use pixelutils API for sad.

Submitted by Jun Zhao on July 10, 2018, 10:37 p.m.

Details

Message ID 1531262257-4660-3-git-send-email-mypopydev@gmail.com
State New

Commit Message

Jun Zhao July 10, 2018, 10:37 p.m.
use pixelutils API for sad in motion estimation.

Signed-off-by: Jun Zhao <mypopydev@gmail.com>
---
 libavfilter/motion_estimation.c |   12 +++++++++---
 libavfilter/motion_estimation.h |    2 ++
 2 files changed, 11 insertions(+), 3 deletions(-)

Comments

Carl Eugen Hoyos July 10, 2018, 11:47 p.m.
2018-07-11 0:37 GMT+02:00, Jun Zhao <mypopydev@gmail.com>:
> use pixelutils API for sad in motion estimation.

Some performance numbers would make sense in the commit message, imo.

Carl Eugen
Michael Niedermayer July 11, 2018, 12:30 a.m.
On Wed, Jul 11, 2018 at 06:37:37AM +0800, Jun Zhao wrote:
> use pixelutils API for sad in motion estimation.
> 
> Signed-off-by: Jun Zhao <mypopydev@gmail.com>
> ---
>  libavfilter/motion_estimation.c |   12 +++++++++---
>  libavfilter/motion_estimation.h |    2 ++
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/libavfilter/motion_estimation.c b/libavfilter/motion_estimation.c
> index 0f9ba21..8ccd879 100644
> --- a/libavfilter/motion_estimation.c
> +++ b/libavfilter/motion_estimation.c
> @@ -54,6 +54,8 @@ void ff_me_init_context(AVMotionEstContext *me_ctx, int mb_size, int search_para
>      me_ctx->x_max = x_max;
>      me_ctx->y_min = y_min;
>      me_ctx->y_max = y_max;
> +
> +    me_ctx->sad = av_pixelutils_get_sad_fn(av_ceil_log2_c(mb_size), av_ceil_log2_c(mb_size), 0, NULL);
>  }
>  
>  uint64_t ff_me_cmp_sad(AVMotionEstContext *me_ctx, int x_mb, int y_mb, int x_mv, int y_mv)
> @@ -67,9 +69,13 @@ uint64_t ff_me_cmp_sad(AVMotionEstContext *me_ctx, int x_mb, int y_mb, int x_mv,
>      data_ref += y_mv * linesize;
>      data_cur += y_mb * linesize;
>  
> -    for (j = 0; j < me_ctx->mb_size; j++)
> -        for (i = 0; i < me_ctx->mb_size; i++)
> -            sad += FFABS(data_ref[x_mv + i + j * linesize] - data_cur[x_mb + i + j * linesize]);
> +    if (me_ctx->sad) {
> +        sad = me_ctx->sad(data_ref+x_mv, linesize, data_cur+x_mb, linesize);
> +    } else {
> +        for (j = 0; j < me_ctx->mb_size; j++)
> +            for (i = 0; i < me_ctx->mb_size; i++)
> +                sad += FFABS(data_ref[x_mv + i + j * linesize] - data_cur[x_mb + i + j * linesize]);
> +    }
>  

The function pointers which point to ff_me_cmp_sad() should point to SIMD
code in the optimized case.
There should be no check per call.
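
A minimal sketch of that init-time selection, in standalone C. The names
MEContext, me_init, cost_scalar, cost_fast and sad4x4_plain are hypothetical
stand-ins and not FFmpeg's definitions; sad4x4_plain is plain C standing in
for a SIMD routine. The point is only that the branch happens once, when the
context is set up, and the hot path calls through get_cost unconditionally:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical SAD callback type, shaped like a block-SAD routine. */
typedef int (*sad_fn)(const uint8_t *a, ptrdiff_t stride_a,
                      const uint8_t *b, ptrdiff_t stride_b);

/* Hypothetical context; not the real AVMotionEstContext. */
typedef struct MEContext {
    int mb_size;
    sad_fn fast_sad; /* NULL when no optimized routine is available */
    uint64_t (*get_cost)(const struct MEContext *ctx, const uint8_t *ref,
                         const uint8_t *cur, ptrdiff_t linesize);
} MEContext;

/* Plain C 4x4 SAD, standing in for an optimized implementation. */
static int sad4x4_plain(const uint8_t *a, ptrdiff_t stride_a,
                        const uint8_t *b, ptrdiff_t stride_b)
{
    int sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sum += abs(a[x + y * stride_a] - b[x + y * stride_b]);
    return sum;
}

/* Generic fallback: works for any mb_size. */
static uint64_t cost_scalar(const MEContext *ctx, const uint8_t *ref,
                            const uint8_t *cur, ptrdiff_t linesize)
{
    uint64_t sad = 0;
    for (int j = 0; j < ctx->mb_size; j++)
        for (int i = 0; i < ctx->mb_size; i++)
            sad += (uint64_t)abs(ref[i + j * linesize] - cur[i + j * linesize]);
    return sad;
}

/* Thin wrapper around the optimized routine; no NULL check here. */
static uint64_t cost_fast(const MEContext *ctx, const uint8_t *ref,
                          const uint8_t *cur, ptrdiff_t linesize)
{
    return (uint64_t)ctx->fast_sad(ref, linesize, cur, linesize);
}

/* The only place that branches on whether a fast routine exists.
 * The caller must pass a fast_sad that matches mb_size, or NULL. */
static void me_init(MEContext *ctx, int mb_size, sad_fn fast_sad)
{
    assert(mb_size > 0);
    ctx->mb_size  = mb_size;
    ctx->fast_sad = fast_sad;
    ctx->get_cost = fast_sad ? cost_fast : cost_scalar;
}
```

A caller would then run me_init(&ctx, 4, sad4x4_plain) once and use
ctx.get_cost(&ctx, ref, cur, linesize) in the search loop.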

[...]
Jun Zhao July 11, 2018, 2:06 a.m.
On Wed, Jul 11, 2018 at 7:47 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> 2018-07-11 0:37 GMT+02:00, Jun Zhao <mypopydev@gmail.com>:
> > use pixelutils API for sad in motion estimation.
>
> Some performance number make sense for the commit message imo.
>
> Carl Eugen
Will add performance numbers in the next version. Thanks.
Jun Zhao July 11, 2018, 2:11 a.m.
On Wed, Jul 11, 2018 at 8:31 AM Michael Niedermayer
<michael@niedermayer.cc> wrote:
>
> On Wed, Jul 11, 2018 at 06:37:37AM +0800, Jun Zhao wrote:
> > use pixelutils API for sad in motion estimation.
> >
> > Signed-off-by: Jun Zhao <mypopydev@gmail.com>
> > ---
> >  libavfilter/motion_estimation.c |   12 +++++++++---
> >  libavfilter/motion_estimation.h |    2 ++
> >  2 files changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/libavfilter/motion_estimation.c b/libavfilter/motion_estimation.c
> > index 0f9ba21..8ccd879 100644
> > --- a/libavfilter/motion_estimation.c
> > +++ b/libavfilter/motion_estimation.c
> > @@ -54,6 +54,8 @@ void ff_me_init_context(AVMotionEstContext *me_ctx, int mb_size, int search_para
> >      me_ctx->x_max = x_max;
> >      me_ctx->y_min = y_min;
> >      me_ctx->y_max = y_max;
> > +
> > +    me_ctx->sad = av_pixelutils_get_sad_fn(av_ceil_log2_c(mb_size), av_ceil_log2_c(mb_size), 0, NULL);
> >  }
> >
> >  uint64_t ff_me_cmp_sad(AVMotionEstContext *me_ctx, int x_mb, int y_mb, int x_mv, int y_mv)
> > @@ -67,9 +69,13 @@ uint64_t ff_me_cmp_sad(AVMotionEstContext *me_ctx, int x_mb, int y_mb, int x_mv,
> >      data_ref += y_mv * linesize;
> >      data_cur += y_mb * linesize;
> >
> > -    for (j = 0; j < me_ctx->mb_size; j++)
> > -        for (i = 0; i < me_ctx->mb_size; i++)
> > -            sad += FFABS(data_ref[x_mv + i + j * linesize] - data_cur[x_mb + i + j * linesize]);
> > +    if (me_ctx->sad) {
> > +        sad = me_ctx->sad(data_ref+x_mv, linesize, data_cur+x_mb, linesize);
> > +    } else {
> > +        for (j = 0; j < me_ctx->mb_size; j++)
> > +            for (i = 0; i < me_ctx->mb_size; i++)
> > +                sad += FFABS(data_ref[x_mv + i + j * linesize] - data_cur[x_mb + i + j * linesize]);
> > +    }
> >
>
> The function pointers which point to ff_me_cmp_sad() should point to SIMD
> code in the optimized case.
> there should be no check per call
Thanks for the suggestion. I will move the check into ff_me_init_context to
avoid the per-call check in ff_me_cmp_sad.
>
> [...]
Marton Balint July 11, 2018, 4:42 p.m.
On Wed, 11 Jul 2018, Jun Zhao wrote:

> use pixelutils API for sad in motion estimation.

Does it make sense to improve this code? I thought a superior and faster
approach came out of a 2017 GSOC task:

https://docs.google.com/document/d/1Hyh_rxP1KGsVkg7i7yU8Bcv92z0LIL4r-axpoKfvMFk/edit

Maybe that code should be merged back first, and any further optimization
should be based on that code, no?

Thanks,
Marton
mypopy@gmail.com July 12, 2018, 12:50 a.m.
On Thu, Jul 12, 2018 at 12:43 AM Marton Balint <cus@passwd.hu> wrote:
>
>
>
> On Wed, 11 Jul 2018, Jun Zhao wrote:
>
> > use pixelutils API for sad in motion estimation.
>
> Does it make sense to improve this code? I thought a superior and faster
> approach was a result of 2017 GSOC task:
>
> https://docs.google.com/document/d/1Hyh_rxP1KGsVkg7i7yU8Bcv92z0LIL4r-axpoKfvMFk/edit
>
> Maybe that code should be merged back, and any further optimalization
> should be done based on that code, no?
>
> Thanks,
> Marton
>
Hi, Marton:

Yes, I am currently trying to improve minterpolate. After profiling this
command with perf:

./ffmpeg -i a.ts -filter_complex
"minterpolate=mi_mode=mci:mc_mode=aobmc:vsbmc=1" -f null /dev/null

I found that the hotspots are:
- get_sbad_ob
- get_sbad
- get_sad_ob
- bilateral_obmc
- set_frame_data

So my plan is to first use an SSE2/AVX2-optimized SAD function (via the
pixelutils API) in get_sbad_ob / get_sbad / get_sad_ob. For the
set_frame_data case, Scatter/Gather SIMD instructions may be needed.

But if someone has already done improvement work in this area, I think
building on that pre-existing work is the better way.

BTW: I have tried using SIMD to improve the blend data path (with a
command like: ./ffmpeg -i a.ts -filter_complex
"minterpolate=mi_mode=blend:scd=fdiff:scd_threshold=1" -f null
/dev/null). After the SIMD improvement, the fps increased by about 100%
(from 30fps to 60fps at 1080p).
Marton Balint July 13, 2018, 8:51 a.m.
On Thu, 12 Jul 2018, mypopy@gmail.com wrote:

> On Thu, Jul 12, 2018 at 12:43 AM Marton Balint <cus@passwd.hu> wrote:
>>
>>
>>
>> On Wed, 11 Jul 2018, Jun Zhao wrote:
>>
>> > use pixelutils API for sad in motion estimation.
>>
>> Does it make sense to improve this code? I thought a superior and faster
>> approach was a result of 2017 GSOC task:
>>
>> https://docs.google.com/document/d/1Hyh_rxP1KGsVkg7i7yU8Bcv92z0LIL4r-axpoKfvMFk/edit
>>
>> Maybe that code should be merged back, and any further optimalization
>> should be done based on that code, no?
>>
>> Thanks,
>> Marton
>>
> Hi, Marton:
>
> Yes, now I try to improve the minterpolate, and after use perf
> profiing the commands:
>
> ./ffmpeg -i a.ts -filter_complex
> "minterpolate=mi_mode=mci:mc_mode=aobmc:vsbmc=1" -f null /dev/null
> I found the hotspot is:
> - get_sbad_ob
> - get_sbad
> - get_sad_ob
> - bilateral_obmc
> - set_frame_data
>
> So, as my plan, I will try to use sse2/avx2
> Scatter/Gather, optimized
> sad function (use pixelutils API)
> in  get_sbad_ob /  get_sbad /  get_sad_ob first, for  set_frame_data
> case, maybe need to use Scatter/Gather SIMD instruction.

That is great. All I am saying is that we should avoid letting the two
branches (the FFmpeg branch and the GSOC 2017 branch) diverge, and try to
merge GSOC2017 back, if it can be done with a reasonable amount of work,
before optimizing the code. Otherwise the GSOC2017 branch will rot and we
will lose the result of the GSOC task.

>
> But if some guys have done some improve task in this case, I think
> based on the pre-existing work is the better way.

Michael was the mentor, maybe he can chip in on what should be done here.

Thanks,
Marton
Michael Niedermayer July 13, 2018, 11:33 p.m.
On Fri, Jul 13, 2018 at 10:51:00AM +0200, Marton Balint wrote:
> 
> 
> On Thu, 12 Jul 2018, mypopy@gmail.com wrote:
> 
> >On Thu, Jul 12, 2018 at 12:43 AM Marton Balint <cus@passwd.hu> wrote:
> >>
> >>
> >>
> >>On Wed, 11 Jul 2018, Jun Zhao wrote:
> >>
> >>> use pixelutils API for sad in motion estimation.
> >>
> >>Does it make sense to improve this code? I thought a superior and faster
> >>approach was a result of 2017 GSOC task:
> >>
> >>https://docs.google.com/document/d/1Hyh_rxP1KGsVkg7i7yU8Bcv92z0LIL4r-axpoKfvMFk/edit
> >>
> >>Maybe that code should be merged back, and any further optimalization
> >>should be done based on that code, no?
> >>
> >>Thanks,
> >>Marton
> >>
> >Hi, Marton:
> >
> >Yes, now I try to improve the minterpolate, and after use perf
> >profiing the commands:
> >
> >./ffmpeg -i a.ts -filter_complex
> >"minterpolate=mi_mode=mci:mc_mode=aobmc:vsbmc=1" -f null /dev/null
> >I found the hotspot is:
> >- get_sbad_ob
> >- get_sbad
> >- get_sad_ob
> >- bilateral_obmc
> >- set_frame_data
> >
> >So, as my plan, I will try to use sse2/avx2
> >Scatter/Gather, optimized
> >sad function (use pixelutils API)
> >in  get_sbad_ob /  get_sbad /  get_sad_ob first, for  set_frame_data
> >case, maybe need to use Scatter/Gather SIMD instruction.
> 
> That is great, all I am saying we should avoid diverging the two brances
> (FFmpeg branch, and GSOC 2017 branch), and try to merge back GSOC2017 if it
> can be done with reasonable amount of work before optimizing code, otherwise
> the GSOC2017 branch will rot and we will lose the result of the GSOC task.
> 
> >
> >But if some guys have done some improve task in this case, I think
> >based on the pre-existing work is the better way.
> 
> Michael was the mentor, maybe he can chip in on what should be done here.

talk with the author/student who wrote the code, not me :)


[...]
Marton Balint July 14, 2018, 10:04 a.m.
On Sat, 14 Jul 2018, Michael Niedermayer wrote:

> On Fri, Jul 13, 2018 at 10:51:00AM +0200, Marton Balint wrote:
>>
>>
>> On Thu, 12 Jul 2018, mypopy@gmail.com wrote:
>>
>>> On Thu, Jul 12, 2018 at 12:43 AM Marton Balint <cus@passwd.hu> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, 11 Jul 2018, Jun Zhao wrote:
>>>>
>>>>> use pixelutils API for sad in motion estimation.
>>>>
>>>> Does it make sense to improve this code? I thought a superior and faster
>>>> approach was a result of 2017 GSOC task:
>>>>
>>>> https://docs.google.com/document/d/1Hyh_rxP1KGsVkg7i7yU8Bcv92z0LIL4r-axpoKfvMFk/edit
>>>>
>>>> Maybe that code should be merged back, and any further optimalization
>>>> should be done based on that code, no?
>>>>
>>>> Thanks,
>>>> Marton
>>>>
>>> Hi, Marton:
>>>
>>> Yes, now I try to improve the minterpolate, and after use perf
>>> profiing the commands:
>>>
>>> ./ffmpeg -i a.ts -filter_complex
>>> "minterpolate=mi_mode=mci:mc_mode=aobmc:vsbmc=1" -f null /dev/null
>>> I found the hotspot is:
>>> - get_sbad_ob
>>> - get_sbad
>>> - get_sad_ob
>>> - bilateral_obmc
>>> - set_frame_data
>>>
>>> So, as my plan, I will try to use sse2/avx2
>>> Scatter/Gather, optimized
>>> sad function (use pixelutils API)
>>> in  get_sbad_ob /  get_sbad /  get_sad_ob first, for  set_frame_data
>>> case, maybe need to use Scatter/Gather SIMD instruction.
>>
>> That is great, all I am saying we should avoid diverging the two brances
>> (FFmpeg branch, and GSOC 2017 branch), and try to merge back GSOC2017 if it
>> can be done with reasonable amount of work before optimizing code, otherwise
>> the GSOC2017 branch will rot and we will lose the result of the GSOC task.
>>
>>>
>>> But if some guys have done some improve task in this case, I think
>>> based on the pre-existing work is the better way.
>>
>> Michael was the mentor, maybe he can chip in on what should be done here.
>
> talk with the author/student who wrote the code, not me :)

Well, he's not active here, and the question is whether his work is ready
for mainline inclusion or not, and whether he did enough valuable work
during GSOC that it's worth working on mainlining it.

Thanks,
Marton
Michael Niedermayer July 14, 2018, 5:03 p.m.
On Sat, Jul 14, 2018 at 12:04:46PM +0200, Marton Balint wrote:
> 
> 
> On Sat, 14 Jul 2018, Michael Niedermayer wrote:
> 
> >On Fri, Jul 13, 2018 at 10:51:00AM +0200, Marton Balint wrote:
> >>
> >>
> >>On Thu, 12 Jul 2018, mypopy@gmail.com wrote:
> >>
> >>>On Thu, Jul 12, 2018 at 12:43 AM Marton Balint <cus@passwd.hu> wrote:
> >>>>
> >>>>
> >>>>
> >>>>On Wed, 11 Jul 2018, Jun Zhao wrote:
> >>>>
> >>>>>use pixelutils API for sad in motion estimation.
> >>>>
> >>>>Does it make sense to improve this code? I thought a superior and faster
> >>>>approach was a result of 2017 GSOC task:
> >>>>
> >>>>https://docs.google.com/document/d/1Hyh_rxP1KGsVkg7i7yU8Bcv92z0LIL4r-axpoKfvMFk/edit
> >>>>
> >>>>Maybe that code should be merged back, and any further optimalization
> >>>>should be done based on that code, no?
> >>>>
> >>>>Thanks,
> >>>>Marton
> >>>>
> >>>Hi, Marton:
> >>>
> >>>Yes, now I try to improve the minterpolate, and after use perf
> >>>profiing the commands:
> >>>
> >>>./ffmpeg -i a.ts -filter_complex
> >>>"minterpolate=mi_mode=mci:mc_mode=aobmc:vsbmc=1" -f null /dev/null
> >>>I found the hotspot is:
> >>>- get_sbad_ob
> >>>- get_sbad
> >>>- get_sad_ob
> >>>- bilateral_obmc
> >>>- set_frame_data
> >>>
> >>>So, as my plan, I will try to use sse2/avx2
> >>>Scatter/Gather, optimized
> >>>sad function (use pixelutils API)
> >>>in  get_sbad_ob /  get_sbad /  get_sad_ob first, for  set_frame_data
> >>>case, maybe need to use Scatter/Gather SIMD instruction.
> >>
> >>That is great, all I am saying we should avoid diverging the two brances
> >>(FFmpeg branch, and GSOC 2017 branch), and try to merge back GSOC2017 if it
> >>can be done with reasonable amount of work before optimizing code, otherwise
> >>the GSOC2017 branch will rot and we will lose the result of the GSOC task.
> >>
> >>>
> >>>But if some guys have done some improve task in this case, I think
> >>>based on the pre-existing work is the better way.
> >>
> >>Michael was the mentor, maybe he can chip in on what should be done here.
> >
> >talk with the author/student who wrote the code, not me :)
> 
> Well, his not active here,

Yes, but last I heard from him, he was interested in continuing this project.
I think I've not heard much from him after that, but I now see that there is
a small commit in his repo from 2018, so he is not completely inactive.
I think you should talk with him.


> and the question is if his work is ready for
> mainline inclusion or not, and if he has done enough valuable work during
> GSOC that its worth working on mainlining it.

He certainly did valuable work. Looking at the ML now, it seems that more or
less the last thing on the ML was the RFC/discussion thread about libmotion.
In that thread everyone wanted to dictate the design, and the opinions all
contradicted each other.
If you want to work on untangling this bikeshed ball of conflicting
opinions, that is surely very welcome. What is important is that it ends in
something that is practical and high quality.
Personally I think the author should be given more authority over the design.
But again, please talk with the author of this code.
I don't remember everything about this in much detail ...

Also, I've added him to the CC.

Thanks

[...]

Patch

diff --git a/libavfilter/motion_estimation.c b/libavfilter/motion_estimation.c
index 0f9ba21..8ccd879 100644
--- a/libavfilter/motion_estimation.c
+++ b/libavfilter/motion_estimation.c
@@ -54,6 +54,8 @@  void ff_me_init_context(AVMotionEstContext *me_ctx, int mb_size, int search_para
     me_ctx->x_max = x_max;
     me_ctx->y_min = y_min;
     me_ctx->y_max = y_max;
+
+    me_ctx->sad = av_pixelutils_get_sad_fn(av_ceil_log2_c(mb_size), av_ceil_log2_c(mb_size), 0, NULL);
 }
 
 uint64_t ff_me_cmp_sad(AVMotionEstContext *me_ctx, int x_mb, int y_mb, int x_mv, int y_mv)
@@ -67,9 +69,13 @@  uint64_t ff_me_cmp_sad(AVMotionEstContext *me_ctx, int x_mb, int y_mb, int x_mv,
     data_ref += y_mv * linesize;
     data_cur += y_mb * linesize;
 
-    for (j = 0; j < me_ctx->mb_size; j++)
-        for (i = 0; i < me_ctx->mb_size; i++)
-            sad += FFABS(data_ref[x_mv + i + j * linesize] - data_cur[x_mb + i + j * linesize]);
+    if (me_ctx->sad) {
+        sad = me_ctx->sad(data_ref+x_mv, linesize, data_cur+x_mb, linesize);
+    } else {
+        for (j = 0; j < me_ctx->mb_size; j++)
+            for (i = 0; i < me_ctx->mb_size; i++)
+                sad += FFABS(data_ref[x_mv + i + j * linesize] - data_cur[x_mb + i + j * linesize]);
+    }
 
     return sad;
 }
diff --git a/libavfilter/motion_estimation.h b/libavfilter/motion_estimation.h
index 6ae29dd..9f7710b 100644
--- a/libavfilter/motion_estimation.h
+++ b/libavfilter/motion_estimation.h
@@ -22,6 +22,7 @@ 
 #define AVFILTER_MOTION_ESTIMATION_H
 
 #include "libavutil/avutil.h"
+#include "libavutil/pixelutils.h"
 
 #define AV_ME_METHOD_ESA        1
 #define AV_ME_METHOD_TSS        2
@@ -59,6 +60,7 @@  typedef struct AVMotionEstContext {
 
     uint64_t (*get_cost)(struct AVMotionEstContext *me_ctx, int x_mb, int y_mb,
                          int mv_x, int mv_y);
+    av_pixelutils_sad_fn sad;
 } AVMotionEstContext;
 
 void ff_me_init_context(AVMotionEstContext *me_ctx, int mb_size, int search_param,
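
A note on the size arguments in the patch: av_pixelutils_get_sad_fn() takes
the block width and height as log2 values, so rounding mb_size up with a
ceil-log2 is exact only when mb_size is a power of two. For example,
mb_size = 12 rounds up to 4 bits, which selects a 16x16 SAD and would read
past the 12x12 block. A minimal standalone sketch of a stricter mapping
follows. The helpers ceil_log2 and sad_bits_for_block are hypothetical and
not part of FFmpeg, and the 2..32 size range is an assumption about what
the SAD lookup supports:

```c
#include <assert.h>

/* Smallest n such that (1 << n) >= v. Hypothetical helper that mirrors
 * what a ceil-log2 computes for v >= 1. */
static int ceil_log2(unsigned v)
{
    int n = 0;
    assert(v > 0);
    while ((1u << n) < v)
        n++;
    return n;
}

/* Returns the log2 block size to request, or -1 when mb_size is not a
 * power of two in the assumed 2..32 range, so that the caller keeps the
 * generic scalar loop instead. */
static int sad_bits_for_block(int mb_size)
{
    if (mb_size < 2 || mb_size > 32 || (mb_size & (mb_size - 1)))
        return -1;
    return ceil_log2((unsigned)mb_size);
}
```

With such a mapping, the optimized function would only be requested when
sad_bits_for_block(mb_size) is non-negative, and the scalar path would
cover every other block size.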