[FFmpeg-devel,V1] lavfi/normalize: improve the performance

Submitted by Jun Zhao on June 20, 2019, 1:43 p.m.

Details

Message ID 1561038226-451-1-git-send-email-mypopydev@gmail.com
State New
Headers show

Commit Message

Jun Zhao June 20, 2019, 1:43 p.m.
From: Jun Zhao <barryjzhao@tencent.com>

Remove unnecessary max value found, it's will improve the performance
about 10%. Used the test command like:
ffmpeg -i 1080P.mp4 -an -vf normalize -f null /dev/null, the FPS change
from 96fps to 107fps.

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
---
 libavfilter/vf_normalize.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

Comments

Reimar Döffinger June 21, 2019, 8:39 a.m.
On Thu, Jun 20, 2019 at 09:43:46PM +0800, Jun Zhao wrote:
> From: Jun Zhao <barryjzhao@tencent.com>
>
> Remove unnecessary max value found, it's will improve the performance
> about 10%. Used the test command like:
> ffmpeg -i 1080P.mp4 -an -vf normalize -f null /dev/null, the FPS change
> from 96fps to 107fps.
>
> Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
> ---
>  libavfilter/vf_normalize.c |    7 +++----
>  1 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/libavfilter/vf_normalize.c b/libavfilter/vf_normalize.c
> index 48eea59..40dc031 100644
> --- a/libavfilter/vf_normalize.c
> +++ b/libavfilter/vf_normalize.c
> @@ -143,14 +143,13 @@ static void normalize(NormalizeContext *s, AVFrame *in, AVFrame *out)
>          min[c].in = max[c].in = in->data[0][s->co[c]];
>      for (y = 0; y < in->height; y++) {
>          uint8_t *inp = in->data[0] + y * in->linesize[0];
> -        uint8_t *outp = out->data[0] + y * out->linesize[0];
>          for (x = 0; x < in->width; x++) {
>              for (c = 0; c < 3; c++) {
> -                min[c].in = FFMIN(min[c].in, inp[s->co[c]]);
> -                max[c].in = FFMAX(max[c].in, inp[s->co[c]]);
> +                uint8_t val = inp[s->co[c]];
> +                if (val < min[c].in)      min[c].in = val;
> +                else if (val > max[c].in) max[c].in = val;
>              }
>              inp += s->step;
> -            outp += s->step;

outp should just be removed, a patch for that certainly would be
accepted.
But I am very skeptical towards the rest of the change.
In a way I would expect the compiler should be able to
optimize this.
However this code is clearly completely unoptimized, and
thus I don't think making it more complex for a minor optimization
that gives just 10% is worth it.
A proper optimization would get rid of the s->co indirection in the
inner loop - it really is not necessary for the limited format
support the code has.
This would then also allow for SIMD optimization, which should give
a vastly larger speedup - and possibly the compiler would even be able
to auto-vectorize.
This applies even more to the processing step - the transform
calculation done in SIMD is almost certain to be cheaper than
a LUT lookup - at least if switch to fixed-point arithmetic
instead of float.
Side note: I haven't really figured out what the point
of all this s->co complexity is supposed to be.
The only thing the code seems to actually care is whether
there is an alpha channel, and if so where it is.
But the quickest path to a significant speedup would
be to write a SIMD version of the normalize function specific
to that format and the SIMD instruction set available
and call it instead of normalize if applicable.
I would expect that you should see at least a 3x speedup
if using e.g. SSE2 (which is why I don't consider a
10% speedup worth changing the code for really).

Best regards,
Reimar Döffinger

Patch hide | download patch | download mbox

diff --git a/libavfilter/vf_normalize.c b/libavfilter/vf_normalize.c
index 48eea59..40dc031 100644
--- a/libavfilter/vf_normalize.c
+++ b/libavfilter/vf_normalize.c
@@ -143,14 +143,13 @@  static void normalize(NormalizeContext *s, AVFrame *in, AVFrame *out)
         min[c].in = max[c].in = in->data[0][s->co[c]];
     for (y = 0; y < in->height; y++) {
         uint8_t *inp = in->data[0] + y * in->linesize[0];
-        uint8_t *outp = out->data[0] + y * out->linesize[0];
         for (x = 0; x < in->width; x++) {
             for (c = 0; c < 3; c++) {
-                min[c].in = FFMIN(min[c].in, inp[s->co[c]]);
-                max[c].in = FFMAX(max[c].in, inp[s->co[c]]);
+                uint8_t val = inp[s->co[c]];
+                if (val < min[c].in)      min[c].in = val;
+                else if (val > max[c].in) max[c].in = val;
             }
             inp += s->step;
-            outp += s->step;
         }
     }