[FFmpeg-devel] beautified + accelerated vf_fillborders – Please review

Submitted by Ulf Zibis on March 11, 2019, 9:59 p.m.

Details

Message ID d73e8622-30d4-bf8f-401a-a2d8009f9937@CoSoCo.de
State New
Headers show

Commit Message

Ulf Zibis March 11, 2019, 9:59 p.m.
Hi,

I have made some refactoring to vf_fillborders.c.

My motivation came from using this filter as a template for a new
filter. Refactoring the code was a good exercise to understand FFmpeg's
data models.

I think, the code is now much better readable and I believe, it's faster
because of:
- more use of memset() and memcpy()
- less calculations in the inner for loops
- skip the looping for right/left filling when there is nothing to do

I would be appreciated, if one would review my proposed changes.

Hopefully I've used the right format for the patch. Please honour, that
I'm new in FFmpeg development.

-Ulf

Comments

Carl Eugen Hoyos March 11, 2019, 10:08 p.m.
2019-03-11 22:59 GMT+01:00, Ulf Zibis <Ulf.Zibis@cosoco.de>:

> I have made some refactoring to vf_fillborders.c.

You are not supposed to mix cosmetic changes
like removing braces or moving variable declarations
with actual code changes.

> My motivation came from using this filter as a template
> for a new filter. Refactoring the code was a good exercise
> to understand FFmpeg's data models.
>
> I think, the code is now much better readable and

> I believe, it's faster because of:

Please post some numbers if your patch is supposed to
speed up the filter.

Carl Eugen
Ulf Zibis March 11, 2019, 10:23 p.m.
Am 11.03.19 um 23:08 schrieb Carl Eugen Hoyos:
> You are not supposed to mix cosmetic changes
> like removing braces or moving variable declarations
> with actual code changes.

Hm difficult, because the code changes are dependent from different
variables.

But I'll give it a try to make some separated patches.

>> I believe, it's faster because of:
> Please post some numbers if your patch is supposed to
> speed up the filter.

Hm, I have no clue how to do this. I thing the listed arguments speak
for themselves. Is there a kind of harness, template or framework for this?

Thanks for your first review.

-Ulf
Moritz Barsnick March 11, 2019, 11:25 p.m.
On Mon, Mar 11, 2019 at 23:23:15 +0100, Ulf Zibis wrote:
> >> I believe, it's faster because of:
> > Please post some numbers if your patch is supposed to
> > speed up the filter.
> 
> Hm, I have no clue how to do this. I thing the listed arguments speak
> for themselves. Is there a kind of harness, template or framework for this?

Well, belief is not enough. ;-) In the simplest case, you show command
lines of some examples using the filter (with various inputs,
parameters) and their filtering/encoding performance before and after
the patch. Ideally, you use the START_TIMER/STOP_TIMER macros to
profile the actual functions you changed. (Check this mailing list's
archives for some examples, and play with the code.)

Cheers,
Moritz
Carl Eugen Hoyos March 11, 2019, 11:37 p.m.
2019-03-12 0:25 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>:
> On Mon, Mar 11, 2019 at 23:23:15 +0100, Ulf Zibis wrote:
>> >> I believe, it's faster because of:
>> > Please post some numbers if your patch is supposed to
>> > speed up the filter.
>>
>> Hm, I have no clue how to do this. I thing the listed arguments speak
>> for themselves. Is there a kind of harness, template or framework for
>> this?
>
> Well, belief is not enough. ;-) In the simplest case, you show command
> lines of some examples using the filter (with various inputs,
> parameters) and their filtering/encoding performance before and after
> the patch.

> Ideally, you use the START_TIMER/STOP_TIMER macros to
> profile the actual functions you changed. (Check this mailing list's
> archives for some examples, and play with the code.)

But this should not be needed if time (the command) and / or
benchmark (the FFmpeg option) show clear improvements.

Carl Eugen

Patch hide | download patch | download mbox

diff --git a/libavfilter/vf_fillborders.c b/libavfilter/vf_fillborders.c
index 1344587..a7a074b 100644
--- a/libavfilter/vf_fillborders.c
+++ b/libavfilter/vf_fillborders.c
@@ -1,5 +1,7 @@ 
 /*
  * Copyright (c) 2017 Paul B Mahol
+ * 
+ * Contributors: Ulf Zibis
  *
  * This file is part of FFmpeg.
  *
@@ -18,6 +20,7 @@ 
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#include <sys/param.h>
 #include "libavutil/colorspace.h"
 #include "libavutil/common.h"
 #include "libavutil/opt.h"
@@ -84,187 +87,176 @@ 
 
 static void smear_borders8(FillBordersContext *s, AVFrame *frame)
 {
-    int p, y;
+    for (int p = 0; p < s->nb_planes; p++) {
+        uint8_t *data = frame->data[p];
+        int lz = frame->linesize[p];
+        int width = s->planewidth[p];
+        int height = s->planeheight[p] * lz;
+        int left = s->borders[p].left;
+        int right = width - s->borders[p].right;
+        int top = s->borders[p].top * lz;
+        int bottom = height - s->borders[p].bottom * lz;
 
-    for (p = 0; p < s->nb_planes; p++) {
-        uint8_t *ptr = frame->data[p];
-        int linesize = frame->linesize[p];
+        /* fill left and right borders from top to bottom border */
+        if (left != 0 || right != width) // in case skip for performance
+            for (int y = top; y < bottom; y += lz) {
+                memset(data + y, *(data + y + left), left);
+                memset(data + y + right, *(data + y + right - 1), width - right);
+            }
 
-        for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-            memset(ptr + y * linesize,
-                   *(ptr + y * linesize + s->borders[p].left),
-                   s->borders[p].left);
-            memset(ptr + y * linesize + s->planewidth[p] - s->borders[p].right,
-                   *(ptr + y * linesize + s->planewidth[p] - s->borders[p].right - 1),
-                   s->borders[p].right);
-        }
-
-        for (y = 0; y < s->borders[p].top; y++) {
-            memcpy(ptr + y * linesize,
-                   ptr + s->borders[p].top * linesize, s->planewidth[p]);
-        }
-
-        for (y = s->planeheight[p] - s->borders[p].bottom; y < s->planeheight[p]; y++) {
-            memcpy(ptr + y * linesize,
-                   ptr + (s->planeheight[p] - s->borders[p].bottom - 1) * linesize,
-                   s->planewidth[p]);
-        }
+        /* fill top and bottom borders */
+        for (uint8_t *y = data + top; y > data; )
+            memcpy(y -= lz, data + top, width);
+        for (uint8_t *y = data + bottom; y < data + height; y += lz)
+            memcpy(y, data + bottom - lz, width);
     }
 }
 
 static void smear_borders16(FillBordersContext *s, AVFrame *frame)
 {
-    int p, y, x;
+    for (int p = 0; p < s->nb_planes; p++) {
+        uint16_t *data = (uint16_t *)frame->data[p];
+        int lz = frame->linesize[p] / sizeof(uint16_t);
+        int width = s->planewidth[p];
+        int height = s->planeheight[p] * lz;
+        int left = s->borders[p].left;
+        int right = width - s->borders[p].right;
+        int top = s->borders[p].top * lz;
+        int bottom = height - s->borders[p].bottom * lz;
 
-    for (p = 0; p < s->nb_planes; p++) {
-        uint16_t *ptr = (uint16_t *)frame->data[p];
-        int linesize = frame->linesize[p] / 2;
-
-        for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-            for (x = 0; x < s->borders[p].left; x++) {
-                ptr[y * linesize + x] =  *(ptr + y * linesize + s->borders[p].left);
+        /* fill left and right borders from top to bottom border */
+        if (left != 0 || right != width) // in case skip for performance
+            for (int y = top; y < bottom; y += lz) {
+                for (int x = left; x >= 0; x--)
+                    data[y + x] =  data[y + left];
+                for (int x = right; x < width; x++)
+                    data[y + x] = data[y + right - 1];
             }
 
-            for (x = 0; x < s->borders[p].right; x++) {
-                ptr[y * linesize + s->planewidth[p] - s->borders[p].right + x] =
-                   *(ptr + y * linesize + s->planewidth[p] - s->borders[p].right - 1);
-            }
-        }
-
-        for (y = 0; y < s->borders[p].top; y++) {
-            memcpy(ptr + y * linesize,
-                   ptr + s->borders[p].top * linesize, s->planewidth[p] * 2);
-        }
-
-        for (y = s->planeheight[p] - s->borders[p].bottom; y < s->planeheight[p]; y++) {
-            memcpy(ptr + y * linesize,
-                   ptr + (s->planeheight[p] - s->borders[p].bottom - 1) * linesize,
-                   s->planewidth[p] * 2);
-        }
+        /* fill top and bottom borders */
+        for (uint16_t *y = data + top; y > data; )
+            memcpy(y -= lz, data + top, width * sizeof(uint16_t));
+        for (uint16_t *y = data + bottom; y < data + height; y += lz)
+            memcpy(y, data + bottom - lz, width * sizeof(uint16_t));
     }
 }
 
 static void mirror_borders8(FillBordersContext *s, AVFrame *frame)
 {
-    int p, y, x;
+    for (int p = 0; p < s->nb_planes; p++) {
+        uint8_t *data = frame->data[p];
+        int lz = frame->linesize[p];
+        int width = s->planewidth[p];
+        int height = s->planeheight[p] * lz;
+        int left = s->borders[p].left;
+        int right = width - s->borders[p].right;
+        int top = s->borders[p].top * lz;
+        int bottom = height - s->borders[p].bottom * lz;
 
-    for (p = 0; p < s->nb_planes; p++) {
-        uint8_t *ptr = frame->data[p];
-        int linesize = frame->linesize[p];
-
-        for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-            for (x = 0; x < s->borders[p].left; x++) {
-                ptr[y * linesize + x] = ptr[y * linesize + s->borders[p].left * 2 - 1 - x];
+        /* fill left and right borders from top to bottom border */
+        if (left != 0 || right != width) // in case skip for performance
+            for (int y = top; y < bottom; y += lz)
+                for (int x = left, x2 = x; x >= 0; x--) {
+                    data[y + x] = data[y + x2++];
+                for (int x = right, x2 = x; x < width; x++)
+                    data[y + x] = data[y + --x2];
             }
 
-            for (x = 0; x < s->borders[p].right; x++) {
-                ptr[y * linesize + s->planewidth[p] - s->borders[p].right + x] =
-                    ptr[y * linesize + s->planewidth[p] - s->borders[p].right - 1 - x];
-            }
-        }
-
-        for (y = 0; y < s->borders[p].top; y++) {
-            memcpy(ptr + y * linesize,
-                   ptr + (s->borders[p].top * 2 - 1 - y) * linesize,
-                   s->planewidth[p]);
-        }
-
-        for (y = 0; y < s->borders[p].bottom; y++) {
-            memcpy(ptr + (s->planeheight[p] - s->borders[p].bottom + y) * linesize,
-                   ptr + (s->planeheight[p] - s->borders[p].bottom - 1 - y) * linesize,
-                   s->planewidth[p]);
-        }
+        /* fill top and bottom borders */
+        for (uint8_t *y = data + top, *y2 = y; y > data; y2 += lz)
+            memcpy(y -= lz, y2, width);
+        for (uint8_t *y = data + bottom, *y2 = y; y < data + height; y += lz)
+            memcpy(y, y2 -= lz, width);
     }
 }
 
 static void mirror_borders16(FillBordersContext *s, AVFrame *frame)
 {
-    int p, y, x;
+    for (int p = 0; p < s->nb_planes; p++) {
+        uint16_t *data = (uint16_t *)frame->data[p];
+        int lz = frame->linesize[p] / sizeof(uint16_t);
+        int width = s->planewidth[p];
+        int height = s->planeheight[p] * lz;
+        int left = s->borders[p].left;
+        int right = width - s->borders[p].right;
+        int top = s->borders[p].top * lz;
+        int bottom = height - s->borders[p].bottom * lz;
 
-    for (p = 0; p < s->nb_planes; p++) {
-        uint16_t *ptr = (uint16_t *)frame->data[p];
-        int linesize = frame->linesize[p] / 2;
-
-        for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-            for (x = 0; x < s->borders[p].left; x++) {
-                ptr[y * linesize + x] = ptr[y * linesize + s->borders[p].left * 2 - 1 - x];
+        /* fill left and right borders from top to bottom border */
+        if (left != 0 || right != width) // in case skip for performance
+            for (int y = top; y < bottom; y += lz) {
+                for (int x = left, x2 = x; x >= 0; x--)
+                    data[y + x] = data[y + x2++];
+                for (int x = right, x2 = x; x < width; x++)
+                    data[y + x] = data[y + --x2];
             }
 
-            for (x = 0; x < s->borders[p].right; x++) {
-                ptr[y * linesize + s->planewidth[p] - s->borders[p].right + x] =
-                    ptr[y * linesize + s->planewidth[p] - s->borders[p].right - 1 - x];
-            }
-        }
-
-        for (y = 0; y < s->borders[p].top; y++) {
-            memcpy(ptr + y * linesize,
-                   ptr + (s->borders[p].top * 2 - 1 - y) * linesize,
-                   s->planewidth[p] * 2);
-        }
-
-        for (y = 0; y < s->borders[p].bottom; y++) {
-            memcpy(ptr + (s->planeheight[p] - s->borders[p].bottom + y) * linesize,
-                   ptr + (s->planeheight[p] - s->borders[p].bottom - 1 - y) * linesize,
-                   s->planewidth[p] * 2);
-        }
+        /* fill top and bottom borders */
+        for (uint16_t *y = data + top, *y2 = y; y > data; y2 += lz)
+            memcpy(y -= lz, y2, width * sizeof(uint16_t));
+        for (uint16_t *y = data + bottom, *y2 = y; y < data + height; y += lz)
+            memcpy(y, y2 -= lz, width * sizeof(uint16_t));
     }
 }
 
 static void fixed_borders8(FillBordersContext *s, AVFrame *frame)
 {
-    int p, y;
-
-    for (p = 0; p < s->nb_planes; p++) {
-        uint8_t *ptr = frame->data[p];
+    for (int p = 0; p < s->nb_planes; p++) {
+        uint8_t *data = frame->data[p];
+        int lz = frame->linesize[p];
+        int width = s->planewidth[p];
+        int height = s->planeheight[p] * lz;
+        int left = s->borders[p].left;
+        int right = width - s->borders[p].right;
+        int top = s->borders[p].top * lz;
+        int bottom = height - s->borders[p].bottom * lz;
         uint8_t fill = s->fill[p];
-        int linesize = frame->linesize[p];
 
-        for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-            memset(ptr + y * linesize, fill, s->borders[p].left);
-            memset(ptr + y * linesize + s->planewidth[p] - s->borders[p].right, fill,
-                   s->borders[p].right);
-        }
+        /* fill left and right borders from top to bottom border */
+        if (left != 0 || right != width) // in case skip for performance
+            for (int y = top; y < bottom; y += lz) {
+                memset(data + y, fill, left);
+                memset(data + y + right, fill, width - right);
+            }
 
-        for (y = 0; y < s->borders[p].top; y++) {
-            memset(ptr + y * linesize, fill, s->planewidth[p]);
-        }
-
-        for (y = s->planeheight[p] - s->borders[p].bottom; y < s->planeheight[p]; y++) {
-            memset(ptr + y * linesize, fill, s->planewidth[p]);
-        }
+        /* fill top and bottom borders */
+        for (uint8_t *y = data + top; y > data; )
+            memset(y -= lz, fill, width);
+        for (uint8_t *y = data + bottom; y < data + height; y += lz)
+            memset(y, fill, width);
     }
 }
 
 static void fixed_borders16(FillBordersContext *s, AVFrame *frame)
 {
-    int p, y, x;
+    for (int p = 0; p < s->nb_planes; p++) {
+        uint16_t *data = (uint16_t *)frame->data[p];
+        int lz = frame->linesize[p] / sizeof(uint16_t);
+        int width = s->planewidth[p];
+        int height = s->planeheight[p] * lz;
+        int left = s->borders[p].left;
+        int right = width - s->borders[p].right;
+        int top = s->borders[p].top * lz;
+        int bottom = height - s->borders[p].bottom * lz;
+        int fill_sz = MAX(MAX(left, right), top!=0 || height-bottom!=0 ? width : 0);
+        uint16_t *fill = malloc(fill_sz * sizeof(uint16_t));
+        for (int i = 0; i < fill_sz; i++)
+            fill[i] = s->fill[p] << (s->depth - 8);
 
-    for (p = 0; p < s->nb_planes; p++) {
-        uint16_t *ptr = (uint16_t *)frame->data[p];
-        uint16_t fill = s->fill[p] << (s->depth - 8);
-        int linesize = frame->linesize[p] / 2;
-
-        for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-            for (x = 0; x < s->borders[p].left; x++) {
-                ptr[y * linesize + x] = fill;
+        /* fill left and right borders from top to bottom border */
+        if (left != 0 || right != width) // in case skip for performance
+            for (int y = top; y < bottom; y += lz) {
+                memcpy(data + y, fill, left * sizeof(uint16_t));
+                memcpy(data + y + right, fill, (width - right) * sizeof(uint16_t));
             }
 
-            for (x = 0; x < s->borders[p].right; x++) {
-                ptr[y * linesize + s->planewidth[p] - s->borders[p].right + x] = fill;
-            }
-        }
+        /* fill top and bottom borders */
+        for (uint16_t *y = data + top; y > data; )
+            memcpy(y -= lz, fill, width * sizeof(uint16_t));
+        for (uint16_t *y = data + bottom; y < data + height; y += lz)
+            memcpy(y, fill, width * sizeof(uint16_t));
 
-        for (y = 0; y < s->borders[p].top; y++) {
-            for (x = 0; x < s->planewidth[p]; x++) {
-                ptr[y * linesize + x] = fill;
-            }
-        }
-
-        for (y = s->planeheight[p] - s->borders[p].bottom; y < s->planeheight[p]; y++) {
-            for (x = 0; x < s->planewidth[p]; x++) {
-                ptr[y * linesize + x] = fill;
-            }
-        }
+        free(fill);
     }
 }
 
@@ -282,6 +274,20 @@ 
     AVFilterContext *ctx = inlink->dst;
     FillBordersContext *s = ctx->priv;
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(inlink->format);
+
+    if (inlink->w < s->left + s->right ||
+        inlink->w <= s->left ||
+        inlink->w <= s->right ||
+        inlink->h < s->top + s->bottom ||
+        inlink->h <= s->top ||
+        inlink->h <= s->bottom ||
+        inlink->w < s->left * 2 ||
+        inlink->w < s->right * 2 ||
+        inlink->h < s->top * 2 ||
+        inlink->h < s->bottom * 2) {
+        av_log(ctx, AV_LOG_ERROR, "Borders are bigger than input frame size.\n");
+        return AVERROR(EINVAL);
+    }
 
     s->nb_planes = desc->nb_components;
     s->depth = desc->comp[0].depth;
@@ -306,40 +312,24 @@ 
     s->borders[2].top    = s->top >> desc->log2_chroma_h;
     s->borders[2].bottom = s->bottom >> desc->log2_chroma_h;
 
-    if (inlink->w < s->left + s->right ||
-        inlink->w <= s->left ||
-        inlink->w <= s->right ||
-        inlink->h < s->top + s->bottom ||
-        inlink->h <= s->top ||
-        inlink->h <= s->bottom ||
-        inlink->w < s->left * 2 ||
-        inlink->w < s->right * 2 ||
-        inlink->h < s->top * 2 ||
-        inlink->h < s->bottom * 2) {
-        av_log(ctx, AV_LOG_ERROR, "Borders are bigger than input frame size.\n");
-        return AVERROR(EINVAL);
-    }
-
     switch (s->mode) {
-    case FM_SMEAR:  s->fillborders = s->depth <= 8 ? smear_borders8  : smear_borders16;  break;
-    case FM_MIRROR: s->fillborders = s->depth <= 8 ? mirror_borders8 : mirror_borders16; break;
-    case FM_FIXED:  s->fillborders = s->depth <= 8 ? fixed_borders8  : fixed_borders16;  break;
-    }
+        case FM_SMEAR:  s->fillborders = s->depth <= 8 ? smear_borders8  : smear_borders16;  break;
+        case FM_MIRROR: s->fillborders = s->depth <= 8 ? mirror_borders8 : mirror_borders16; break;
+        case FM_FIXED:  s->fillborders = s->depth <= 8 ? fixed_borders8  : fixed_borders16;
+            if (desc->flags & AV_PIX_FMT_FLAG_RGB) {
+                uint8_t rgba_map[4];
+                int i;
 
-    s->yuv_color[Y] = RGB_TO_Y_CCIR(s->rgba_color[R], s->rgba_color[G], s->rgba_color[B]);
-    s->yuv_color[U] = RGB_TO_U_CCIR(s->rgba_color[R], s->rgba_color[G], s->rgba_color[B], 0);
-    s->yuv_color[V] = RGB_TO_V_CCIR(s->rgba_color[R], s->rgba_color[G], s->rgba_color[B], 0);
-    s->yuv_color[A] = s->rgba_color[A];
-
-    if (desc->flags & AV_PIX_FMT_FLAG_RGB) {
-        uint8_t rgba_map[4];
-        int i;
-
-        ff_fill_rgba_map(rgba_map, inlink->format);
-        for (i = 0; i < 4; i++)
-            s->fill[rgba_map[i]] = s->rgba_color[i];
-    } else {
-        memcpy(s->fill, s->yuv_color, sizeof(s->yuv_color));
+                ff_fill_rgba_map(rgba_map, inlink->format);
+                for (i = 0; i < sizeof(rgba_map); i++)
+                    s->fill[rgba_map[i]] = s->rgba_color[i];
+            } else {
+                s->yuv_color[Y] = RGB_TO_Y_CCIR(s->rgba_color[R], s->rgba_color[G], s->rgba_color[B]);
+                s->yuv_color[U] = RGB_TO_U_CCIR(s->rgba_color[R], s->rgba_color[G], s->rgba_color[B], 0);
+                s->yuv_color[V] = RGB_TO_V_CCIR(s->rgba_color[R], s->rgba_color[G], s->rgba_color[B], 0);
+                s->yuv_color[A] = s->rgba_color[A];
+                memcpy(s->fill, s->yuv_color, sizeof(s->yuv_color));
+            } break;
     }
 
     return 0;