[FFmpeg-devel] beautified + accelerated vf_fillborders – Please review

Submitted by Ulf Zibis on March 23, 2019, 11:13 p.m.

Details

Message ID ec6eb0f6-8eb9-4507-d3ef-1651c9f8a5d7@CoSoCo.de
State New
Headers show

Commit Message

Ulf Zibis March 23, 2019, 11:13 p.m.
Hi again,

Am 19.03.19 um 17:31 schrieb Carl Eugen Hoyos:
> One run is not good.
> Either use the loop option to filter the same frame again and
> again or feed a video to ffmpeg.

I have new patches.

Patch 1 is just a little renaming and a preparation for the benchmark
timer code.

Patch 2 is a slight enhancement in performance for cases, where only top
and bottom borders are filled.

Patch 3 beautifies the code an really enhances the performance. See the
results included in the benchmark patch

-Ulf

Comments

Carl Eugen Hoyos March 23, 2019, 11:40 p.m.
2019-03-24 0:13 GMT+01:00, Ulf Zibis <Ulf.Zibis@cosoco.de>:
> Hi again,
>
> Am 19.03.19 um 17:31 schrieb Carl Eugen Hoyos:
>> One run is not good.
>> Either use the loop option to filter the same frame again and
>> again or feed a video to ffmpeg.
>
> I have new patches.
>
> Patch 1 is just a little renaming and a preparation for the
> benchmark timer code.

There are two patches "1", one with wrong indentation.

Iiuc, the patch mixes the following:
Renaming of a variable
Moving of blocks
Adding commented code that is removed in a later patch
Replacing constants with harder to read calculations
Adding comments

I don't maintain the file but in general not all of these are
acceptable and it any case should be separate changes.

> Patch 2 is a slight enhancement in performance for cases,
> where only top and bottom borders are filled.

This also removes outcommented code which should
not be part of a performance enhancement.
How much "slight enhancement" were you able to measure?
Iiuc, some cases get slower, no?
Some people prefer that patches like this do not change the
indentation to make it more readable.

> Patch 3 beautifies the code an really enhances the performance.

Again: Please do not mix functional and cosmetic changes.

> See the results included in the benchmark patch

Do I read the results correctly that for all patches some cases
get faster and others get slower?

Carl Eugen
Ulf Zibis March 25, 2019, 11:31 a.m.
Am 24.03.19 um 00:40 schrieb Carl Eugen Hoyos:
> There are two patches "1", one with wrong indentation.
I intentionally have provided 2 patches with the same number, one for
the code base an one with additions for the benchmark. I've catched the
wrong indentation, hopefully at the place you meant.

> Iiuc, the patch mixes the following:
> Renaming of a variable
> Moving of blocks
> Adding commented code that is removed in a later patch
> Replacing constants with harder to read calculations
This is an argument. There are several places in the code using "* 2".
At some places this is an algorithmic purpose, at others to serve the
16-bit logic. I wanted to set this in clear view.

> Adding comments
>
> I don't maintain the file but in general not all of these are
> acceptable and it any case should be separate changes.
>
> This also removes outcommented code which should
> not be part of a performance enhancement.
> How much "slight enhancement" were you able to measure?
> Iiuc, some cases get slower, no?
> Some people prefer that patches like this do not change the
> indentation to make it more readable.
>
> Again: Please do not mix functional and cosmetic changes.
I'm preparing a new set of patches to follow your advice.

> Do I read the results correctly that for all patches some cases
> get faster and others get slower?

Correct. I'm wondering about the cases, where it gets such slower. So
I'm interested in an answer from you experienced developers to
understand this. Maybe a compiler option would help.

-Ulf

Patch hide | download patch | download mbox

From 727ead94327e04b9f1128af97dc8607b5276d739 Mon Sep 17 00:00:00 2001
From: Ulf Zibis <Ulf.Zibis@CoSoCo.de>
Date: 24.03.2019, 00:02:09

avfilter/fillborders: enhanced readability;
side effect: better performance by less indirections in for loops

diff --git a/libavfilter/vf_fillborders.c b/libavfilter/vf_fillborders.c
index 43de099..f6631c3 100644
--- a/libavfilter/vf_fillborders.c
+++ b/libavfilter/vf_fillborders.c
@@ -101,28 +101,30 @@ 
     for (p = 0; p < s->nb_planes; p++) {
         uint8_t *data = frame->data[p];
         int linesize = frame->linesize[p];
+        int width = s->planewidth[p];
+        int height = s->planeheight[p];
+        int left = s->borders[p].left;
+        int right = s->borders[p].right;
+        int top = s->borders[p].top;
+        int bottom = s->borders[p].bottom;
 
         /* fill left and right borders from top to bottom border */
-        if (s->borders[p].left != 0 ||
-                s->borders[p].right != s->planewidth[p]) // in case skip for performance
-            for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
+        if (left != 0 || right != width) // in case skip for performance
+            for (y = top; y < height - bottom; y++) {
                 memset(data + y * linesize,
-                        *(data + y * linesize + s->borders[p].left),
-                        s->borders[p].left);
-                memset(data + y * linesize + s->planewidth[p] - s->borders[p].right,
-                        *(data + y * linesize + s->planewidth[p] - s->borders[p].right - 1),
-                        s->borders[p].right);
+                        *(data + y * linesize + left), left);
+                memset(data + y * linesize + width - right,
+                        *(data + y * linesize + width - right - 1), right);
             }
 
         /* fill top and bottom borders */
-        for (y = 0; y < s->borders[p].top; y++) {
+        for (y = 0; y < top; y++) {
             memcpy(data + y * linesize,
-                    data + s->borders[p].top * linesize, s->planewidth[p]);
+                    data + top * linesize, width);
         }
-        for (y = s->planeheight[p] - s->borders[p].bottom; y < s->planeheight[p]; y++) {
+        for (y = height - bottom; y < height; y++) {
             memcpy(data + y * linesize,
-                    data + (s->planeheight[p] - s->borders[p].bottom - 1) * linesize,
-                    s->planewidth[p]);
+                    data + (height - bottom - 1) * linesize, width);
         }
     }
 }
@@ -134,29 +136,33 @@ 
     for (p = 0; p < s->nb_planes; p++) {
         uint16_t *data = (uint16_t *)frame->data[p];
         int linesize = frame->linesize[p] / sizeof(uint16_t);
+        int width = s->planewidth[p];
+        int height = s->planeheight[p];
+        int left = s->borders[p].left;
+        int right = s->borders[p].right;
+        int top = s->borders[p].top;
+        int bottom = s->borders[p].bottom;
 
         /* fill left and right borders from top to bottom border */
-        if (s->borders[p].left != 0 ||
-                s->borders[p].right != s->planewidth[p]) // in case skip for performance
-            for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-                for (x = 0; x < s->borders[p].left; x++) {
-                    data[y * linesize + x] = *(data + y * linesize + s->borders[p].left);
+        if (left != 0 || right != width) // in case skip for performance
+            for (y = top; y < height - bottom; y++) {
+                for (x = 0; x < left; x++) {
+                    data[y * linesize + x] = data[y * linesize + left];
                 }
-                for (x = 0; x < s->borders[p].right; x++) {
-                    data[y * linesize + s->planewidth[p] - s->borders[p].right + x] =
-                            *(data + y * linesize + s->planewidth[p] - s->borders[p].right - 1);
+                for (x = 0; x < right; x++) {
+                    data[y * linesize + width - right + x] =
+                            data[y * linesize + width - right - 1];
                 }
             }
 
         /* fill top and bottom borders */
-        for (y = 0; y < s->borders[p].top; y++) {
+        for (y = 0; y < top; y++) {
             memcpy(data + y * linesize,
-                    data + s->borders[p].top * linesize, s->planewidth[p] * sizeof(uint16_t));
+                    data + top * linesize, width * sizeof(uint16_t));
         }
-        for (y = s->planeheight[p] - s->borders[p].bottom; y < s->planeheight[p]; y++) {
+        for (y = height - bottom; y < height; y++) {
             memcpy(data + y * linesize,
-                    data + (s->planeheight[p] - s->borders[p].bottom - 1) * linesize,
-                    s->planewidth[p] * sizeof(uint16_t));
+                    data + (height - bottom - 1) * linesize, width * sizeof(uint16_t));
         }
     }
 }
@@ -168,30 +174,33 @@ 
     for (p = 0; p < s->nb_planes; p++) {
         uint8_t *data = frame->data[p];
         int linesize = frame->linesize[p];
+        int width = s->planewidth[p];
+        int height = s->planeheight[p];
+        int left = s->borders[p].left;
+        int right = s->borders[p].right;
+        int top = s->borders[p].top;
+        int bottom = s->borders[p].bottom;
 
         /* fill left and right borders from top to bottom border */
-        if (s->borders[p].left != 0 ||
-                s->borders[p].right != s->planewidth[p]) // in case skip for performance
-            for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-                for (x = 0; x < s->borders[p].left; x++) {
-                    data[y * linesize + x] = data[y * linesize + s->borders[p].left * 2 - 1 - x];
+        if (left != 0 || right != width) // in case skip for performance
+            for (y = top; y < height - bottom; y++) {
+                for (x = 0; x < left; x++) {
+                    data[y * linesize + x] = data[y * linesize + left * 2 - 1 - x];
                 }
-                for (x = 0; x < s->borders[p].right; x++) {
-                    data[y * linesize + s->planewidth[p] - s->borders[p].right + x] =
-                            data[y * linesize + s->planewidth[p] - s->borders[p].right - 1 - x];
+                for (x = 0; x < right; x++) {
+                    data[y * linesize + width - right + x] =
+                            data[y * linesize + width - right - 1 - x];
                 }
             }
 
         /* fill top and bottom borders */
-        for (y = 0; y < s->borders[p].top; y++) {
+        for (y = 0; y < top; y++) {
             memcpy(data + y * linesize,
-                    data + (s->borders[p].top * 2 - 1 - y) * linesize,
-                    s->planewidth[p]);
+                    data + (top * 2 - 1 - y) * linesize, width);
         }
-        for (y = 0; y < s->borders[p].bottom; y++) {
-            memcpy(data + (s->planeheight[p] - s->borders[p].bottom + y) * linesize,
-                    data + (s->planeheight[p] - s->borders[p].bottom - 1 - y) * linesize,
-                    s->planewidth[p]);
+        for (y = 0; y < bottom; y++) {
+            memcpy(data + (height - bottom + y) * linesize,
+                    data + (height - bottom - 1 - y) * linesize, width);
         }
     }
 }
@@ -203,31 +212,34 @@ 
     for (p = 0; p < s->nb_planes; p++) {
         uint16_t *data = (uint16_t *)frame->data[p];
         int linesize = frame->linesize[p] / sizeof(uint16_t);
+        int width = s->planewidth[p];
+        int height = s->planeheight[p];
+        int left = s->borders[p].left;
+        int right = s->borders[p].right;
+        int top = s->borders[p].top;
+        int bottom = s->borders[p].bottom;
 
         /* fill left and right borders from top to bottom border */
-        if (s->borders[p].left != 0 ||
-                s->borders[p].right != s->planewidth[p]) // in case skip for performance
-            for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-                for (x = 0; x < s->borders[p].left; x++) {
-                    data[y * linesize + x] = data[y * linesize + s->borders[p].left * 2 - 1 - x];
+        if (left != 0 || right != width) // in case skip for performance
+            for (y = top; y < height - bottom; y++) {
+                for (x = 0; x < left; x++) {
+                    data[y * linesize + x] = data[y * linesize + left * 2 - 1 - x];
                 }
 
-                for (x = 0; x < s->borders[p].right; x++) {
-                    data[y * linesize + s->planewidth[p] - s->borders[p].right + x] =
-                            data[y * linesize + s->planewidth[p] - s->borders[p].right - 1 - x];
+                for (x = 0; x < right; x++) {
+                    data[y * linesize + width - right + x] =
+                            data[y * linesize + width - right - 1 - x];
                 }
             }
 
         /* fill top and bottom borders */
-        for (y = 0; y < s->borders[p].top; y++) {
+        for (y = 0; y < top; y++) {
             memcpy(data + y * linesize,
-                    data + (s->borders[p].top * 2 - 1 - y) * linesize,
-                    s->planewidth[p] * sizeof(uint16_t));
+                    data + (top * 2 - 1 - y) * linesize, width * sizeof(uint16_t));
         }
-        for (y = 0; y < s->borders[p].bottom; y++) {
-            memcpy(data + (s->planeheight[p] - s->borders[p].bottom + y) * linesize,
-                    data + (s->planeheight[p] - s->borders[p].bottom - 1 - y) * linesize,
-                    s->planewidth[p] * sizeof(uint16_t));
+        for (y = 0; y < bottom; y++) {
+            memcpy(data + (height - bottom + y) * linesize,
+                    data + (height - bottom - 1 - y) * linesize, width * sizeof(uint16_t));
         }
     }
 }
@@ -238,24 +250,28 @@ 
 
     for (p = 0; p < s->nb_planes; p++) {
         uint8_t *data = frame->data[p];
-        uint8_t fill = s->fill[p];
         int linesize = frame->linesize[p];
+        int width = s->planewidth[p];
+        int height = s->planeheight[p];
+        int left = s->borders[p].left;
+        int right = s->borders[p].right;
+        int top = s->borders[p].top;
+        int bottom = s->borders[p].bottom;
+        uint8_t fill = s->fill[p];
 
         /* fill left and right borders from top to bottom border */
-        if (s->borders[p].left != 0 ||
-                s->borders[p].right != s->planewidth[p]) // in case skip for performance
-            for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-                memset(data + y * linesize, fill, s->borders[p].left);
-                memset(data + y * linesize + s->planewidth[p] - s->borders[p].right, fill,
-                        s->borders[p].right);
+        if (left != 0 || right != width) // in case skip for performance
+            for (y = top; y < height - bottom; y++) {
+                memset(data + y * linesize, fill, left);
+                memset(data + y * linesize + width - right, fill, right);
             }
 
         /* fill top and bottom borders */
-        for (y = 0; y < s->borders[p].top; y++) {
-            memset(data + y * linesize, fill, s->planewidth[p]);
+        for (y = 0; y < top; y++) {
+            memset(data + y * linesize, fill, width);
         }
-        for (y = s->planeheight[p] - s->borders[p].bottom; y < s->planeheight[p]; y++) {
-            memset(data + y * linesize, fill, s->planewidth[p]);
+        for (y = height - bottom; y < height; y++) {
+            memset(data + y * linesize, fill, width);
         }
     }
 }
@@ -266,29 +282,34 @@ 
 
     for (p = 0; p < s->nb_planes; p++) {
         uint16_t *data = (uint16_t *)frame->data[p];
-        uint16_t fill = s->fill[p] << (s->depth - 8);
         int linesize = frame->linesize[p] / sizeof(uint16_t);
+        int width = s->planewidth[p];
+        int height = s->planeheight[p];
+        int left = s->borders[p].left;
+        int right = s->borders[p].right;
+        int top = s->borders[p].top;
+        int bottom = s->borders[p].bottom;
+        uint16_t fill = s->fill[p] << (s->depth - 8);
 
         /* fill left and right borders from top to bottom border */
-        if (s->borders[p].left != 0 ||
-                s->borders[p].right != s->planewidth[p]) // in case skip for performance
-            for (y = s->borders[p].top; y < s->planeheight[p] - s->borders[p].bottom; y++) {
-                for (x = 0; x < s->borders[p].left; x++) {
+        if (left != 0 || right != width) // in case skip for performance
+            for (y = top; y < height - bottom; y++) {
+                for (x = 0; x < left; x++) {
                     data[y * linesize + x] = fill;
                 }
-                for (x = 0; x < s->borders[p].right; x++) {
-                    data[y * linesize + s->planewidth[p] - s->borders[p].right + x] = fill;
+                for (x = 0; x < right; x++) {
+                    data[y * linesize + width - right + x] = fill;
                 }
             }
 
         /* fill top and bottom borders */
-        for (y = 0; y < s->borders[p].top; y++) {
-            for (x = 0; x < s->planewidth[p]; x++) {
+        for (y = 0; y < top; y++) {
+            for (x = 0; x < width; x++) {
                 data[y * linesize + x] = fill;
             }
         }
-        for (y = s->planeheight[p] - s->borders[p].bottom; y < s->planeheight[p]; y++) {
-            for (x = 0; x < s->planewidth[p]; x++) {
+        for (y = height - bottom; y < height; y++) {
+            for (x = 0; x < width; x++) {
                 data[y * linesize + x] = fill;
             }
         }