
[FFmpeg-devel,v2,4/5] avfilter/vf_yadif: Process more pixels using filter_line

Message ID 20220721022514.1466331-4-cphlipot0@gmail.com
State New
Series [FFmpeg-devel,v2,1/5] avfilter/vf_yadif: Fix edge size when MAX_ALIGN is < 4

Checks

Context                          Check    Description
yinshiyou/make_loongarch64       success  Make finished
yinshiyou/make_fate_loongarch64  success  Make fate finished
andriy/make_x86                  success  Make finished
andriy/make_fate_x86             fail     Make fate failed

Commit Message

Chris Phlipot July 21, 2022, 2:25 a.m. UTC
filter_line is generally vectorized, whereas filter_edges is implemented
in C. Currently we rely on filter_edges to process non-edge pixels in
cases where the width doesn't match the alignment. This causes us to
process non-edge pixels with the slow C implementation instead of the
faster SSE implementation.

It is generally faster to process 8 pixels with the slowest SSE2
vectorized implementation than it is to process 2 pixels with the
C implementation. Therefore, if filter_edges needs to process 2 or
more non-edge pixels, it is faster to process these non-edge
pixels with filter_line instead, even if it processes more pixels
than necessary.

To address this, we use filter_line so long as we know that at least
2 pixels will be used in the final output, even if the rest of the
computed pixels are invalid. Any incorrect output pixels generated by
filter_line will be overwritten by the following call to filter_edges.
In addition, we avoid running filter_line if it would read or write
pixels outside the current slice.
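
The width selection described above can be sketched in isolation. This
is only an illustration, not the code in the patch: round_up_width and
choose_filter_width are hypothetical helper names, and the slice bounds
check is reduced to a caller-supplied fits_in_slice flag rather than the
pointer arithmetic used in filter_slice. The rounding expression and the
">= 2" threshold are copied from the diff below.

```c
/* Round (w - 3) up to a multiple of align, using the same expression as
 * the patch. align is assumed to be a power of two. Note that a value
 * that is already aligned is still bumped by one full align. */
static int round_up_width(int w, int align)
{
    int target = w - 3;
    return (target & ~(align - 1)) + align;
}

/* Hypothetical helper: returns the width to pass to filter_line and
 * stores the alignment to pass to filter_edges in *edge_alignment.
 * fits_in_slice stands in for the patch's check that the wider call
 * stays inside the current slice (an assumption of this sketch). */
static int choose_filter_width(int w, int align, int df,
                               int fits_in_slice, int *edge_alignment)
{
    int edge    = 3 + align / df - 1;   /* same edge size as filter_slice */
    int target  = w - 3;
    int rounded = round_up_width(w, align);

    if (rounded - target >= 2 && fits_in_slice) {
        /* Wide path: filter_line overshoots, filter_edges then only
         * needs to rewrite the true border pixels. */
        *edge_alignment = 1;
        return rounded;
    }
    /* Fallback: original behaviour before the patch. */
    *edge_alignment = align;
    return w - edge;
}
```

With w = 1920, align = 16 and df = 1, the target is 1917 and the
rounded-up width is 1920, so the wide path is taken whenever the slice
check passes. With w = 1922 the rounded-up width (1920) is only 1 past
the target (1919), so the sketch falls back to w - edge = 1904 and
leaves the alignment at 16.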

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
---
 libavfilter/vf_yadif.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

Patch

diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c
index 54109566be..394c04a985 100644
--- a/libavfilter/vf_yadif.c
+++ b/libavfilter/vf_yadif.c
@@ -201,6 +201,8 @@  static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
     int slice_end   = (td->h * (jobnr+1)) / nb_jobs;
     int y;
     int edge = 3 + s->req_align / df - 1;
+    int filter_width_target = td->w - 3;
+    int filter_width_rounded_up = (filter_width_target & ~(s->req_align-1)) + s->req_align;
 
     /* filtering reads 3 pixels to the left/right; to avoid invalid reads,
      * we need to call the c variant which avoids this for border pixels
@@ -215,11 +217,28 @@  static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
             int     mrefs = y ? -refs : refs;
             int    parity = td->parity ^ td->tff;
             int     mode  = y == 1 || y + 2 == td->h ? 2 : s->mode;
+
+            /* Adjust width and alignment to process extra pixels in filter_line
+             * using potentially vectorized code so long as it doesn't cause
+             * reads or writes outside of the current slice. filter_edge will
+             * correct any incorrect pixels written by filter_line in this
+             * scenario.
+             */
+            int filter_width;
+            int edge_alignment;
+            if (filter_width_rounded_up - filter_width_target >= 2
+                && y*refs + filter_width_rounded_up < slice_end * refs + refs - 3) {
+                filter_width = filter_width_rounded_up;
+                edge_alignment = 1;
+            } else {
+                filter_width = td->w - edge;
+                edge_alignment = s->req_align;
+            }
             s->filter_line(dst + pix_3, prev + pix_3, cur + pix_3,
-                           next + pix_3, td->w - edge,
+                           next + pix_3, filter_width,
                            prefs, mrefs, parity, mode);
             s->filter_edges(dst, prev, cur, next, td->w,
-                            prefs, mrefs, parity, mode, s->req_align);
+                            prefs, mrefs, parity, mode, edge_alignment);
         } else {
             memcpy(&td->frame->data[td->plane][y * td->frame->linesize[td->plane]],
                    &s->cur->data[td->plane][y * refs], td->w * df);