diff mbox

[FFmpeg-devel] avfilter: implement halve filter

Message ID 39DDB4BE-507B-4A83-B052-7073EBA158CF@googlemail.com
State New

Commit Message

Daniel Oberhoff Feb. 14, 2017, 7:44 p.m. UTC
filter strictly “halves” the image efficiently, which is often exactly what is needed
likely much faster than using scale
fully slice parallelized

Signed-off-by: Daniel Oberhoff <daniel@danieloberhoff.de>
---
 libavfilter/Makefile     |   1 +
 libavfilter/allfilters.c |   1 +
 libavfilter/vf_halve.c   | 367 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 369 insertions(+)
 create mode 100644 libavfilter/vf_halve.c

Comments

Paul B Mahol Feb. 14, 2017, 7:49 p.m. UTC | #1
On 2/14/17, Daniel Oberhoff <danieloberhoff@googlemail.com> wrote:
> filter strictly "halves" the image efficiently, which is often exactly what
> is needed
> likely much faster than using scale
> fully slice parallelized
>
> Signed-off-by: Daniel Oberhoff <daniel@danieloberhoff.de>
> ---
>  libavfilter/Makefile     |   1 +
>  libavfilter/allfilters.c |   1 +
>  libavfilter/vf_halve.c   | 367 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 369 insertions(+)
>  create mode 100644 libavfilter/vf_halve.c
>

I see big cargo cult here.

No need to use framesync for 1-1 case.
Mark Thompson Feb. 14, 2017, 9:25 p.m. UTC | #2
On 14/02/17 19:44, Daniel Oberhoff wrote:
> filter strictly “halves” the image efficiently, which is often exactly what is needed
> likely much faster than using scale

Did you benchmark this?  How?

$ time ./ffmpeg -f lavfi -i allyuv -vf 'scale=iw/2:ih/2' -vframes 400 -f null -
...
frame=  400 fps= 26 q=-0.0 Lsize=N/A time=00:00:16.00 bitrate=N/A speed=1.05x
...
real    0m15.365s
user    0m11.092s
sys     0m4.272s

$ time ./ffmpeg -f lavfi -i allyuv -vf 'halve' -vframes 400 -f null -
...
frame=  400 fps= 22 q=-0.0 Lsize=N/A time=00:00:16.00 bitrate=N/A speed=0.873x
...
real    0m18.392s
user    0m46.280s
sys     0m3.656s

So it uses four times as much CPU as swscale to be marginally slower?

(Skylake 6300; I admit the SMT could well be making it look a bit worse than it actually is on the CPU time.)

On a more general note, components that duplicate existing functionality are unlikely to be accepted without a compelling use-case for them to be included.
Carl Eugen Hoyos Feb. 15, 2017, 11:04 a.m. UTC | #3
2017-02-14 20:44 GMT+01:00 Daniel Oberhoff <danieloberhoff@googlemail.com>:
> filter strictly “halves” the image efficiently, which is often exactly what is needed

> likely much faster than using scale

I am not a native speaker but this seems to imply
you never tested the performance of the new filter:
Does it have another advantage over using scale?

> fully slice parallelized

(Not necessarily related)
libswscale supports slices; I believe it is only the scale
filter that never added support for them.

Carl Eugen
James Darnley Feb. 15, 2017, 9:44 p.m. UTC | #4
On 2017-02-14 22:25, Mark Thompson wrote:
> On 14/02/17 19:44, Daniel Oberhoff wrote:
>> filter strictly “halves” the image efficiently, which is often exactly what is needed
>> likely much faster than using scale
> 
> Did you benchmark this?  How?
> 
> $ time ./ffmpeg -f lavfi -i allyuv -vf 'scale=iw/2:ih/2' -vframes 400 -f null -
> ...
> frame=  400 fps= 26 q=-0.0 Lsize=N/A time=00:00:16.00 bitrate=N/A speed=1.05x
> ...
> real    0m15.365s
> user    0m11.092s
> sys     0m4.272s
> 
> $ time ./ffmpeg -f lavfi -i allyuv -vf 'halve' -vframes 400 -f null -
> ...
> frame=  400 fps= 22 q=-0.0 Lsize=N/A time=00:00:16.00 bitrate=N/A speed=0.873x
> ...
> real    0m18.392s
> user    0m46.280s
> sys     0m3.656s
> 
> So it uses four times as much CPU as swscale to be marginally slower?

I would be tempted to blame the lack of SIMD for the poor performance.
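
Another candidate worth ruling out alongside missing SIMD: the slice functions in the patch take jobnr and nb_jobs but never read them, so each of the nb_jobs workers appears to walk the whole frame. That would be consistent with user time going up while wall time does not improve. Below is a minimal sketch of the usual row partitioning for one 8-bit plane. The names slice_bounds, halve_plane_job and run_all_jobs are hypothetical and are not part of the patch or of libavfilter; the jobs are run sequentially here only to show that they tile the output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: output rows [*start, *end) handled by job `jobnr`
 * out of `nb_jobs`. Consecutive jobs tile [0, height) without overlap. */
static void slice_bounds(int height, int jobnr, int nb_jobs, int *start, int *end)
{
    assert(nb_jobs > 0 && jobnr >= 0 && jobnr < nb_jobs);
    *start = (height * jobnr)       / nb_jobs;
    *end   = (height * (jobnr + 1)) / nb_jobs;
}

/* Hypothetical worker: average 2x2 source blocks, but only for the rows
 * that belong to this job. Returns the number of rows written. */
static int halve_plane_job(uint8_t *dst, int dlinesize,
                           const uint8_t *src, int slinesize,
                           int out_w, int out_h, int jobnr, int nb_jobs)
{
    int start, end, x, y;

    slice_bounds(out_h, jobnr, nb_jobs, &start, &end);
    for (y = start; y < end; y++) {
        const uint8_t *s0 = src + 2 * y * slinesize; /* upper source row */
        const uint8_t *s1 = s0 + slinesize;          /* lower source row */
        uint8_t *d = dst + y * dlinesize;
        for (x = 0; x < out_w; x++)
            d[x] = (s0[2 * x] + s0[2 * x + 1] + s1[2 * x] + s1[2 * x + 1]) >> 2;
    }
    return end - start;
}

/* Stand-in for the threaded dispatch: run every job once, in order, and
 * return the total number of rows written across all jobs. */
static int run_all_jobs(uint8_t *dst, int dlinesize,
                        const uint8_t *src, int slinesize,
                        int out_w, int out_h, int nb_jobs)
{
    int job, rows = 0;

    for (job = 0; job < nb_jobs; job++)
        rows += halve_plane_job(dst, dlinesize, src, slinesize,
                                out_w, out_h, job, nb_jobs);
    return rows;
}
```

With this partitioning the total number of rows written equals the output height regardless of nb_jobs, whereas a worker that ignores jobnr writes out_h rows per job.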
Michael Niedermayer Feb. 15, 2017, 11:06 p.m. UTC | #5
On Tue, Feb 14, 2017 at 08:44:54PM +0100, Daniel Oberhoff wrote:
> filter strictly “halves” the image efficiently, which is often exactly what is needed
> likely much faster than using scale
> fully slice parallelized
> 
> Signed-off-by: Daniel Oberhoff <daniel@danieloberhoff.de>
> ---
>  libavfilter/Makefile     |   1 +
>  libavfilter/allfilters.c |   1 +
>  libavfilter/vf_halve.c   | 367 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 369 insertions(+)
>  create mode 100644 libavfilter/vf_halve.c

If you want to improve scaling by a specific factor, be that with
a specialized implementation, threads, SIMD or something else, that
belongs in libswscale.
libswscale is the component in FFmpeg and avfilter that does that.

Having many special cases added to libavfilter filters that could be
done equally fast in libswscale is not a good idea.
Also, this filter would not be used when scaling by that factor unless
it is used manually, which means it would have very few users in practice,
while in swscale it would be used automatically if the case and
quality settings match ...

[...]
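
For reference, the special case discussed in this thread is a 2x2 box average. Below is a minimal sketch for one 16-bit plane, assuming linesizes are given in bytes (as in AVFrame) and that the output size is the input size divided by two, rounded down. The name box_halve16 is hypothetical. The +2 rounding term is an addition for illustration; the patch truncates instead. Rows are addressed by index rather than by a running pointer, so an odd input width needs no special handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference kernel: halve one 16-bit plane with a 2x2 box
 * average. Linesizes are in bytes; buffers must be 2-byte aligned. */
static void box_halve16(uint8_t *dst, ptrdiff_t dst_linesize,
                        const uint8_t *src, ptrdiff_t src_linesize,
                        int in_w, int in_h)
{
    const int out_w = in_w / 2; /* a trailing odd column is dropped */
    const int out_h = in_h / 2; /* a trailing odd row is dropped */
    int x, y;

    assert(in_w >= 0 && in_h >= 0);
    for (y = 0; y < out_h; y++) {
        const uint16_t *s0 = (const uint16_t *)(src + (2 * y)     * src_linesize);
        const uint16_t *s1 = (const uint16_t *)(src + (2 * y + 1) * src_linesize);
        uint16_t *d = (uint16_t *)(dst + y * dst_linesize);

        for (x = 0; x < out_w; x++) {
            /* Sum in 32 bits so four 16-bit samples cannot overflow. */
            uint32_t sum = (uint32_t)s0[2 * x] + s0[2 * x + 1] +
                           s1[2 * x]           + s1[2 * x + 1];
            d[x] = (uint16_t)((sum + 2) >> 2); /* round to nearest */
        }
    }
}
```

A caller would invoke it once per plane, passing the frame's data pointers and linesizes together with the input width and height.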

Patch

diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 9ab65eb..a986322 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -250,6 +250,7 @@  OBJS-$(CONFIG_RANDOM_FILTER)                 += vf_random.o
 OBJS-$(CONFIG_READVITC_FILTER)               += vf_readvitc.o
 OBJS-$(CONFIG_REALTIME_FILTER)               += f_realtime.o
 OBJS-$(CONFIG_REMAP_FILTER)                  += vf_remap.o framesync.o
+OBJS-$(CONFIG_HALVE_FILTER)                  += vf_halve.o framesync.o
 OBJS-$(CONFIG_REMOVEGRAIN_FILTER)            += vf_removegrain.o
 OBJS-$(CONFIG_REMOVELOGO_FILTER)             += bbox.o lswsutils.o lavfutils.o vf_removelogo.o
 OBJS-$(CONFIG_REPEATFIELDS_FILTER)           += vf_repeatfields.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 2c37818..0b9b69c 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -265,6 +265,7 @@  void avfilter_register_all(void)
     REGISTER_FILTER(READVITC,       readvitc,       vf);
     REGISTER_FILTER(REALTIME,       realtime,       vf);
     REGISTER_FILTER(REMAP,          remap,          vf);
+    REGISTER_FILTER(HALVE,          halve,          vf);
     REGISTER_FILTER(REMOVEGRAIN,    removegrain,    vf);
     REGISTER_FILTER(REMOVELOGO,     removelogo,     vf);
     REGISTER_FILTER(REPEATFIELDS,   repeatfields,   vf);
diff --git a/libavfilter/vf_halve.c b/libavfilter/vf_halve.c
new file mode 100644
index 0000000..dd77eb8
--- /dev/null
+++ b/libavfilter/vf_halve.c
@@ -0,0 +1,367 @@ 
+/*
+ * Copyright (c) 2016 Floris Sluiter
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Pixel halve filter
+ * This filter makes the target image exactly half the size of the source
+ * image in both the x and y dimensions by averaging the corresponding four
+ * source pixels.
+ */
+
+#include "libavutil/imgutils.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/opt.h"
+#include "avfilter.h"
+#include "formats.h"
+#include "framesync.h"
+#include "internal.h"
+#include "video.h"
+
+typedef struct HalveContext {
+    const AVClass *class;
+    int nb_planes;
+    int nb_components;
+    int step;
+    FFFrameSync fs;
+    void (*halve_slice)(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs);
+} HalveContext;
+
+#define OFFSET(x) offsetof(HalveContext, x)
+#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
+
+static const AVOption halve_options[] = {
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(halve);
+
+typedef struct ThreadData {
+    AVFrame *in, *out;
+    int nb_planes;
+    int nb_components;
+    int step;
+} ThreadData;
+
+static int query_formats(AVFilterContext *ctx)
+{
+    static const enum AVPixelFormat pix_fmts[] = {
+        AV_PIX_FMT_YUVA444P,
+        AV_PIX_FMT_YUV444P,
+        AV_PIX_FMT_YUVJ444P,
+        AV_PIX_FMT_RGB24, AV_PIX_FMT_BGR24,
+        AV_PIX_FMT_ARGB, AV_PIX_FMT_ABGR, AV_PIX_FMT_RGBA, AV_PIX_FMT_BGRA,
+        AV_PIX_FMT_GBRP, AV_PIX_FMT_GBRAP,
+        AV_PIX_FMT_YUV444P9, AV_PIX_FMT_YUV444P10, AV_PIX_FMT_YUV444P12,
+        AV_PIX_FMT_YUV444P14, AV_PIX_FMT_YUV444P16,
+        AV_PIX_FMT_YUVA444P9, AV_PIX_FMT_YUVA444P10, AV_PIX_FMT_YUVA444P16,
+        AV_PIX_FMT_GBRP9, AV_PIX_FMT_GBRP10, AV_PIX_FMT_GBRP12,
+        AV_PIX_FMT_GBRP14, AV_PIX_FMT_GBRP16,
+        AV_PIX_FMT_GBRAP10, AV_PIX_FMT_GBRAP12, AV_PIX_FMT_GBRAP16,
+        AV_PIX_FMT_RGB48, AV_PIX_FMT_BGR48,
+        AV_PIX_FMT_RGBA64, AV_PIX_FMT_BGRA64,
+        AV_PIX_FMT_NONE
+    };
+    AVFilterFormats *pix_formats = NULL;
+    int ret;
+
+    if (!(pix_formats = ff_make_format_list(pix_fmts))) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+    if ((ret = ff_formats_ref(pix_formats, &ctx->inputs[0]->out_formats)) < 0 ||
+        (ret = ff_formats_ref(pix_formats, &ctx->outputs[0]->in_formats)) < 0)
+        goto fail;
+    return 0;
+fail:
+    if (pix_formats)
+        av_freep(&pix_formats->formats);
+    av_freep(&pix_formats);
+    return ret;
+}
+
+/**
+ * halve_planar algorithm expects planes of same size
+ * each target pixel is the truncated average of a 2x2 source block:
+ * T[y][x] = (S[2y][2x] + S[2y][2x+1] + S[2y+1][2x] + S[2y+1][2x+1]) >> 2
+ */
+static void halve_planar_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    const ThreadData *td = (ThreadData*)arg;
+    const AVFrame *in  = td->in;
+    const AVFrame *out = td->out;
+    int x , y, plane;
+
+    for (plane = 0; plane < td->nb_planes ; plane++) {
+        uint8_t *dst        = out->data[plane];
+        const int dlinesize  = out->linesize[plane];
+        const uint8_t *src  = in->data[plane];
+        const int slinesize  = in->linesize[plane];
+        for (y = 0; y < out->height; y++) {
+            for (x = 0; x < out->width; x++, ++dst, src += 2) {
+                dst[0] = (uint8_t)(
+                    (
+                        (uint16_t)(src[0]) +
+                        (uint16_t)(src[slinesize]) +
+                        (uint16_t)(src[1]) +
+                        (uint16_t)(src[slinesize + 1])
+                    ) >> 2);
+            }
+            src  += 2 * slinesize - 2 * out->width;
+            dst  += dlinesize - out->width;
+        }
+    }
+}
+
+static void halve_planar16_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    const ThreadData *td = (ThreadData*)arg;
+    const AVFrame *in  = td->in;
+    const AVFrame *out = td->out;
+    int x , y, plane;
+
+    for (plane = 0; plane < td->nb_planes ; plane++) {
+        uint16_t *dst        = (uint16_t *)out->data[plane];
+        const int dlinesize  = out->linesize[plane] / 2;
+        const uint16_t *src  = (const uint16_t *)in->data[plane];
+        const int slinesize  = in->linesize[plane] / 2;
+        for (y = 0; y < out->height; y++) {
+            for (x = 0; x < out->width; x++, ++dst, src += 2) {
+                dst[0] = (uint16_t)(
+                    (
+                        (uint32_t)(src[0]) +
+                        (uint32_t)(src[slinesize]) +
+                        (uint32_t)(src[1]) +
+                        (uint32_t)(src[slinesize + 1])
+                    ) >> 2);
+            }
+            src  += 2 * slinesize - 2 * out->width;
+            dst  += dlinesize - out->width;
+        }
+    }
+}
+
+
+
+/**
+ * halve_packed algorithm expects pixels with both padded bits (step) and
+ * number of components correctly set.
+ * each target component is the truncated average of a 2x2 source block:
+ * T[y][x] = (S[2y][2x] + S[2y][2x+1] + S[2y+1][2x] + S[2y+1][2x+1]) >> 2
+ */
+static void halve_packed_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    const ThreadData *td = (ThreadData*)arg;
+    const AVFrame *in  = td->in;
+    const AVFrame *out = td->out;
+    uint8_t *dst = out->data[0];
+    const uint8_t *src  = in->data[0];
+    const int dlinesize = out->linesize[0];
+    const int slinesize = in->linesize[0];
+    const int step = td->step;
+    int c, x, y;
+
+    for (y = 0; y < out->height; y++) {
+        for (x = 0; x < out->width; x++, dst += step, src += 2 * step) {
+            for (c = 0; c < td->nb_components; ++c) {
+                dst[c] = (uint8_t)(
+                    (
+                        (uint16_t)(src[c]) +
+                        (uint16_t)(src[slinesize + c]) +
+                        (uint16_t)(src[step + c]) +
+                        (uint16_t)(src[slinesize + step + c])
+                    ) >> 2);
+            }
+        }
+        src  += 2 * slinesize - 2 * out->width * step;
+        dst  += dlinesize - out->width * step;
+    }
+}
+
+static void halve_packed16_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    const ThreadData *td = (ThreadData*)arg;
+    const AVFrame *in  = td->in;
+    const AVFrame *out = td->out;
+    uint16_t *dst = (uint16_t *)out->data[0];
+    const uint16_t *src  = (const uint16_t *)in->data[0];
+    const int dlinesize = out->linesize[0] / 2;
+    const int slinesize = in->linesize[0] / 2;
+    const int step = td->step / 2;
+    int c, x, y;
+
+    for (y = 0; y < out->height; y++) {
+        for (x = 0; x < out->width; x++, dst += step, src += 2 * step) {
+            for (c = 0; c < td->nb_components; ++c) {
+                dst[c] = (uint16_t)(
+                    (
+                        (uint32_t)(src[c]) +
+                        (uint32_t)(src[slinesize + c]) +
+                        (uint32_t)(src[step + c]) +
+                        (uint32_t)(src[slinesize + step + c])
+                    ) >> 2);
+            }
+        }
+        src  += 2 * slinesize - 2 * out->width * step;
+        dst  += dlinesize - out->width * step;
+    }
+}
+
+static int config_input(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx = inlink->dst;
+    HalveContext *s = ctx->priv;
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(inlink->format);
+
+    s->nb_planes = av_pix_fmt_count_planes(inlink->format);
+    s->nb_components = desc->nb_components;
+
+    if (desc->comp[0].depth == 8) {
+        if (s->nb_planes > 1 || s->nb_components == 1) {
+            s->halve_slice = halve_planar_slice;
+        } else {
+            s->halve_slice = halve_packed_slice;
+        }
+    } else {
+        if (s->nb_planes > 1 || s->nb_components == 1) {
+            s->halve_slice = halve_planar16_slice;
+        } else {
+            s->halve_slice = halve_packed16_slice;
+        }
+    }
+
+    s->step = av_get_padded_bits_per_pixel(desc) >> 3;
+    return 0;
+}
+
+static int process_frame(FFFrameSync *fs)
+{
+    AVFilterContext *ctx = fs->parent;
+    HalveContext *s = fs->opaque;
+    AVFilterLink *outlink = ctx->outputs[0];
+    AVFrame *out, *in;
+    int ret;
+
+    if ((ret = ff_framesync_get_frame(&s->fs, 0, &in,   0)) < 0)
+        return ret;
+
+    if (ctx->is_disabled) {
+        out = av_frame_clone(in);
+        if (!out)
+            return AVERROR(ENOMEM);
+    } else {
+        ThreadData td;
+
+        out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+        if (!out)
+            return AVERROR(ENOMEM);
+        av_frame_copy_props(out, in);
+
+        td.in  = in;
+        td.out = out;
+        td.nb_planes = s->nb_planes;
+        td.nb_components = s->nb_components;
+        td.step = s->step;
+        ctx->internal->execute(ctx, s->halve_slice, &td, NULL, FFMIN(outlink->h, ctx->graph->nb_threads));
+    }
+    out->pts = av_rescale_q(in->pts, s->fs.time_base, outlink->time_base);
+
+    return ff_filter_frame(outlink, out);
+}
+
+static int config_output(AVFilterLink *outlink)
+{
+    AVFilterContext *ctx = outlink->src;
+    HalveContext *s = ctx->priv;
+    AVFilterLink *srclink = ctx->inputs[0];
+    FFFrameSyncIn *in;
+    int ret;
+
+    outlink->w = srclink->w / 2;
+    outlink->h = srclink->h / 2;
+    outlink->time_base = srclink->time_base;
+    outlink->sample_aspect_ratio = srclink->sample_aspect_ratio;
+    outlink->frame_rate = srclink->frame_rate;
+
+    ret = ff_framesync_init(&s->fs, ctx, 1);
+    if (ret < 0)
+        return ret;
+
+    in = s->fs.in;
+    in[0].time_base = srclink->time_base;
+    in[0].sync   = 2;
+    in[0].before = EXT_STOP;
+    in[0].after  = EXT_STOP;
+    s->fs.opaque   = s;
+    s->fs.on_event = process_frame;
+
+    return ff_framesync_configure(&s->fs);
+}
+
+static int filter_frame(AVFilterLink *inlink, AVFrame *buf)
+{
+    HalveContext *s = inlink->dst->priv;
+    return ff_framesync_filter_frame(&s->fs, inlink, buf);
+}
+
+static int request_frame(AVFilterLink *outlink)
+{
+    HalveContext *s = outlink->src->priv;
+    return ff_framesync_request_frame(&s->fs, outlink);
+}
+
+static av_cold void uninit(AVFilterContext *ctx)
+{
+    HalveContext *s = ctx->priv;
+
+    ff_framesync_uninit(&s->fs);
+}
+
+static const AVFilterPad halve_inputs[] = {
+    {
+        .name         = "source",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .filter_frame = filter_frame,
+        .config_props = config_input,
+    },
+    { NULL }
+};
+
+static const AVFilterPad halve_outputs[] = {
+    {
+        .name          = "default",
+        .type          = AVMEDIA_TYPE_VIDEO,
+        .config_props  = config_output,
+        .request_frame = request_frame,
+    },
+    { NULL }
+};
+
+AVFilter ff_vf_halve = {
+    .name          = "halve",
+    .description   = NULL_IF_CONFIG_SMALL("Halve image"),
+    .priv_size     = sizeof(HalveContext),
+    .uninit        = uninit,
+    .query_formats = query_formats,
+    .inputs        = halve_inputs,
+    .outputs       = halve_outputs,
+    .priv_class    = &halve_class,
+    .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC | AVFILTER_FLAG_SLICE_THREADS,
+};