diff mbox series

[FFmpeg-devel] lavfi: Add cropdetect_video filter

Message ID a4cacc34-c714-ec1e-e7e5-6e30415a93a1@mail.de
State New
Series [FFmpeg-devel] lavfi: Add cropdetect_video filter

Checks

Context                          Check    Description
yinshiyou/configure_loongarch64  warning  Failed to apply patch
andriy/commit_msg_x86            warning  Please wrap lines in the body of the commit message between 60 and 72 characters.
andriy/make_x86                  success  Make finished
andriy/make_fate_x86             fail     Make fate failed
andriy/commit_msg_armv7_RPi4     warning  Please wrap lines in the body of the commit message between 60 and 72 characters.
andriy/make_armv7_RPi4           success  Make finished
andriy/make_fate_armv7_RPi4      fail     Make fate failed

Commit Message

Thilo Borgmann July 2, 2022, 9:55 a.m. UTC
Hi,

$subject allows crop detection even if the video is embedded in non-black areas.

It shares logic and purpose with lavfi/vf_cropdetect, but its edge detection supports 8-bit formats only. Therefore this ends up in a separate filter.
It would also be new GPL code if it lives in lavfi/vf_cropdetect.c. If we'd like to add this as LGPL for some reason, it could go into its own file, without sharing (useful) logic with the existing cropdetect.

Thanks,
Thilo
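
For orientation, the motion-vector stage of the patch can be sketched in isolation. This is a minimal standalone sketch, not the patch code: `MotionVec`, `BBox`, `mv_bbox` and `median_of` are hypothetical names standing in for `AVMotionVector` and the loops in `filter_frame`, and the edge-based refinement step is left out entirely. It collects the bounding box of all in-frame vectors whose length reaches the threshold, and takes a median over a short history of results.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the AVMotionVector fields the patch reads. */
typedef struct MotionVec {
    int src_x, src_y;
    int dst_x, dst_y;
} MotionVec;

typedef struct BBox {
    int x1, y1, x2, y2;
} BBox;

/* Bounding box of the destinations of all vectors that lie fully inside
 * the w x h frame and whose squared length is at least threshold^2.
 * Returns 1 and fills *out if at least one vector qualified, else 0. */
static int mv_bbox(const MotionVec *mvs, int nb, int w, int h,
                   int threshold, BBox *out)
{
    BBox b = { w - 1, h - 1, 0, 0 };
    int found = 0;

    for (int i = 0; i < nb; i++) {
        const MotionVec *mv = &mvs[i];
        const int mx = mv->dst_x - mv->src_x;
        const int my = mv->dst_y - mv->src_y;

        if (mv->dst_x < 0 || mv->dst_x >= w || mv->dst_y < 0 || mv->dst_y >= h ||
            mv->src_x < 0 || mv->src_x >= w || mv->src_y < 0 || mv->src_y >= h)
            continue; /* vector leaves the frame */
        if (mx * mx + my * my < threshold * threshold)
            continue; /* too little motion */

        if (mv->dst_x < b.x1) b.x1 = mv->dst_x;
        if (mv->dst_y < b.y1) b.y1 = mv->dst_y;
        if (mv->dst_x > b.x2) b.x2 = mv->dst_x;
        if (mv->dst_y > b.y2) b.y2 = mv->dst_y;
        found = 1;
    }

    if (found)
        *out = b;
    return found;
}

static int cmp_int(const void *a, const void *b)
{
    const int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Median of n history values. Sorts a copy in the caller-provided scratch
 * buffer (n elements), so the history keeps its insertion order. */
static int median_of(const int *vals, int n, int *scratch)
{
    assert(n > 0);
    memcpy(scratch, vals, n * sizeof(*scratch));
    qsort(scratch, n, sizeof(*scratch), cmp_int);
    return scratch[n / 2];
}
```

Calling `mv_bbox()` once per frame and feeding each of the four coordinates through `median_of()` over the last few frames gives a box that is less sensitive to a single frame with stray vectors. Sorting a copy rather than the history itself is a choice made for this sketch.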
From 9050d15c2f1bcb3b2a628c8b6f04ea3a5f7e69d1 Mon Sep 17 00:00:00 2001
From: Thilo Borgmann <thilo.borgmann@mail.de>
Date: Sat, 2 Jul 2022 11:42:47 +0200
Subject: [PATCH] lavfi: Add cropdetect_video filter

This filter allows crop detection even if the video is embedded in
non-black areas.
---
 Changelog                                     |   1 +
 doc/filters.texi                              |  69 +++++
 libavfilter/Makefile                          |   1 +
 libavfilter/allfilters.c                      |   1 +
 libavfilter/version.h                         |   2 +-
 libavfilter/vf_cropdetect.c                   | 245 +++++++++++++++++-
 tests/fate/filter-video.mak                   |   8 +
 .../fate/filter-metadata-cropdetect_video1    |   9 +
 .../fate/filter-metadata-cropdetect_video2    |   9 +
 9 files changed, 343 insertions(+), 2 deletions(-)
 create mode 100644 tests/ref/fate/filter-metadata-cropdetect_video1
 create mode 100644 tests/ref/fate/filter-metadata-cropdetect_video2

Comments

Paul B Mahol July 2, 2022, 12:30 p.m. UTC | #1
Make it work >8 depth and use same filter name.

Patch

diff --git a/Changelog b/Changelog
index d4ca674b1b..3b5a4880cb 100644
--- a/Changelog
+++ b/Changelog
@@ -19,6 +19,7 @@  version 5.1:
 - blurdetect filter
 - tiltshelf audio filter
 - QOI image format support
+- cropdetect_video video filter
 
 
 version 5.0:
diff --git a/doc/filters.texi b/doc/filters.texi
index d65e83d4d0..5117e12623 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -10108,6 +10108,75 @@  indicates 'never reset', and returns the largest area encountered during
 playback.
 @end table
 
+@anchor{cropdetect_video}
+@section cropdetect_video
+
+Auto-detect the crop size.
+
+It calculates the necessary cropping parameters and prints the
+recommended parameters via the logging system. The detected dimensions
+correspond to the video playback area of the input video.
+It can detect a video even when it is embedded in non-black areas, but it supports only 8-bit pixel formats.
+
+It accepts the following parameters:
+
+@table @option
+
+@item mv_threshold
+Set the minimum motion vector length, in pixels, for a vector to count as motion. It defaults to 8.
+
+@item low
+@item high
+Set low and high threshold values used by the Canny thresholding
+algorithm.
+
+The high threshold selects the "strong" edge pixels, which are then
+connected through 8-connectivity with the "weak" edge pixels selected
+by the low threshold.
+
+@var{low} and @var{high} threshold values must be chosen in the range
+[0,1], and @var{low} should be less than or equal to @var{high}.
+
+Default value for @var{low} is @code{15/255}, and default value for @var{high}
+is @code{25/255}.
+
+@item round
+The value which the width/height should be divisible by. It defaults to
+16. The offset is automatically adjusted to center the video. Use 2 to
+get only even dimensions (needed for 4:2:2 video). 16 is best when
+encoding to most video codecs.
+
+@item skip
+Set the number of initial frames for which evaluation is skipped.
+Default is 2. Range is 0 to INT_MAX.
+
+@item reset_count, reset
+Set the counter that determines after how many frames cropdetect will
+reset the previously detected largest video area and start over to
+detect the current optimal crop area. Default value is 0.
+
+This can be useful when channel logos distort the video area. 0
+indicates 'never reset', and returns the largest area encountered during
+playback.
+@end table
+
+@subsection Examples
+
+@itemize
+@item
+Find an embedded video area, generate motion vectors beforehand:
+@example
+ffmpeg -i file.mp4 -vf mestimate,cropdetect_video,metadata=mode=print -f null -
+@end example
+
+@item
+Find an embedded video area, use motion vectors from decoder:
+@example
+ffmpeg -flags2 +export_mvs -i file.mp4 -vf cropdetect_video,metadata=mode=print -f null -
+@end example
+@end itemize
+
+
 @anchor{cue}
 @section cue
 
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index e0e4d0de2c..8e4b4d33b1 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -235,6 +235,7 @@  OBJS-$(CONFIG_COREIMAGE_FILTER)              += vf_coreimage.o
 OBJS-$(CONFIG_COVER_RECT_FILTER)             += vf_cover_rect.o lavfutils.o
 OBJS-$(CONFIG_CROP_FILTER)                   += vf_crop.o
 OBJS-$(CONFIG_CROPDETECT_FILTER)             += vf_cropdetect.o
+OBJS-$(CONFIG_CROPDETECT_VIDEO_FILTER)       += vf_cropdetect.o
 OBJS-$(CONFIG_CUE_FILTER)                    += f_cue.o
 OBJS-$(CONFIG_CURVES_FILTER)                 += vf_curves.o
 OBJS-$(CONFIG_DATASCOPE_FILTER)              += vf_datascope.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 2f72477523..d68746a4b0 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -219,6 +219,7 @@  extern const AVFilter ff_vf_coreimage;
 extern const AVFilter ff_vf_cover_rect;
 extern const AVFilter ff_vf_crop;
 extern const AVFilter ff_vf_cropdetect;
+extern const AVFilter ff_vf_cropdetect_video;
 extern const AVFilter ff_vf_cue;
 extern const AVFilter ff_vf_curves;
 extern const AVFilter ff_vf_datascope;
diff --git a/libavfilter/version.h b/libavfilter/version.h
index 86b33c4174..814ab071da 100644
--- a/libavfilter/version.h
+++ b/libavfilter/version.h
@@ -31,7 +31,7 @@ 
 
 #include "version_major.h"
 
-#define LIBAVFILTER_VERSION_MINOR  40
+#define LIBAVFILTER_VERSION_MINOR  41
 #define LIBAVFILTER_VERSION_MICRO 100
 
 
diff --git a/libavfilter/vf_cropdetect.c b/libavfilter/vf_cropdetect.c
index b887b9ecb1..f2e25ff90d 100644
--- a/libavfilter/vf_cropdetect.c
+++ b/libavfilter/vf_cropdetect.c
@@ -26,11 +26,14 @@ 
 #include "libavutil/imgutils.h"
 #include "libavutil/internal.h"
 #include "libavutil/opt.h"
+#include "libavutil/motion_vector.h"
+#include "libavutil/qsort.h"
 
 #include "avfilter.h"
 #include "formats.h"
 #include "internal.h"
 #include "video.h"
+#include "edge_common.h"
 
 typedef struct CropDetectContext {
     const AVClass *class;
@@ -42,6 +45,16 @@  typedef struct CropDetectContext {
     int frame_nb;
     int max_pixsteps[4];
     int max_outliers;
+    int mode;
+    int window_size;
+    int mv_threshold;
+    float   low, high;
+    uint8_t low_u8, high_u8;
+    uint8_t  *filterbuf;
+    uint8_t  *tmpbuf;
+    uint16_t *gradients;
+    char     *directions;
+    int      *bboxes[4];
 } CropDetectContext;
 
 static const enum AVPixelFormat pix_fmts[] = {
@@ -61,6 +74,29 @@  static const enum AVPixelFormat pix_fmts[] = {
     AV_PIX_FMT_NONE
 };
 
+static const enum AVPixelFormat pix_fmts_video[] = {
+    AV_PIX_FMT_GRAY8,
+    AV_PIX_FMT_GBRP,     AV_PIX_FMT_GBRAP,
+    AV_PIX_FMT_YUV422P,  AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_YUV444P,  AV_PIX_FMT_YUV440P,
+    AV_PIX_FMT_YUV411P,  AV_PIX_FMT_YUV410P,
+    AV_PIX_FMT_YUVJ440P, AV_PIX_FMT_YUVJ411P, AV_PIX_FMT_YUVJ420P,
+    AV_PIX_FMT_YUVJ422P, AV_PIX_FMT_YUVJ444P,
+    AV_PIX_FMT_YUVA444P, AV_PIX_FMT_YUVA422P, AV_PIX_FMT_YUVA420P,
+    AV_PIX_FMT_NONE
+};
+
+enum CropMode {
+    MODE_BELOW_TH,
+    MODE_MV_EDGES,
+    MODE_NB
+};
+
+static int comp(const int *a, const int *b)
+{
+    return FFDIFFSIGN(*a, *b);
+}
+
 static int checkline(void *ctx, const unsigned char *src, int stride, int len, int bpp)
 {
     int total = 0;
@@ -116,6 +152,36 @@  static int checkline(void *ctx, const unsigned char *src, int stride, int len, i
     return total;
 }
 
+static int checkline_edge(void *ctx, const unsigned char *src, int stride, int len, int bpp)
+{
+    const uint16_t *src16 = (const uint16_t *)src;
+
+    switch (bpp) {
+    case 1:
+        while (--len >= 0) {
+            if(src[0]) return 0;
+            src += stride;
+        }
+        break;
+    case 2:
+        stride >>= 1;
+        while (--len >= 0) {
+            if(src16[0]) return 0;
+            src16 += stride;
+        }
+        break;
+    case 3:
+    case 4:
+        while (--len >= 0) {
+            if(src[0] || src[1] || src[2]) return 0;
+            src += stride;
+        }
+        break;
+    }
+
+    return 1;
+}
+
 static av_cold int init(AVFilterContext *ctx)
 {
     CropDetectContext *s = ctx->priv;
@@ -128,6 +194,31 @@  static av_cold int init(AVFilterContext *ctx)
     return 0;
 }
 
+static av_cold int init_video(AVFilterContext *ctx)
+{
+    CropDetectContext *s = ctx->priv;
+
+    s->mode    = MODE_MV_EDGES;
+    s->low_u8  = s->low  * 255. + .5;
+    s->high_u8 = s->high * 255. + .5;
+
+    return init(ctx);
+}
+
+static av_cold void uninit_video(AVFilterContext *ctx)
+{
+    CropDetectContext *s = ctx->priv;
+
+    av_freep(&s->tmpbuf);
+    av_freep(&s->filterbuf);
+    av_freep(&s->gradients);
+    av_freep(&s->directions);
+    av_freep(&s->bboxes[0]);
+    av_freep(&s->bboxes[1]);
+    av_freep(&s->bboxes[2]);
+    av_freep(&s->bboxes[3]);
+}
+
 static int config_input(AVFilterLink *inlink)
 {
     AVFilterContext *ctx = inlink->dst;
@@ -147,6 +238,29 @@  static int config_input(AVFilterLink *inlink)
     return 0;
 }
 
+static int config_input_video(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx = inlink->dst;
+    CropDetectContext *s = ctx->priv;
+    const int bufsize = inlink->w * inlink->h;
+
+    s->window_size = FFMAX(s->reset_count, 15);
+    s->tmpbuf     = av_malloc(bufsize);
+    s->filterbuf  = av_malloc(bufsize);
+    s->gradients  = av_calloc(bufsize, sizeof(*s->gradients));
+    s->directions = av_malloc(bufsize);
+    s->bboxes[0]  = av_malloc(s->window_size * sizeof(*s->bboxes[0]));
+    s->bboxes[1]  = av_malloc(s->window_size * sizeof(*s->bboxes[1]));
+    s->bboxes[2]  = av_malloc(s->window_size * sizeof(*s->bboxes[2]));
+    s->bboxes[3]  = av_malloc(s->window_size * sizeof(*s->bboxes[3]));
+
+    if (!s->tmpbuf    || !s->filterbuf || !s->gradients || !s->directions ||
+        !s->bboxes[0] || !s->bboxes[1] || !s->bboxes[2] || !s->bboxes[3])
+        return AVERROR(ENOMEM);
+
+    return config_input(inlink);
+}
+
 #define SET_META(key, value) \
     av_dict_set_int(metadata, key, value, 0)
 
@@ -155,11 +269,20 @@  static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
     AVFilterContext *ctx = inlink->dst;
     CropDetectContext *s = ctx->priv;
     int bpp = s->max_pixsteps[0];
-    int w, h, x, y, shrink_by;
+    int w, h, x, y, shrink_by, i;
     AVDictionary **metadata;
     int outliers, last_y;
     int limit = lrint(s->limit);
 
+    const int inw = inlink->w;
+    const int inh = inlink->h;
+    uint8_t *tmpbuf     = s->tmpbuf;
+    uint8_t *filterbuf  = s->filterbuf;
+    uint16_t *gradients = s->gradients;
+    int8_t *directions  = s->directions;
+    const AVFrameSideData *sd = NULL;
+    int scan_w, scan_h, bboff;
+
     // ignore first s->skip frames
     if (++s->frame_nb > 0) {
         metadata = &frame->metadata;
@@ -185,11 +308,105 @@  static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
                 last_y = y INC;\
         }
 
+        if (s->mode == MODE_BELOW_TH) {
         FIND(s->y1,                 0,               y < s->y1, +1, frame->linesize[0], bpp, frame->width);
         FIND(s->y2, frame->height - 1, y > FFMAX(s->y2, s->y1), -1, frame->linesize[0], bpp, frame->width);
         FIND(s->x1,                 0,               y < s->x1, +1, bpp, frame->linesize[0], frame->height);
         FIND(s->x2,  frame->width - 1, y > FFMAX(s->x2, s->x1), -1, bpp, frame->linesize[0], frame->height);
+        } else { // MODE_MV_EDGES
+            sd = av_frame_get_side_data(frame, AV_FRAME_DATA_MOTION_VECTORS);
+            s->x1 = 0;
+            s->y1 = 0;
+            s->x2 = inw - 1;
+            s->y2 = inh - 1;
+
+            if (!sd) {
+                av_log(ctx, AV_LOG_WARNING, "Cannot detect: no motion vectors available\n");
+            } else {
+                // gaussian filter to reduce noise
+                ff_gaussian_blur(inw, inh,
+                                 filterbuf,  inw,
+                                 frame->data[0], frame->linesize[0]);
+
+                // compute the 16-bit gradients and directions for the next step
+                ff_sobel(inw, inh, gradients, inw, directions, inw, filterbuf, inw);
+
+                // non_maximum_suppression() will actually keep & clip what's necessary and
+                // ignore the rest, so we need a clean output buffer
+                memset(tmpbuf, 0, inw * inh);
+                ff_non_maximum_suppression(inw, inh, tmpbuf, inw, directions, inw, gradients, inw);
+
+
+                // keep high values, or low values surrounded by high values
+                ff_double_threshold(s->low_u8, s->high_u8, inw, inh,
+                                    tmpbuf, inw, tmpbuf, inw);
+
+                // scan all MVs and store bounding box
+                s->x1 = inw - 1;
+                s->y1 = inh - 1;
+                s->x2 = 0;
+                s->y2 = 0;
+                for (i = 0; i < sd->size / sizeof(AVMotionVector); i++) {
+                    const AVMotionVector *mv = (const AVMotionVector*)sd->data + i;
+                    const int mx = mv->dst_x - mv->src_x;
+                    const int my = mv->dst_y - mv->src_y;
+
+                    if (mv->dst_x >= 0 && mv->dst_x < inw &&
+                        mv->dst_y >= 0 && mv->dst_y < inh &&
+                        mv->src_x >= 0 && mv->src_x < inw &&
+                        mv->src_y >= 0 && mv->src_y < inh &&
+                        mx * mx + my * my >= s->mv_threshold * s->mv_threshold) {
+                        s->x1 = mv->dst_x < s->x1 ? mv->dst_x : s->x1;
+                        s->y1 = mv->dst_y < s->y1 ? mv->dst_y : s->y1;
+                        s->x2 = mv->dst_x > s->x2 ? mv->dst_x : s->x2;
+                        s->y2 = mv->dst_y > s->y2 ? mv->dst_y : s->y2;
+                    }
+                }
+
+                // scan outward looking for 0-edge-lines in edge image
+                scan_w = s->x2 - s->x1;
+                scan_h = s->y2 - s->y1;
+
+#define FIND_EDGE(DST, FROM, NOEND, INC, STEP0, STEP1, LEN)             \
+    for (last_y = y = FROM; NOEND; y = y INC) {                         \
+        if (checkline_edge(ctx, tmpbuf + STEP0 * y, STEP1, LEN, bpp)) { \
+            if (last_y INC == y) {                                      \
+                DST = y;                                                \
+                break;                                                  \
+            } else                                                      \
+                last_y = y;                                             \
+        }                                                               \
+    }                                                                   \
+    if (!(NOEND)) {                                                     \
+        DST = y -(INC);                                                 \
+    }
 
+                FIND_EDGE(s->y1, s->y1, y >=  0, -1, inw, bpp, scan_w);
+                FIND_EDGE(s->y2, s->y2, y < inh, +1, inw, bpp, scan_w);
+                FIND_EDGE(s->x1, s->x1, y >=  0, -1, bpp, inw, scan_h);
+                FIND_EDGE(s->x2, s->x2, y < inw, +1, bpp, inw, scan_h);
+
+                // queue bboxes
+                bboff = (s->frame_nb - 1) % s->window_size;
+                s->bboxes[0][bboff] = s->x1;
+                s->bboxes[1][bboff] = s->x2;
+                s->bboxes[2][bboff] = s->y1;
+                s->bboxes[3][bboff] = s->y2;
+
+                // sort queue
+                bboff = FFMIN(s->frame_nb, s->window_size);
+                AV_QSORT(s->bboxes[0], bboff, int, comp);
+                AV_QSORT(s->bboxes[1], bboff, int, comp);
+                AV_QSORT(s->bboxes[2], bboff, int, comp);
+                AV_QSORT(s->bboxes[3], bboff, int, comp);
+
+                // return median of window_size elems
+                s->x1 = s->bboxes[0][bboff/2];
+                s->x2 = s->bboxes[1][bboff/2];
+                s->y1 = s->bboxes[2][bboff/2];
+                s->y2 = s->bboxes[3][bboff/2];
+            }
+        }
 
         // round x and y (up), important for yuv colorspaces
         // make sure they stay rounded!
@@ -243,10 +460,14 @@  static const AVOption cropdetect_options[] = {
     { "skip",  "Number of initial frames to skip",                    OFFSET(skip),        AV_OPT_TYPE_INT, { .i64 = 2 },  0, INT_MAX, FLAGS },
     { "reset_count", "Recalculate the crop area after this many frames",OFFSET(reset_count),AV_OPT_TYPE_INT,{ .i64 = 0 },  0, INT_MAX, FLAGS },
     { "max_outliers", "Threshold count of outliers",                  OFFSET(max_outliers),AV_OPT_TYPE_INT, { .i64 = 0 },  0, INT_MAX, FLAGS },
+    { "high", "Set high threshold for edge detection",                OFFSET(high),        AV_OPT_TYPE_FLOAT, {.dbl=25/255.}, 0, 1, FLAGS },
+    { "low", "Set low threshold for edge detection",                  OFFSET(low),         AV_OPT_TYPE_FLOAT, {.dbl=15/255.}, 0, 1, FLAGS },
+    { "mv_threshold", "motion vector threshold when estimating video window size", OFFSET(mv_threshold), AV_OPT_TYPE_INT, {.i64=8}, 0, 100, FLAGS},
     { NULL }
 };
 
 AVFILTER_DEFINE_CLASS(cropdetect);
+AVFILTER_DEFINE_CLASS_EXT(cropdetect_video, "cropdetect_video", cropdetect_options);
 
 static const AVFilterPad avfilter_vf_cropdetect_inputs[] = {
     {
@@ -257,6 +478,15 @@  static const AVFilterPad avfilter_vf_cropdetect_inputs[] = {
     },
 };
 
+static const AVFilterPad avfilter_vf_cropdetect_video_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_input_video,
+        .filter_frame = filter_frame,
+    },
+};
+
 static const AVFilterPad avfilter_vf_cropdetect_outputs[] = {
     {
         .name = "default",
@@ -275,3 +505,16 @@  const AVFilter ff_vf_cropdetect = {
     FILTER_PIXFMTS_ARRAY(pix_fmts),
     .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC | AVFILTER_FLAG_METADATA_ONLY,
 };
+
+const AVFilter ff_vf_cropdetect_video = {
+    .name          = "cropdetect_video",
+    .description   = NULL_IF_CONFIG_SMALL("Auto-detect crop size of an embedded video area."),
+    .priv_size     = sizeof(CropDetectContext),
+    .priv_class    = &cropdetect_video_class,
+    .init          = init_video,
+    .uninit        = uninit_video,
+    FILTER_INPUTS(avfilter_vf_cropdetect_video_inputs),
+    FILTER_OUTPUTS(avfilter_vf_cropdetect_outputs),
+    FILTER_PIXFMTS_ARRAY(pix_fmts_video),
+    .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC | AVFILTER_FLAG_METADATA_ONLY,
+};
diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
index faed832cd4..2da4018785 100644
--- a/tests/fate/filter-video.mak
+++ b/tests/fate/filter-video.mak
@@ -647,6 +647,14 @@  FATE_METADATA_FILTER-$(call ALLYES, $(CROPDETECT_DEPS)) += fate-filter-metadata-
 fate-filter-metadata-cropdetect: SRC = $(TARGET_SAMPLES)/filter/cropdetect.mp4
 fate-filter-metadata-cropdetect: CMD = run $(FILTER_METADATA_COMMAND) "sws_flags=+accurate_rnd+bitexact;movie='$(SRC)',cropdetect=max_outliers=3"
 
+CROPDETECT_VIDEO_DEPS = LAVFI_INDEV FILE_PROTOCOL MOVIE_FILTER MESTIMATE_FILTER CROPDETECT_VIDEO_FILTER \
+                  SCALE_FILTER MOV_DEMUXER H264_DECODER
+FATE_METADATA_FILTER-$(call ALLYES, $(CROPDETECT_VIDEO_DEPS)) += fate-filter-metadata-cropdetect_video1 fate-filter-metadata-cropdetect_video2
+fate-filter-metadata-cropdetect_video1: SRC = $(TARGET_SAMPLES)/filter/cropdetect_video1.mp4
+fate-filter-metadata-cropdetect_video1: CMD = run $(FILTER_METADATA_COMMAND) "sws_flags=+accurate_rnd+bitexact;movie='$(SRC)',mestimate,cropdetect_video,metadata=mode=print"
+fate-filter-metadata-cropdetect_video2: SRC = $(TARGET_SAMPLES)/filter/cropdetect_video2.mp4
+fate-filter-metadata-cropdetect_video2: CMD = run $(FILTER_METADATA_COMMAND) "sws_flags=+accurate_rnd+bitexact;movie='$(SRC)',mestimate,cropdetect_video,metadata=mode=print"
+
 FREEZEDETECT_DEPS = LAVFI_INDEV MPTESTSRC_FILTER SCALE_FILTER FREEZEDETECT_FILTER
 FATE_METADATA_FILTER-$(call ALLYES, $(FREEZEDETECT_DEPS)) += fate-filter-metadata-freezedetect
 fate-filter-metadata-freezedetect: CMD = run $(FILTER_METADATA_COMMAND) "sws_flags=+accurate_rnd+bitexact;mptestsrc=r=25:d=10:m=51,freezedetect"
diff --git a/tests/ref/fate/filter-metadata-cropdetect_video1 b/tests/ref/fate/filter-metadata-cropdetect_video1
new file mode 100644
index 0000000000..892373cc11
--- /dev/null
+++ b/tests/ref/fate/filter-metadata-cropdetect_video1
@@ -0,0 +1,9 @@ 
+pts=0
+pts=1001
+pts=2002|tag:lavfi.cropdetect.x1=20|tag:lavfi.cropdetect.x2=851|tag:lavfi.cropdetect.y1=311|tag:lavfi.cropdetect.y2=601|tag:lavfi.cropdetect.w=832|tag:lavfi.cropdetect.h=288|tag:lavfi.cropdetect.x=20|tag:lavfi.cropdetect.y=314
+pts=3003|tag:lavfi.cropdetect.x1=20|tag:lavfi.cropdetect.x2=885|tag:lavfi.cropdetect.y1=311|tag:lavfi.cropdetect.y2=621|tag:lavfi.cropdetect.w=864|tag:lavfi.cropdetect.h=304|tag:lavfi.cropdetect.x=22|tag:lavfi.cropdetect.y=316
+pts=4004|tag:lavfi.cropdetect.x1=0|tag:lavfi.cropdetect.x2=885|tag:lavfi.cropdetect.y1=115|tag:lavfi.cropdetect.y2=621|tag:lavfi.cropdetect.w=880|tag:lavfi.cropdetect.h=496|tag:lavfi.cropdetect.x=4|tag:lavfi.cropdetect.y=122
+pts=5005|tag:lavfi.cropdetect.x1=20|tag:lavfi.cropdetect.x2=885|tag:lavfi.cropdetect.y1=311|tag:lavfi.cropdetect.y2=621|tag:lavfi.cropdetect.w=864|tag:lavfi.cropdetect.h=304|tag:lavfi.cropdetect.x=22|tag:lavfi.cropdetect.y=316
+pts=6006|tag:lavfi.cropdetect.x1=0|tag:lavfi.cropdetect.x2=885|tag:lavfi.cropdetect.y1=115|tag:lavfi.cropdetect.y2=621|tag:lavfi.cropdetect.w=880|tag:lavfi.cropdetect.h=496|tag:lavfi.cropdetect.x=4|tag:lavfi.cropdetect.y=122
+pts=7007|tag:lavfi.cropdetect.x1=0|tag:lavfi.cropdetect.x2=885|tag:lavfi.cropdetect.y1=115|tag:lavfi.cropdetect.y2=621|tag:lavfi.cropdetect.w=880|tag:lavfi.cropdetect.h=496|tag:lavfi.cropdetect.x=4|tag:lavfi.cropdetect.y=122
+pts=8008|tag:lavfi.cropdetect.x1=0|tag:lavfi.cropdetect.x2=885|tag:lavfi.cropdetect.y1=115|tag:lavfi.cropdetect.y2=621|tag:lavfi.cropdetect.w=880|tag:lavfi.cropdetect.h=496|tag:lavfi.cropdetect.x=4|tag:lavfi.cropdetect.y=122
diff --git a/tests/ref/fate/filter-metadata-cropdetect_video2 b/tests/ref/fate/filter-metadata-cropdetect_video2
new file mode 100644
index 0000000000..6b433d17cb
--- /dev/null
+++ b/tests/ref/fate/filter-metadata-cropdetect_video2
@@ -0,0 +1,9 @@ 
+pts=0
+pts=512
+pts=1024|tag:lavfi.cropdetect.x1=21|tag:lavfi.cropdetect.x2=817|tag:lavfi.cropdetect.y1=33|tag:lavfi.cropdetect.y2=465|tag:lavfi.cropdetect.w=784|tag:lavfi.cropdetect.h=432|tag:lavfi.cropdetect.x=28|tag:lavfi.cropdetect.y=34
+pts=1536|tag:lavfi.cropdetect.x1=21|tag:lavfi.cropdetect.x2=817|tag:lavfi.cropdetect.y1=33|tag:lavfi.cropdetect.y2=465|tag:lavfi.cropdetect.w=784|tag:lavfi.cropdetect.h=432|tag:lavfi.cropdetect.x=28|tag:lavfi.cropdetect.y=34
+pts=2048|tag:lavfi.cropdetect.x1=21|tag:lavfi.cropdetect.x2=817|tag:lavfi.cropdetect.y1=29|tag:lavfi.cropdetect.y2=465|tag:lavfi.cropdetect.w=784|tag:lavfi.cropdetect.h=432|tag:lavfi.cropdetect.x=28|tag:lavfi.cropdetect.y=32
+pts=2560|tag:lavfi.cropdetect.x1=21|tag:lavfi.cropdetect.x2=817|tag:lavfi.cropdetect.y1=29|tag:lavfi.cropdetect.y2=465|tag:lavfi.cropdetect.w=784|tag:lavfi.cropdetect.h=432|tag:lavfi.cropdetect.x=28|tag:lavfi.cropdetect.y=32
+pts=3072|tag:lavfi.cropdetect.x1=21|tag:lavfi.cropdetect.x2=817|tag:lavfi.cropdetect.y1=29|tag:lavfi.cropdetect.y2=465|tag:lavfi.cropdetect.w=784|tag:lavfi.cropdetect.h=432|tag:lavfi.cropdetect.x=28|tag:lavfi.cropdetect.y=32
+pts=3584|tag:lavfi.cropdetect.x1=21|tag:lavfi.cropdetect.x2=817|tag:lavfi.cropdetect.y1=29|tag:lavfi.cropdetect.y2=465|tag:lavfi.cropdetect.w=784|tag:lavfi.cropdetect.h=432|tag:lavfi.cropdetect.x=28|tag:lavfi.cropdetect.y=32
+pts=4096|tag:lavfi.cropdetect.x1=21|tag:lavfi.cropdetect.x2=817|tag:lavfi.cropdetect.y1=29|tag:lavfi.cropdetect.y2=465|tag:lavfi.cropdetect.w=784|tag:lavfi.cropdetect.h=432|tag:lavfi.cropdetect.x=28|tag:lavfi.cropdetect.y=32
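
The `w`, `h`, `x` and `y` tags in the reference output above follow from `x1`/`x2`/`y1`/`y2` through the rounding step controlled by the `round` option (default 16), which this patch leaves unchanged in `filter_frame`. The sketch below reproduces that arithmetic as a standalone function. `Crop` and `round_crop` are hypothetical names, and the logic is reconstructed from the behaviour of the existing cropdetect code rather than quoted from this patch, so it is an assumption to check against `vf_cropdetect.c`.

```c
#include <assert.h>

/* Hypothetical result type for this sketch. */
typedef struct Crop {
    int x, y, w, h;
} Crop;

/* Turn an inclusive bounding box into crop parameters whose width and
 * height are divisible by `round`, keeping the offsets even and roughly
 * centering the shrunken area inside the detected box. */
static Crop round_crop(int x1, int y1, int x2, int y2, int round)
{
    Crop c;
    int shrink_by;

    assert(x2 >= x1 && y2 >= y1);

    /* round the offsets up to even values */
    c.x = (x1 + 1) & ~1;
    c.y = (y1 + 1) & ~1;
    c.w = x2 - c.x + 1;
    c.h = y2 - c.y + 1;

    /* an odd or too small divisor is adjusted so that sizes stay even */
    if (round <= 1)
        round = 16;
    if (round % 2)
        round *= 2;

    /* shrink to a multiple of round and shift by half the loss (kept even) */
    shrink_by = c.w % round;
    c.w -= shrink_by;
    c.x += (shrink_by / 2 + 1) & ~1;

    shrink_by = c.h % round;
    c.h -= shrink_by;
    c.y += (shrink_by / 2 + 1) & ~1;

    return c;
}
```

With `round` set to 16, the box `x1=20, y1=311, x2=851, y2=601` from the first reference file gives `w=832, h=288, x=20, y=314`, and `x1=21, y1=33, x2=817, y2=465` from the second gives `w=784, h=432, x=28, y=34`, matching the tags listed there.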