[FFmpeg-devel,v6,00/25] Subtitle Filtering 2022

Message ID pull.18.v6.ffstaging.FFmpeg.1656261322.ffmpegagent@gmail.com

Aman Karmani June 26, 2022, 4:34 p.m. UTC
Subtitle Filtering 2022
=======================

This is a substantial update to the earlier subtitle filtering patch series.
A primary goal has been to address others' concerns as much as possible on
one side and to provide more clarity and control over the way things are
working. Clarity is specifically important to allow for a better
understanding of the need for a subtitle start pts value that can be
different from the frame's pts value. This is done by refactoring the
subtitle timing fields in AVFrame, adding a frame field to indicate repeated
subtitle frames, and finally the full removal of the heartbeat
functionality, replaced by a new 'subfeed' filter that provides different
modes for arbitrating subtitle frames in a filter graph. Finally, each
subtitle filter's documentation has been amended by a section describing the
filter's timeline behavior (in v3 update).


Subtitle Filtering Demos
========================

I published a demonstration of subtitle filtering capabilities with OCR,
text and bitmap subtitle manipulation involved: Demo 1: Text-Manipulation
with Bitmap Subtitles
[https://github.com/softworkz/SubtitleFilteringDemos/tree/master/Demo1]


v6 - Fix assertion errors
=========================

 * text2graphicsub: fix null pointer dereference on uninit after error
 * strim: propagate width and height
 * avfilter: add default propagation time_base from inlink to outlink


v5 - Conversion to Graphic Subtitles, and other enhancements
============================================================

 * I'm glad to announce that Traian (@tcoza) has joined the project and
   contributed a new 'text2graphicsub' filter to convert text subtitles to
   graphic subtitles, which can in turn be encoded as dvd, dvb or x-subs
   (and any other encoder for graphic subs that might be added in the
   future). This filter closes the last open "gap" in subtitle processing.
 * stripstyles filter: now allows very fine-grained control over which ASS
   style codes should be preserved or stripped
 * stripstyles: do not drop dialog margin values
 * subfeed filter: eliminates duplicate frames with duplicate start times
   when 'fix_overlap' is specified
 * textmod: do not drop effect values
 * graphicsub2text: reduce font size jitter
 * ass_split: add function to selectively preserve elements when splitting
 * add strim, snull and ssink and further unify subtitle frame handling with
   audio and video
 * ffmpeg_filter: get simple filter notation working for subtitles


v4 - Quality Improvements
=========================

 * finally an updated version
 * includes many improvements from internal testing
 * all FATE tests passed
 * all example commands from the docs verified to work
 * can't list all the detail changes..
 * I have left out the extra commits which can be handled separately, just
   in case somebody wonders why these are missing:
   * avcodec/webvttenc: Don't encode drawing codes and empty lines
   * avcodec/webvttenc: convert hard-space tags to &nbsp;
   * avutil/ass_split: Add parsing of hard-space tags (\h)
   * avutil/ass_split: Treat all content in curly braces as hidden
   * avutil/ass_split: Fix ass parsing of style codes with comments


v3 - Rebase
===========

due to merge conflicts - apologies.


Changes in v2
=============

 * added .gitattributes file to enforce binary diffs for the test refs that
   cannot be applied when being sent via e-mail
 * perform filter graph re-init due to subtitle "frame size" change only
   when the size was unknown before and not set via -canvas_size
 * overlaytextsubs: Make sure to request frames on the subtitle input
 * avfilter/splitcc: Start parsing cc data on key frames only
 * avcodec/webvttenc: Don't encode ass drawing codes and empty lines
 * stripstyles: fix mem leak
 * gs2t: improve color detection
 * gs2t: empty frames must not be skipped
 * subfeed: fix name
 * textmod: preserve margins
 * avcodec/dvbsubdec: Fix conditions for fallback to default resolution
 * Made changes suggested by Andreas
 * Fixed failing command line reported by Michael

Changes from previous version v24:


AVFrame
=======

 * Removed sub_start_time. The start time is now added to the subtitle
   start_pts during decoding, and the sub_end_time field is adjusted
   accordingly.
 * Renamed sub_end_time to duration, which it effectively is after removing
   the start_time.
 * Added a sub-struct 'subtitle_timing' to AVFrame. It contains subtitle_pts,
   renamed to 'subtitle_timing.start_pts', and 'subtitle_timing.duration'.
 * Changed both fields to the (fixed) time base AV_TIME_BASE.
 * Added a repeat_sub field, which provides a clear indication whether a
   subtitle frame is an actual subtitle event or a repeated subtitle frame
   in a filter graph.
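
The timing change can be illustrated with a small standalone sketch. It is
not code from the series: the struct and function names are hypothetical,
and it assumes the legacy AVSubtitle convention of millisecond display
offsets relative to a pts given in AV_TIME_BASE (microsecond) units.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors AV_TIME_BASE (microseconds); redefined so the sketch builds alone. */
#define SKETCH_TIME_BASE 1000000

/* Hypothetical stand-in for the 'subtitle_timing' sub-struct. */
typedef struct SketchSubtitleTiming {
    int64_t start_pts; /* absolute start, in SKETCH_TIME_BASE units */
    int64_t duration;  /* display duration, in SKETCH_TIME_BASE units */
} SketchSubtitleTiming;

/*
 * Fold a relative start offset into the absolute start pts and turn the
 * relative end offset into a duration.
 */
static SketchSubtitleTiming sketch_fold_start_time(int64_t pts_us,
                                                   uint32_t start_ms,
                                                   uint32_t end_ms)
{
    SketchSubtitleTiming t;

    assert(end_ms >= start_ms);
    t.start_pts = pts_us + (int64_t)start_ms * (SKETCH_TIME_BASE / 1000);
    t.duration  = (int64_t)(end_ms - start_ms) * (SKETCH_TIME_BASE / 1000);
    return t;
}
```

With a pts of 5 s, a start offset of 200 ms and an end offset of 1700 ms,
this yields a start_pts of 5.2 s and a duration of 1.5 s, so no separate
start offset needs to be carried along.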


Heartbeat Removal
=================

 * completely removed the earlier heartbeat implementation
 * filtering arbitration is now implemented in a new filter: 'subfeed'
 * subfeed will be auto-inserted for compatibility with sub2video command
   lines
 * the new behavior is not exactly identical to the earlier behavior, but it
   allows achieving essentially the same results
 * there's a small remainder, now named subtitle kickoff which serves to get
   things (in the filter graph) going right from the start


New 'subfeed' Filter
====================

 * a versatile filter for solving all kinds of problems with subtitle frame
   flow in filter graphs
 * Can be inserted at any position in a graph
 * Auto-inserted for sub2video command lines (in repeat-mode)
 * Allows duration fixup: delays input frames with unknown duration and
   infers the duration from the start of the subsequent frame
 * Provides multiple modes of operation:
   * repeat mode (default): queues input frames and outputs frames at a
     fixed (configurable) rate; sends either a matching input frame
     (repeatedly) or an empty frame otherwise
   * scatter mode: similar to repeat mode, but splits input frames by
     duration into small segments with the same content
   * forward mode: no fixed output rate; useful in combination with
     duration fixup or overlap fixup
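
As a rough illustration of the duration fixup and of scatter mode described
above, here is a minimal standalone sketch. It does not reproduce the
filter's code: the types, the function names and the convention that a
duration <= 0 means "unknown" are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subtitle event; times are in microseconds. */
typedef struct SketchSubFrame {
    int64_t start_pts;
    int64_t duration; /* <= 0 is treated as "unknown" in this sketch */
} SketchSubFrame;

/* Duration fixup: a held frame with unknown duration ends where the next starts. */
static void sketch_fix_duration(SketchSubFrame *held, const SketchSubFrame *next)
{
    assert(next->start_pts >= held->start_pts);
    if (held->duration <= 0)
        held->duration = next->start_pts - held->start_pts;
}

/*
 * Scatter: split one event into consecutive segments of at most seg_len.
 * Writes up to max_out segments and returns how many were written.
 */
static int sketch_scatter(const SketchSubFrame *in, int64_t seg_len,
                          SketchSubFrame *out, int max_out)
{
    int n = 0;
    int64_t pos = in->start_pts;
    int64_t end = in->start_pts + in->duration;

    assert(seg_len > 0);
    while (pos < end && n < max_out) {
        int64_t len = end - pos < seg_len ? end - pos : seg_len;

        out[n].start_pts = pos;
        out[n].duration  = len;
        pos += len;
        n++;
    }
    return n;
}
```

A frame starting at 0 with unknown duration, followed by a frame starting at
2.5 s, gets a duration of 2.5 s; scattering it with a 1 s segment length
gives three segments, the last one 0.5 s long.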


ffmpeg Tool Changes
===================

 * delay subtitle output stream initialization (like for audio and video).
   This is needed, for example, when a format header depends on having
   received an initial frame to derive certain header values from.
 * decoding: set subtitle frame size from decoding context
 * re-init graph when subtitle size changes
 * always insert subscale filter for sub2video command lines (to ensure
   correct scaling)


Subtitle Encoding
=================

 * ignore repeated frames for encoding based on repeat_sub field in AVFrame
 * support multi-area encoding for text subtitles. Subtitle OCR can create
   multiple areas at different positions. Previously, the texts were always
   squashed into a single area ('subtitle rect'), which was not ideal.
   Multiple text areas are now generally supported:
   * ASS encoder: changed to use the 'receive_packet' encoding API. A single
     frame with multiple text areas will now create multiple packets.
   * All other text subtitle encoders: a newline is inserted between the
     texts from multiple areas.
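
The newline joining used by the non-ASS text encoders can be pictured with
the following standalone sketch. The function name and the fixed-size
destination buffer are assumptions for illustration; the encoders in the
series have their own buffer handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Join the texts of several subtitle areas with '\n' into dst.
 * Returns the resulting string length, or -1 if dst is too small.
 */
static int sketch_join_areas(const char *const *areas, int nb_areas,
                             char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(dst && dst_size > 0);
    dst[0] = '\0';
    for (int i = 0; i < nb_areas; i++) {
        size_t len = strlen(areas[i]);
        size_t sep = i > 0 ? 1 : 0; /* newline only between areas */

        if (used + sep + len + 1 > dst_size)
            return -1;
        if (sep)
            dst[used++] = '\n';
        memcpy(dst + used, areas[i], len);
        used += len;
        dst[used] = '\0';
    }
    return (int)used;
}
```

Two areas "Top line" and "Bottom line" become the single text
"Top line\nBottom line", whereas the ASS encoder would instead emit one
packet per area.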


graphicsub2text (OCR)
=====================

 * enhanced preprocessing
   * using elbg algorithm for color quantization
   * detection and removal of text outlines
   * map-based identification of colors per word (text, outline, background)
 * add option for duration fixup
 * add option to dump preprocessing bitmaps
 * Recognize formatting and apply as ASS inline styles
   * per word(!)
   * paragraph alignment
   * positioning
   * font names
   * font size
   * font style (italic, underline, bold)
   * text color, outline color


Other Filter Changes
====================

 * all: Make sure to forward all link properties (time base, frame rate, w,
   h) where appropriate
 * overlaytextsubs: request frames on the subtitle input
 * overlaytextsubs: disable read-order checking
 * overlaytextsubs: improve implementation of render_latest_only
 * overlaytextsubs: ensure equal in/out video formats
 * splitcc: derive framerate from realtime_latency
 * graphicsub2video: implement caching of converted frames
 * graphicsub2video: use 1x1 output frame size as long as subtitle size is
   unknown (0x0)

Plus a dozen other things I forgot.

softworkz (25):
  avcodec,avutil: Move enum AVSubtitleType to avutil, add new and
    deprecate old values
  avutil/frame: Prepare AVFrame for subtitle handling
  avcodec/subtitles: Introduce new frame-based subtitle decoding API
  avcodec/libzvbi: set subtitle type
  avfilter/subtitles: Update vf_subtitles to use new decoding api
  avcodec,avutil: Move ass helper functions to avutil as avpriv_ and
    extend ass dialog parsing
  avcodec/subtitles: Replace deprecated enum values
  fftools/play,probe: Adjust for subtitle changes
  avfilter/subtitles: Add subtitles.c for subtitle frame allocation
  avfilter/avfilter: Handle subtitle frames
  avfilter/avfilter: Fix hardcoded input index
  avfilter/sbuffer: Add sbuffersrc and sbuffersink filters
  avfilter/overlaygraphicsubs: Add overlaygraphicsubs and
    graphicsub2video filters
  avfilter/overlaytextsubs: Add overlaytextsubs and textsubs2video
    filters
  avfilter/textmod: Add textmod, censor and show_speaker filters
  avfilter/stripstyles: Add stripstyles filter
  avfilter/splitcc: Add splitcc filter for closed caption handling
  avfilter/graphicsub2text: Add new graphicsub2text filter (OCR)
  avfilter/subscale: Add filter for scaling and/or re-arranging
    graphical subtitles
  avfilter/subfeed: add subtitle feed filter
  avfilter/text2graphicsub: Added text2graphicsub subtitle filter
  avfilter/snull,strim: Add snull and strim filters
  avcodec/subtitles: Migrate subtitle encoders to frame-based API
  fftools/ffmpeg: Introduce subtitle filtering and new frame-based
    subtitle encoding
  avcodec/dvbsubdec: Fix conditions for fallback to default resolution

 configure                                 |   10 +-
 doc/filters.texi                          |  807 ++++++++++++++
 fftools/ffmpeg.c                          |  613 +++++-----
 fftools/ffmpeg.h                          |   17 +-
 fftools/ffmpeg_filter.c                   |  270 +++--
 fftools/ffmpeg_hw.c                       |    2 +-
 fftools/ffmpeg_opt.c                      |   28 +-
 fftools/ffplay.c                          |  102 +-
 fftools/ffprobe.c                         |   47 +-
 libavcodec/Makefile                       |   56 +-
 libavcodec/ass.h                          |  151 +--
 libavcodec/ass_split.h                    |  191 ----
 libavcodec/assdec.c                       |    4 +-
 libavcodec/assenc.c                       |  191 +++-
 libavcodec/avcodec.c                      |    8 +
 libavcodec/avcodec.h                      |   34 +-
 libavcodec/ccaption_dec.c                 |   20 +-
 libavcodec/codec_internal.h               |   12 -
 libavcodec/decode.c                       |   60 +-
 libavcodec/dvbsubdec.c                    |   53 +-
 libavcodec/dvbsubenc.c                    |   96 +-
 libavcodec/dvdsubdec.c                    |    2 +-
 libavcodec/dvdsubenc.c                    |  102 +-
 libavcodec/encode.c                       |   61 +-
 libavcodec/internal.h                     |   16 +
 libavcodec/jacosubdec.c                   |    2 +-
 libavcodec/libaribb24.c                   |    2 +-
 libavcodec/libzvbi-teletextdec.c          |   17 +-
 libavcodec/microdvddec.c                  |    7 +-
 libavcodec/movtextdec.c                   |    3 +-
 libavcodec/movtextenc.c                   |  126 ++-
 libavcodec/mpl2dec.c                      |    2 +-
 libavcodec/pgssubdec.c                    |    2 +-
 libavcodec/realtextdec.c                  |    2 +-
 libavcodec/samidec.c                      |    2 +-
 libavcodec/srtdec.c                       |    2 +-
 libavcodec/srtenc.c                       |  116 +-
 libavcodec/subviewerdec.c                 |    2 +-
 libavcodec/tests/avcodec.c                |    5 +-
 libavcodec/textdec.c                      |    4 +-
 libavcodec/ttmlenc.c                      |  114 +-
 libavcodec/utils.c                        |  185 ++-
 libavcodec/webvttdec.c                    |    2 +-
 libavcodec/webvttenc.c                    |   94 +-
 libavcodec/xsubdec.c                      |    2 +-
 libavcodec/xsubenc.c                      |   88 +-
 libavfilter/Makefile                      |   18 +
 libavfilter/allfilters.c                  |   19 +
 libavfilter/avfilter.c                    |   42 +-
 libavfilter/avfilter.h                    |   11 +
 libavfilter/avfiltergraph.c               |    5 +
 libavfilter/buffersink.c                  |   54 +
 libavfilter/buffersink.h                  |    7 +
 libavfilter/buffersrc.c                   |   72 ++
 libavfilter/buffersrc.h                   |    1 +
 libavfilter/formats.c                     |   16 +
 libavfilter/formats.h                     |    3 +
 libavfilter/internal.h                    |   19 +-
 libavfilter/sf_graphicsub2text.c          | 1137 +++++++++++++++++++
 libavfilter/sf_snull.c                    |   50 +
 libavfilter/sf_splitcc.c                  |  395 +++++++
 libavfilter/sf_stripstyles.c              |  237 ++++
 libavfilter/sf_subfeed.c                  |  412 +++++++
 libavfilter/sf_subscale.c                 |  884 +++++++++++++++
 libavfilter/sf_text2graphicsub.c          |  634 +++++++++++
 libavfilter/sf_textmod.c                  |  710 ++++++++++++
 libavfilter/subtitles.c                   |   63 ++
 libavfilter/subtitles.h                   |   44 +
 libavfilter/trim.c                        |   60 +-
 libavfilter/vf_overlaygraphicsubs.c       |  765 +++++++++++++
 libavfilter/vf_overlaytextsubs.c          |  680 +++++++++++
 libavfilter/vf_subtitles.c                |   67 +-
 libavutil/Makefile                        |    4 +
 {libavcodec => libavutil}/ass.c           |  115 +-
 libavutil/ass_internal.h                  |  135 +++
 {libavcodec => libavutil}/ass_split.c     |  179 ++-
 libavutil/ass_split_internal.h            |  254 +++++
 libavutil/frame.c                         |  206 +++-
 libavutil/frame.h                         |   85 +-
 libavutil/subfmt.c                        |   45 +
 libavutil/subfmt.h                        |  115 ++
 libavutil/version.h                       |    1 +
 tests/ref/fate/filter-overlay-dvdsub-2397 |  182 +--
 tests/ref/fate/sub-dvb                    |  162 +--
 tests/ref/fate/sub-scc                    |    1 -
 tests/ref/fate/sub2video                  | 1091 +++++++++++++++++-
 tests/ref/fate/sub2video_basic            | 1238 +++++++++++++++++++--
 tests/ref/fate/sub2video_time_limited     |   78 +-
 88 files changed, 12424 insertions(+), 1604 deletions(-)
 delete mode 100644 libavcodec/ass_split.h
 create mode 100644 libavfilter/sf_graphicsub2text.c
 create mode 100644 libavfilter/sf_snull.c
 create mode 100644 libavfilter/sf_splitcc.c
 create mode 100644 libavfilter/sf_stripstyles.c
 create mode 100644 libavfilter/sf_subfeed.c
 create mode 100644 libavfilter/sf_subscale.c
 create mode 100644 libavfilter/sf_text2graphicsub.c
 create mode 100644 libavfilter/sf_textmod.c
 create mode 100644 libavfilter/subtitles.c
 create mode 100644 libavfilter/subtitles.h
 create mode 100644 libavfilter/vf_overlaygraphicsubs.c
 create mode 100644 libavfilter/vf_overlaytextsubs.c
 rename {libavcodec => libavutil}/ass.c (59%)
 create mode 100644 libavutil/ass_internal.h
 rename {libavcodec => libavutil}/ass_split.c (71%)
 create mode 100644 libavutil/ass_split_internal.h
 create mode 100644 libavutil/subfmt.c
 create mode 100644 libavutil/subfmt.h


base-commit: 6a82412bf33108111eb3f63076fd5a51349ae114
Published-As: https://github.com/ffstaging/FFmpeg/releases/tag/pr-ffstaging-18%2Fsoftworkz%2Fsubmit_subfiltering-v6
Fetch-It-Via: git fetch https://github.com/ffstaging/FFmpeg pr-ffstaging-18/softworkz/submit_subfiltering-v6
Pull-Request: https://github.com/ffstaging/FFmpeg/pull/18

Range-diff vs v5:

  1:  aa32b9048f =  1:  aa32b9048f avcodec,avutil: Move enum AVSubtitleType to avutil, add new and deprecate old values
  2:  d5ab9d1919 =  2:  d5ab9d1919 avutil/frame: Prepare AVFrame for subtitle handling
  3:  0a685a6b19 =  3:  0a685a6b19 avcodec/subtitles: Introduce new frame-based subtitle decoding API
  4:  0b69b1ce19 =  4:  0b69b1ce19 avcodec/libzvbi: set subtitle type
  5:  0c2091e57c =  5:  0c2091e57c avfilter/subtitles: Update vf_subtitles to use new decoding api
  6:  4903cdd1cd =  6:  4903cdd1cd avcodec,avutil: Move ass helper functions to avutil as avpriv_ and extend ass dialog parsing
  7:  98f12ad7e9 =  7:  98f12ad7e9 avcodec/subtitles: Replace deprecated enum values
  8:  12c8a308d3 =  8:  12c8a308d3 fftools/play,probe: Adjust for subtitle changes
  9:  2e55dbe180 =  9:  2e55dbe180 avfilter/subtitles: Add subtitles.c for subtitle frame allocation
 10:  c931041103 ! 10:  0d953dedcb avfilter/avfilter: Handle subtitle frames
     @@ libavfilter/avfilter.c: static void tlog_ref(void *ctx, AVFrame *ref, int end)
           }
       
           ff_tlog(ctx, "]%s", end ? "\n" : "");
     +@@ libavfilter/avfilter.c: int avfilter_config_links(AVFilterContext *filter)
     + 
     +                 if (!link->time_base.num && !link->time_base.den)
     +                     link->time_base = (AVRational) {1, link->sample_rate};
     ++
     ++                break;
     ++
     ++            case AVMEDIA_TYPE_SUBTITLE:
     ++                if (!link->time_base.num && !link->time_base.den)
     ++                    link->time_base = inlink ? inlink->time_base : AV_TIME_BASE_Q;
     ++
     ++                break;
     +             }
     + 
     +             if (link->src->nb_inputs && link->src->inputs[0]->hw_frames_ctx &&
      @@ libavfilter/avfilter.c: int ff_filter_frame(AVFilterLink *link, AVFrame *frame)
                   av_assert1(frame->width               == link->w);
                   av_assert1(frame->height               == link->h);
 11:  36cab55ff2 = 11:  b462fa2c2f avfilter/avfilter: Fix hardcoded input index
 12:  f41070479c = 12:  fcabb53750 avfilter/sbuffer: Add sbuffersrc and sbuffersink filters
 13:  9bfaba4ace = 13:  9e16dbcecd avfilter/overlaygraphicsubs: Add overlaygraphicsubs and graphicsub2video filters
 14:  918fd9aaf5 = 14:  a17048cfff avfilter/overlaytextsubs: Add overlaytextsubs and textsubs2video filters
 15:  a361ad35c5 = 15:  6330a337b2 avfilter/textmod: Add textmod, censor and show_speaker filters
 16:  bca90ebc3e = 16:  732e2fbf7d avfilter/stripstyles: Add stripstyles filter
 17:  6e488e495f = 17:  4df0d12130 avfilter/splitcc: Add splitcc filter for closed caption handling
 18:  1057dff7da = 18:  27bf505078 avfilter/graphicsub2text: Add new graphicsub2text filter (OCR)
 19:  4e85fb5d2f = 19:  8b98a32895 avfilter/subscale: Add filter for scaling and/or re-arranging graphical subtitles
 20:  88e8adb889 = 20:  de1d1db41c avfilter/subfeed: add subtitle feed filter
 21:  a96bb5c788 ! 21:  f33df64eb4 avfilter/text2graphicsub: Added text2graphicsub subtitle filter
     @@
       ## Metadata ##
     -Author: tcoza <traian.coza@gmail.com>
     +Author: softworkz <softworkz@hotmail.com>
      
       ## Commit message ##
          avfilter/text2graphicsub: Added text2graphicsub subtitle filter
     @@ libavfilter/sf_text2graphicsub.c (new)
      +static void free_palettizecontext(PalettizeContext **palettizecontext)
      +{
      +    PalettizeContext *context = *palettizecontext;
     -+    av_freep(&context->codebook);
     -+    av_freep(&context->codeword);
     -+    av_freep(&context->codeword_closest_codebook_idxs);
     -+    avpriv_elbg_free(&context->elbg);
     -+    av_free(context);
     -+    *palettizecontext = NULL;
     ++    if (context) {
     ++        av_freep(&context->codebook);
     ++        av_freep(&context->codeword);
     ++        av_freep(&context->codeword_closest_codebook_idxs);
     ++        avpriv_elbg_free(&context->elbg);
     ++        av_free(context);
     ++        *palettizecontext = NULL;
     ++    }
      +}
      +
      +/* libass supports a log level ranging from 0 to 7 */
     @@ libavfilter/sf_text2graphicsub.c (new)
      +    AVFilterContext *ctx = outlink->src;
      +    Text2GraphicSubContext *context = ctx->priv;
      +
     ++    outlink->time_base = AV_TIME_BASE_Q;
     ++    outlink->format = AV_SUBTITLE_FMT_BITMAP;
      +    outlink->w = context->size.width;
      +    outlink->h = context->size.height;
      +
 22:  c4922f8466 ! 22:  22d81747d1 avfilter/snull,strim: Add snull and strim filters
     @@ libavfilter/trim.c: const AVFilter ff_af_atrim = {
      +
      +#if CONFIG_STRIM_FILTER
      +
     ++static int sconfig_output(AVFilterLink *outlink)
     ++{
     ++    AVFilterContext *ctx = outlink->src;
     ++    AVFilterLink *inlink = ctx->inputs[0];
     ++
     ++    outlink->format = inlink->format;
     ++    outlink->w = inlink->w;
     ++    outlink->h = inlink->h;
     ++
     ++    return 0;
     ++}
     ++
     ++
      +#define FLAGS (AV_OPT_FLAG_SUBTITLE_PARAM | AV_OPT_FLAG_FILTERING_PARAM)
      +static const AVOption strim_options[] = {
      +    COMMON_OPTS
     @@ libavfilter/trim.c: const AVFilter ff_af_atrim = {
      +    {
      +        .name         = "default",
      +        .type         = AVMEDIA_TYPE_SUBTITLE,
     ++        .config_props = sconfig_output,
      +    },
      +};
      +
 23:  848f84d5dc = 23:  6d8532d73d avcodec/subtitles: Migrate subtitle encoders to frame-based API
 24:  2645a1a842 = 24:  1e2fc0d09f fftools/ffmpeg: Introduce subtitle filtering and new frame-based subtitle encoding
 25:  a90a6e1086 = 25:  61f775e35f avcodec/dvbsubdec: Fix conditions for fallback to default resolution