From patchwork Sun Dec 5 19:42:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Soft Works
X-Patchwork-Id: 32020
From: Soft Works
To: "ffmpeg-devel@ffmpeg.org"
Date: Sun, 5 Dec 2021 19:42:02 +0000
Subject: [FFmpeg-devel] [PATCH v21 19/20] avfilter/graphicsub2text: Add new graphicsub2text filter (OCR)

Signed-off-by: softworkz
---
 configure                        |   1 +
 doc/filters.texi                 |  55 +++++
 libavfilter/Makefile             |   2 +
 libavfilter/allfilters.c         |   1 +
 libavfilter/sf_graphicsub2text.c | 354 +++++++++++++++++++++++++++++++
 5 files changed, 413 insertions(+)
 create mode 100644 libavfilter/sf_graphicsub2text.c

diff --git a/configure b/configure
index 2160521951..1110c8fd51 100755
--- a/configure
+++ b/configure
@@ -3640,6 +3640,7 @@ frei0r_filter_deps="frei0r"
 frei0r_src_filter_deps="frei0r"
 fspp_filter_deps="gpl"
 gblur_vulkan_filter_deps="vulkan spirv_compiler"
+graphicsub2text_filter_deps="libtesseract"
 hflip_vulkan_filter_deps="vulkan spirv_compiler"
 histeq_filter_deps="gpl"
 hqdn3d_filter_deps="gpl"
diff --git a/doc/filters.texi b/doc/filters.texi
index 5c1432311a..ea056be66b 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -25819,6 +25819,61 @@ ffmpeg -i "https://streams.videolan.org/ffmpeg/mkv_subtitles.mkv" -filter_comple
 @end example
 @end itemize
 
+@section graphicsub2text
+
+Converts graphic subtitles to text subtitles by performing OCR.
+
+For this filter to be available, ffmpeg needs to be compiled with libtesseract (see https://github.com/tesseract-ocr/tesseract).
+Language models need to be downloaded from https://github.com/tesseract-ocr/tessdata and put into a subfolder named 'tessdata' or into a folder specified via the environment variable 'TESSDATA_PREFIX'.
+The path can also be specified via a filter option (see below).
+
+Note: These models include the data for both OCR modes.
+
+Inputs:
+- 0: Subtitles [bitmap]
+
+Outputs:
+- 0: Subtitles [text]
+
+It accepts the following parameters:
+
+@table @option
+@item ocr_mode
+The character recognition mode to use.
+
+Supported OCR modes are:
+
+@table @var
+@item 0, tesseract
+This is the classic libtesseract operation mode. It is fast but less accurate than LSTM.
+@item 1, lstm
+Newer OCR implementation based on ML models. It usually provides better results but requires more processing resources.
+@item 2, both
+Use a combination of both modes.
+@end table
+
+@item tessdata_path
+The path to a folder containing the language models to be used.
+
+@item language
+The recognition language. It needs to match the first three characters of a language model file in the tessdata path.
+
+@end table
+
+
+@subsection Examples
+
+@itemize
+@item
+Convert DVB graphic subtitles to ASS (text) subtitles
+
+Note: For this to work, you need to have the data file 'eng.traineddata' in a 'tessdata' subfolder (see above).
+@example
+ffmpeg -loglevel verbose -i "https://streams.videolan.org/streams/ts/video_subs_ttxt%2Bdvbsub.ts" -filter_complex "[0:13]graphicsub2text=ocr_mode=both" -c:s ass -y output.mkv
+@end example
+@end itemize
+
+
 @section graphicsub2video
 
 Renders graphic subtitles as video frames.
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 2224e5fe5f..3b972e134b 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -296,6 +296,8 @@ OBJS-$(CONFIG_GBLUR_VULKAN_FILTER) += vf_gblur_vulkan.o vulkan.o vulka
 OBJS-$(CONFIG_GEQ_FILTER) += vf_geq.o
 OBJS-$(CONFIG_GRADFUN_FILTER) += vf_gradfun.o
 OBJS-$(CONFIG_GRAPHICSUB2VIDEO_FILTER) += vf_overlaygraphicsubs.o framesync.o
+OBJS-$(CONFIG_GRAPHICSUB2TEXT_FILTER) += sf_graphicsub2text.o
+OBJS-$(CONFIG_GRAPHICSUB2VIDEO_FILTER) += vf_overlaygraphicsubs.o framesync.o
 OBJS-$(CONFIG_GRAPHMONITOR_FILTER) += f_graphmonitor.o
 OBJS-$(CONFIG_GRAYWORLD_FILTER) += vf_grayworld.o
 OBJS-$(CONFIG_GREYEDGE_FILTER) += vf_colorconstancy.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 6adde2b9f6..f70f08dc5a 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -545,6 +545,7 @@ extern const AVFilter ff_avf_showwaves;
 extern const AVFilter ff_avf_showwavespic;
 extern const AVFilter ff_vaf_spectrumsynth;
 extern const AVFilter ff_sf_censor;
+extern const AVFilter ff_sf_graphicsub2text;
 extern const AVFilter ff_sf_showspeaker;
 extern const AVFilter ff_sf_splitcc;
 extern const AVFilter ff_sf_stripstyles;
diff --git a/libavfilter/sf_graphicsub2text.c b/libavfilter/sf_graphicsub2text.c
new file mode 100644
index 0000000000..ef10d60efd
--- /dev/null
+++ b/libavfilter/sf_graphicsub2text.c
@@ -0,0 +1,354 @@
+/*
+ * Copyright (c) 2021 softworkz
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * subtitle filter to convert graphical subs to text subs via OCR
+ */
+
+#include <tesseract/capi.h>
+#include
+
+#include "libavutil/opt.h"
+#include "subtitles.h"
+
+typedef struct SubOcrContext {
+    const AVClass *class;
+    int w, h;
+
+    TessBaseAPI *tapi;
+    TessOcrEngineMode ocr_mode;
+    char *tessdata_path;
+    char *language;
+
+    int readorder_counter;
+
+    AVFrame *pending_frame;
+} SubOcrContext;
+
+
+static int init(AVFilterContext *ctx)
+{
+    SubOcrContext *s = ctx->priv;
+    const char* tver = TessVersion();
+    int ret;
+
+    s->tapi = TessBaseAPICreate();
+
+    if (!s->tapi || !tver || !strlen(tver)) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to access libtesseract\n");
+        return AVERROR(ENOSYS);
+    }
+
+    av_log(ctx, AV_LOG_VERBOSE, "Initializing libtesseract, version: %s\n", tver);
+
+    ret = TessBaseAPIInit4(s->tapi, s->tessdata_path, s->language, s->ocr_mode, NULL, 0, NULL, NULL, 0, 1);
+    if (ret < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to initialize libtesseract. Error: %d\n", ret);
+        return AVERROR(ENOSYS);
+    }
+
+    ret = TessBaseAPISetVariable(s->tapi, "tessedit_char_blacklist", "|");
+    if (!ret) { /* TessBaseAPISetVariable returns a BOOL: 0 means the variable could not be set */
+        av_log(ctx, AV_LOG_ERROR, "Failed to set 'tessedit_char_blacklist'. Error: %d\n", ret);
+        return AVERROR(EINVAL);
+    }
+
+    return 0;
+}
+
+static void uninit(AVFilterContext *ctx)
+{
+    SubOcrContext *s = ctx->priv;
+
+    if (s->tapi) {
+        TessBaseAPIEnd(s->tapi);
+        TessBaseAPIDelete(s->tapi);
+    }
+}
+
+static int query_formats(AVFilterContext *ctx)
+{
+    AVFilterFormats *formats, *formats2;
+    AVFilterLink *inlink = ctx->inputs[0];
+    AVFilterLink *outlink = ctx->outputs[0];
+    static const enum AVSubtitleType in_fmts[] = { AV_SUBTITLE_FMT_BITMAP, AV_SUBTITLE_FMT_NONE };
+    static const enum AVSubtitleType out_fmts[] = { AV_SUBTITLE_FMT_ASS, AV_SUBTITLE_FMT_NONE };
+    int ret;
+
+    /* set input format */
+    formats = ff_make_format_list(in_fmts);
+    if ((ret = ff_formats_ref(formats, &inlink->outcfg.formats)) < 0)
+        return ret;
+
+    /* set output format */
+    formats2 = ff_make_format_list(out_fmts);
+    if ((ret = ff_formats_ref(formats2, &outlink->incfg.formats)) < 0)
+        return ret;
+
+    return 0;
+}
+
+static int config_input(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx = inlink->dst;
+    SubOcrContext *s = ctx->priv;
+
+    if (s->w <= 0 || s->h <= 0) {
+        s->w = inlink->w;
+        s->h = inlink->h;
+    }
+    return 0;
+}
+
+static int config_output(AVFilterLink *outlink)
+{
+    const AVFilterContext *ctx = outlink->src;
+    SubOcrContext *s = ctx->priv;
+
+    outlink->format = AV_SUBTITLE_FMT_ASS;
+    outlink->w = s->w;
+    outlink->h = s->h;
+
+    return 0;
+}
+
+static uint8_t* create_grayscale_image(AVFilterContext *ctx, AVSubtitleArea *area)
+{
+    uint8_t gray_pal[256];
+    const size_t img_size = area->buf[0]->size;
+    const uint8_t* img = area->buf[0]->data;
+    uint8_t* gs_img = av_malloc(img_size);
+
+    if (!gs_img)
+        return NULL;
+
+    for (unsigned i = 0; i < 256; i++) {
+        const uint8_t *col = (uint8_t*)&area->pal[i];
+        const int val = (int)col[3] * FFMAX3(col[0], col[1], col[2]);
+        gray_pal[i] = (uint8_t)(val >> 8);
+    }
+
+    for (unsigned i = 0; i < img_size; i++)
+        gs_img[i] = 255 - gray_pal[img[i]];
+
+    return gs_img;
+}
+
+static int convert_area(AVFilterContext *ctx, AVSubtitleArea *area)
+{
+    SubOcrContext *s = ctx->priv;
+    char *ocr_text = NULL;
+    int ret;
+    uint8_t *gs_img = create_grayscale_image(ctx, area);
+
+    if (!gs_img)
+        return AVERROR(ENOMEM);
+
+    area->type = AV_SUBTITLE_FMT_ASS;
+    TessBaseAPISetImage(s->tapi, gs_img, area->w, area->h, 1, area->linesize[0]);
+    TessBaseAPISetSourceResolution(s->tapi, 70);
+
+    ret = TessBaseAPIRecognize(s->tapi, NULL);
+    if (ret == 0)
+        ocr_text = TessBaseAPIGetUTF8Text(s->tapi);
+
+    if (!ocr_text) {
+        av_log(ctx, AV_LOG_WARNING, "OCR didn't return a text. ret=%d\n", ret);
+        area->ass = NULL;
+    }
+    else {
+        const size_t len = strlen(ocr_text);
+
+        if (len > 0 && ocr_text[len - 1] == '\n')
+            ocr_text[len - 1] = 0;
+
+        av_log(ctx, AV_LOG_VERBOSE, "OCR Result: %s\n", ocr_text);
+
+        area->ass = av_strdup(ocr_text);
+
+        TessDeleteText(ocr_text);
+    }
+
+    av_freep(&gs_img);
+    av_buffer_unref(&area->buf[0]);
+    area->type = AV_SUBTITLE_FMT_ASS;
+
+    return 0;
+}
+
+static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
+{
+    AVFilterContext *ctx = inlink->dst;
+    SubOcrContext *s = ctx->priv;
+    AVFilterLink *outlink = inlink->dst->outputs[0];
+    int ret, frame_sent = 0;
+
+    if (s->pending_frame) {
+        const uint64_t pts_diff = frame->subtitle_pts - s->pending_frame->subtitle_pts;
+
+        if (pts_diff == 0) {
+            // This is just a repetition of the previous frame, ignore it
+            av_frame_free(&frame);
+            return 0;
+        }
+
+        s->pending_frame->subtitle_end_time = (uint32_t)(pts_diff / 1000);
+
+        ret = ff_filter_frame(outlink, s->pending_frame);
+        s->pending_frame = NULL;
+        if (ret < 0)
+            return ret;
+
+        frame_sent = 1;
+
+        if (frame->num_subtitle_areas == 0) {
+            // No need to forward this empty frame
+            av_frame_free(&frame);
+            return 0;
+        }
+    }
+
+    ret = av_frame_make_writable(frame);
+
+    if (ret < 0) {
+        av_frame_free(&frame);
+        return ret;
+    }
+
+    frame->format = AV_SUBTITLE_FMT_ASS;
+
+    av_log(ctx, AV_LOG_DEBUG, "filter_frame sub_pts: %"PRIu64", start_time: %d, end_time: %d, num_areas: %d\n",
+           frame->subtitle_pts, frame->subtitle_start_time, frame->subtitle_end_time, frame->num_subtitle_areas);
+
+    if (frame->num_subtitle_areas > 1 &&
+        frame->subtitle_areas[0]->y > frame->subtitle_areas[frame->num_subtitle_areas - 1]->y) {
+
+        for (unsigned i = 0; i < frame->num_subtitle_areas / 2; i++)
+            FFSWAP(AVSubtitleArea*, frame->subtitle_areas[i], frame->subtitle_areas[frame->num_subtitle_areas - i - 1]);
+    }
+
+    for (unsigned i = 0; i < frame->num_subtitle_areas; i++) {
+        AVSubtitleArea *area = frame->subtitle_areas[i];
+
+        ret = convert_area(ctx, area);
+        if (ret < 0)
+            return ret;
+
+        if (area->ass && area->ass[0] != '\0') {
+            char *tmp = area->ass;
+
+            if (i == 0)
+                area->ass = avpriv_ass_get_dialog(s->readorder_counter++, 0, "Default", NULL, tmp);
+            else {
+                AVSubtitleArea* area0 = frame->subtitle_areas[0];
+                char* tmp2 = area0->ass;
+                area0->ass = av_asprintf("%s\\N%s", area0->ass, tmp);
+                av_free(tmp2);
+                area->ass = NULL;
+            }
+
+            av_free(tmp);
+        }
+    }
+
+    if (frame->num_subtitle_areas > 1) {
+        for (unsigned i = 1; i < frame->num_subtitle_areas; i++) {
+            AVSubtitleArea* area = frame->subtitle_areas[i];
+
+            for (unsigned n = 0; n < FF_ARRAY_ELEMS(area->buf); n++)
+                av_buffer_unref(&area->buf[n]);
+
+            av_freep(&area->text);
+            av_freep(&area->ass);
+            av_freep(&frame->subtitle_areas[i]);
+        }
+
+        AVSubtitleArea* area0 = frame->subtitle_areas[0];
+        av_freep(&frame->subtitle_areas);
+        frame->subtitle_areas = av_malloc_array(1, sizeof(AVSubtitleArea*));
+        frame->subtitle_areas[0] = area0;
+        frame->num_subtitle_areas = 1;
+    }
+
+    // When decoders can't determine the end time, they are setting it either to UINT32_MAX
+    // or 30s (dvbsub).
+    if (frame->num_subtitle_areas > 0 && frame->subtitle_end_time >= 30000) {
+        // Can't send it without end time, wait for the next frame to determine the end_display time
+        s->pending_frame = frame;
+
+        if (frame_sent)
+            return 0;
+
+        // To keep all going, send an empty frame instead
+        frame = ff_get_subtitles_buffer(outlink, AV_SUBTITLE_FMT_ASS);
+        if (!frame)
+            return AVERROR(ENOMEM);
+
+        av_frame_copy_props(frame, s->pending_frame);
+        frame->subtitle_end_time = 1;
+    }
+
+    return ff_filter_frame(outlink, frame);
+}
+
+#define OFFSET(x) offsetof(SubOcrContext, x)
+#define FLAGS (AV_OPT_FLAG_SUBTITLE_PARAM | AV_OPT_FLAG_FILTERING_PARAM)
+
+static const AVOption graphicsub2text_options[] = {
+    { "ocr_mode", "set ocr mode", OFFSET(ocr_mode), AV_OPT_TYPE_INT, {.i64=OEM_TESSERACT_ONLY}, OEM_TESSERACT_ONLY, 2, FLAGS, "ocr_mode" },
+    { "tesseract", "classic tesseract ocr", 0, AV_OPT_TYPE_CONST, {.i64=OEM_TESSERACT_ONLY}, 0, 0, FLAGS, "ocr_mode" },
+    { "lstm", "lstm (ML based)", 0, AV_OPT_TYPE_CONST, {.i64=OEM_LSTM_ONLY}, 0, 0, FLAGS, "ocr_mode" },
+    { "both", "use both models combined", 0, AV_OPT_TYPE_CONST, {.i64=OEM_TESSERACT_LSTM_COMBINED}, 0, 0, FLAGS, "ocr_mode" },
+    { "tessdata_path", "path to tesseract data", OFFSET(tessdata_path), AV_OPT_TYPE_STRING, {.str = NULL}, 0, 0, FLAGS, NULL },
+    { "language", "ocr language", OFFSET(language), AV_OPT_TYPE_STRING, {.str = "eng"}, 0, 0, FLAGS, NULL },
+    { NULL },
+};
+
+AVFILTER_DEFINE_CLASS(graphicsub2text);
+
+static const AVFilterPad inputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_SUBTITLE,
+        .filter_frame = filter_frame,
+        .config_props = config_input,
+    },
+};
+
+static const AVFilterPad outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_SUBTITLE,
+        .config_props = config_output,
+    },
+};
+
+const AVFilter ff_sf_graphicsub2text = {
+    .name = "graphicsub2text",
+    .description = NULL_IF_CONFIG_SMALL("Convert graphical subtitles to text subtitles via OCR"),
+    .init = init,
+    .uninit = uninit,
+    .priv_size = sizeof(SubOcrContext),
+    .priv_class = &graphicsub2text_class,
+    FILTER_INPUTS(inputs),
+    FILTER_OUTPUTS(outputs),
+    FILTER_QUERY_FUNC(query_formats),
+};
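
The palette-to-grayscale step that create_grayscale_image performs before handing the bitmap to the OCR engine can be tried outside FFmpeg. What follows is only a minimal sketch and not part of the patch. It assumes each palette entry is four consecutive bytes with alpha in byte 3, which is how the patch indexes col[3]. The names palette_to_inverted_gray and MAX3 are hypothetical stand-ins for the patch's function and for FFMAX3, and plain malloc replaces av_malloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for FFmpeg's FFMAX3 macro. */
#define MAX3(a, b, c) \
    ((a) > (b) ? ((a) > (c) ? (a) : (c)) : ((b) > (c) ? (b) : (c)))

/*
 * Convert a palettized image to an inverted 8-bit grayscale image.
 * pal points to 256 entries of 4 bytes each; byte 3 of each entry is
 * assumed to be alpha (an assumption of this sketch).
 * Returns a malloc'ed buffer of img_size bytes, or NULL on allocation failure.
 */
static uint8_t *palette_to_inverted_gray(const uint8_t *img, size_t img_size,
                                         const uint8_t *pal)
{
    uint8_t gray_pal[256];
    uint8_t *gs_img = malloc(img_size);

    if (!gs_img)
        return NULL;

    /* One gray level per palette entry: alpha times the brightest channel,
     * scaled back to 8 bits by dropping the low byte. */
    for (unsigned i = 0; i < 256; i++) {
        const uint8_t *col = pal + 4 * i;
        const int val = (int)col[3] * MAX3(col[0], col[1], col[2]);
        gray_pal[i] = (uint8_t)(val >> 8);
    }

    /* Map every pixel through the table and invert it. */
    for (size_t i = 0; i < img_size; i++)
        gs_img[i] = 255 - gray_pal[img[i]];

    return gs_img;
}
```

With this mapping a fully transparent palette entry becomes 255 (white) and an opaque, bright entry becomes a value near 0, so light subtitle text on a transparent background reaches the OCR engine as dark text on a light background.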