
[FFmpeg-devel,v10,2/5] avformat/rcwtdec: add RCWT Closed Captions demuxer

Message ID 20240328201132.1804716-3-marth64@proxyid.net
State New
Series RCWT Closed Captions demuxer (v10)

Checks

Context                           Check    Description
yinshiyou/make_loongarch64        success  Make finished
yinshiyou/make_fate_loongarch64   success  Make fate finished
andriy/make_x86                   success  Make finished
andriy/make_fate_x86              success  Make fate finished

Commit Message

Marth64 March 28, 2024, 8:11 p.m. UTC
RCWT (Raw Captions With Time) is a format native to ccextractor,
a commonly used OSS tool for processing 608/708 Closed Captions (CC).
RCWT can be used to archive the original extracted CC bitstream.
The muxer was added in January 2024; this commit adds the demuxer.

One can now demux RCWT files for rendering in ccaption_dec or interop
with ccextractor (which produces RCWT). Using the muxer/demuxer combo,
the CC bits can be kept for processing or rendering with either tool.
This can be an effective way to back up an original CC stream, including
format extensions like EIA-708 and overall original presentation.

Signed-off-by: Marth64 <marth64@proxyid.net>
---
 doc/demuxers.texi        |  30 ++++++++++
 libavformat/Makefile     |   1 +
 libavformat/allformats.c |   1 +
 libavformat/rcwtdec.c    | 123 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 155 insertions(+)
 create mode 100644 libavformat/rcwtdec.c
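For illustration only: the commit message does not spell out the byte layout involved. The sketch below writes data in the layout this patch's demuxer reads: an 11-byte header (0xCC 0xCC 0xED magic, writer application at byte 3, writer version at byte 5, format version 0x0001 big-endian at bytes 6-7), followed by clusters made of a 64-bit little-endian PTS in milliseconds, a 16-bit little-endian block count, and 3 bytes per CC block. The helper names are hypothetical, and treating byte 4 and bytes 8-10 as zero-filled reserved bytes is an assumption, since the demuxer never reads them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RCWT_HEADER_SIZE 11

/* Hypothetical helper: writes the 11-byte header that rcwt_read_header()
 * and rcwt_probe() check. Byte 4 and bytes 8-10 are not read by the
 * demuxer; they are ASSUMED reserved here and left as zero. */
static size_t rcwt_write_header(uint8_t *out, uint8_t writer_app,
                                uint8_t writer_version)
{
    memset(out, 0, RCWT_HEADER_SIZE);
    out[0] = 0xCC;              /* magic, checked as AV_RB16 == 0xCCCC */
    out[1] = 0xCC;
    out[2] = 0xED;              /* checked as AV_RB8 == 0xED */
    out[3] = writer_app;        /* logged as "writer application" */
    out[5] = writer_version;    /* logged as "version" */
    out[6] = 0x00;              /* format version 0x0001, big-endian */
    out[7] = 0x01;
    return RCWT_HEADER_SIZE;
}

/* Hypothetical helper: appends one cluster in the layout the demuxer loop
 * reads: 64-bit little-endian PTS (ms, matching the 1/1000 time base),
 * 16-bit little-endian block count, then 3 bytes per CC block. */
static size_t rcwt_write_cluster(uint8_t *out, uint64_t pts_ms,
                                 const uint8_t *blocks, uint16_t nb_blocks)
{
    size_t pos = 0;
    for (int i = 0; i < 8; i++)
        out[pos++] = (uint8_t)(pts_ms >> (8 * i));
    out[pos++] = (uint8_t)(nb_blocks & 0xFF);
    out[pos++] = (uint8_t)(nb_blocks >> 8);
    memcpy(out + pos, blocks, (size_t)nb_blocks * 3);
    return pos + (size_t)nb_blocks * 3;
}
```

For example, a header followed by one cluster at 1000 ms carrying two blocks takes 11 + 10 + 6 = 27 bytes.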

Comments

Tomas Härdin March 29, 2024, 11:51 a.m. UTC | #1
> +    /* demux */
> +    while (!avio_feof(avf->pb)) {

Can we please get away from this way of reading subtitles? Every other
type of media (audio, video) is capable of being streamed, but not
subtitles, precisely because all subtitle demuxers do all parsing in the
read_header() call. We have a perfectly good generic index and seeking
functionality. My recent experiment with srt shows it's possible to
read packets in read_packet() like every other demuxer.

/Tomas
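Tomas's suggestion amounts to parsing one cluster per read_packet() call instead of queueing every cluster in read_header(). A self-contained sketch of that per-call step follows. To stay runnable without FFmpeg it walks an in-memory buffer rather than an AVIOContext, and `RCWTCluster` and `rcwt_next_cluster()` are hypothetical names. The field layout and the skipping of zero-block clusters follow the loop in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one RCWT cluster; not an FFmpeg type. */
typedef struct RCWTCluster {
    size_t         pos;       /* byte offset of the cluster (like pkt->pos) */
    int64_t        pts_ms;    /* PTS in milliseconds */
    const uint8_t *data;      /* nb_blocks * 3 bytes of CC data */
    size_t         size;
} RCWTCluster;

/* Reads an n-byte little-endian unsigned integer. */
static uint64_t rd_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Parses exactly one non-empty cluster starting at *pos, the way a
 * read_packet() callback would return one packet per call.
 * Returns 1 when a cluster was produced, 0 at a clean end of data and
 * -1 if the data is truncated. Clusters with zero blocks are skipped,
 * matching the `continue` in the patch's loop. */
static int rcwt_next_cluster(const uint8_t *buf, size_t size, size_t *pos,
                             RCWTCluster *out)
{
    while (*pos < size) {
        size_t start = *pos;
        if (size - start < 10)            /* 8-byte PTS + 2-byte count */
            return -1;
        int64_t  pts       = (int64_t)rd_le(buf + start, 8);
        unsigned nb_blocks = (unsigned)rd_le(buf + start + 8, 2);
        size_t   payload   = (size_t)nb_blocks * 3;
        *pos = start + 10;
        if (nb_blocks == 0)
            continue;
        if (size - *pos < payload)
            return -1;
        out->pos    = start;
        out->pts_ms = pts;
        out->data   = buf + *pos;
        out->size   = payload;
        *pos += payload;
        return 1;
    }
    return 0;
}
```

In an actual demuxer, the same step would presumably read from avf->pb with avio_rl64()/avio_rl16(), fill the packet with av_get_packet() and return AVERROR_EOF at the end, leaving seeking to the generic index. That mapping is an assumption, not code from this thread.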
Marth64 March 29, 2024, 2:28 p.m. UTC | #2
Tomas Härdin:
> Can we please get away from this way of reading subtitles? Every other
> type of media (audio, video) is capable of being streamed, but not
> subtitles, precisely because all subtitle demuxers do all parsing in the
> read_header() call. We have a perfectly good generic index and seeking
> functionality. My recent experiment with srt shows it's possible to
> read packets in read_packet() like every other demuxer.

Is there an example I can follow?  Is this something that can be fixed
in an enhancement patch or is it a deal-breaker to merge this?
Tomas Härdin March 29, 2024, 5:21 p.m. UTC | #3
On Fri, 2024-03-29 at 09:28 -0500, Marth64 wrote:
> Tomas Härdin:
> > Can we please get away from this way of reading subtitles? Every
> > other type of media (audio, video) is capable of being streamed, but
> > not subtitles, precisely because all subtitle demuxers do all parsing
> > in the read_header() call. We have a perfectly good generic index and
> > seeking functionality. My recent experiment with srt shows it's
> > possible to read packets in read_packet() like every other demuxer.
> 
> Is there an example I can follow?

Not yet, but perhaps once I get my srtdec patchset through there will
be something to follow. Might follow it up with a similar patch for
webvttdec.

> Is this something that can be fixed
> in an enhancement patch or is it a deal-breaker to merge this?

Nah, it can be done at a later point as an enhancement if you prefer.

/Tomas
Marth64 March 29, 2024, 6:03 p.m. UTC | #4
Tomas Härdin:
> once I get my srtdec patchset through there will
> be something to follow.
I see the patch now. I agree, this looks like a good step. Thank you!

> Nah, it can be done at a later point as an enhancement if you prefer.
Yes, please. I am happy to do it, but I think it will be
smoother to do after an example is there (since it introduces
a new pattern for subtitles). Thank you for understanding.
Michael Niedermayer March 30, 2024, 12:23 a.m. UTC | #5
On Thu, Mar 28, 2024 at 03:11:29PM -0500, Marth64 wrote:
[...]

> +static int rcwt_probe(const AVProbeData *p)
> +{
> +    return p->buf_size > RCWT_HEADER_SIZE   &&
> +           AV_RB16(p->buf) == 0xCCCC        &&
> +           AV_RB8(p->buf + 2) == 0xED       &&
> +           AV_RB16(p->buf + 6) == 0x0001    ? 50 : 0;
> +}
> +
> +const FFInputFormat ff_rcwt_demuxer = {
> +    .p.name         = "rcwt",
> +    .p.long_name    = NULL_IF_CONFIG_SMALL("RCWT (Raw Captions With Time)"),
> +    .p.extensions   = "bin",

this causes an mp3 i have to be misdetected
~/videos/sbQ9.bin
(this is an actual file i had, not a file crafted for this)

i think the entry for extensions should be removed (which fixes this);
having a ".bin" is not a strong indication that it's rcwt

thx

[...]
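Once `.p.extensions` is dropped, detection rests on the content check alone. The sketch below restates the conditions of `rcwt_probe()` as a stand-alone function over a plain buffer. The name `rcwt_probe_buf()` and its signature are hypothetical, not FFmpeg API. Because it only inspects the leading bytes, the file name has no influence on its result.

```c
#include <stddef.h>
#include <stdint.h>

#define RCWT_HEADER_SIZE   11
#define SKETCH_PROBE_SCORE 50  /* same value rcwt_probe() returns */

/* Hypothetical stand-alone version of rcwt_probe(): the decision depends
 * only on the first bytes of the data, never on the file name. */
static int rcwt_probe_buf(const uint8_t *buf, size_t buf_size)
{
    if (buf_size <= RCWT_HEADER_SIZE)         /* patch requires size > 11 */
        return 0;
    if (buf[0] != 0xCC || buf[1] != 0xCC || buf[2] != 0xED)
        return 0;                             /* magic mismatch */
    if (buf[6] != 0x00 || buf[7] != 0x01)     /* format version 0x0001 */
        return 0;
    return SKETCH_PROBE_SCORE;
}
```

For example, a buffer that starts with an ID3 tag, as many MP3 files do, scores 0, while one that starts with CC CC ED and carries 00 01 at bytes 6-7 scores 50.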
Stefano Sabatini March 30, 2024, 2:55 p.m. UTC | #6
On date Saturday 2024-03-30 01:23:53 +0100, Michael Niedermayer wrote:
> On Thu, Mar 28, 2024 at 03:11:29PM -0500, Marth64 wrote:
> [...]
> 
> > +static int rcwt_probe(const AVProbeData *p)
> > +{
> > +    return p->buf_size > RCWT_HEADER_SIZE   &&
> > +           AV_RB16(p->buf) == 0xCCCC        &&
> > +           AV_RB8(p->buf + 2) == 0xED       &&
> > +           AV_RB16(p->buf + 6) == 0x0001    ? 50 : 0;
> > +}
> > +
> > +const FFInputFormat ff_rcwt_demuxer = {
> > +    .p.name         = "rcwt",
> > +    .p.long_name    = NULL_IF_CONFIG_SMALL("RCWT (Raw Captions With Time)"),
> > +    .p.extensions   = "bin",
> 

> this causes an mp3 i have to be misdetected
> ~/videos/sbQ9.bin
> (this is an actual file i had, not a file crafted for this)
> 
> i think the entry for extensions should be removed (which fixes this);
> having a ".bin" is not a strong indication that it's rcwt

Is this blocking or can it be addressed later? Also, if this needs to
be modified the muxer should be as well.
Marth64 March 30, 2024, 5:12 p.m. UTC | #7
> i think the entry for extensions should be removed (which fixes this)
> having a ".bin" is not a strong indication that its rcwt

> Is this blocking or can it be addressed later? Also, if this needs to
> be modified the muxer should be as well.

I can address both today in a new set. .bin is pretty generic
(although it is what ccextractor uses), so I get it. Thanks.
Michael Niedermayer March 31, 2024, 3:07 p.m. UTC | #8
On Sat, Mar 30, 2024 at 03:55:13PM +0100, Stefano Sabatini wrote:
> On date Saturday 2024-03-30 01:23:53 +0100, Michael Niedermayer wrote:
> > On Thu, Mar 28, 2024 at 03:11:29PM -0500, Marth64 wrote:
> > [...]
> > 
> > > +static int rcwt_probe(const AVProbeData *p)
> > > +{
> > > +    return p->buf_size > RCWT_HEADER_SIZE   &&
> > > +           AV_RB16(p->buf) == 0xCCCC        &&
> > > +           AV_RB8(p->buf + 2) == 0xED       &&
> > > +           AV_RB16(p->buf + 6) == 0x0001    ? 50 : 0;
> > > +}
> > > +
> > > +const FFInputFormat ff_rcwt_demuxer = {
> > > +    .p.name         = "rcwt",
> > > +    .p.long_name    = NULL_IF_CONFIG_SMALL("RCWT (Raw Captions With Time)"),
> > > +    .p.extensions   = "bin",
> > 
> 
> > this causes an mp3 i have to be misdetected
> > ~/videos/sbQ9.bin
> > (this is an actual file i had, not a file crafted for this)
> > 
> > i think the entry for extensions should be removed (which fixes this);
> > having a ".bin" is not a strong indication that it's rcwt
> 
> Is this blocking or can it be addressed later? Also, if this needs to

dropping the "bin" from the demuxer should be trivial to do; the extension
is, IIRC, used mainly for probing, and it's wrong for probing to associate
bin with any specific format.


> be modified the muxer should be as well.

maybe, yes

thx

[...]
Marth64 April 2, 2024, 5:15 a.m. UTC | #9
Sorry for the delay, v11 coming shortly with the fix.

Patch

diff --git a/doc/demuxers.texi b/doc/demuxers.texi
index b70f3a38d7..04293c4813 100644
--- a/doc/demuxers.texi
+++ b/doc/demuxers.texi
@@ -1038,6 +1038,36 @@  the command:
 ffplay -f rawvideo -pixel_format rgb24 -video_size 320x240 -framerate 10 input.raw
 @end example
 
+@anchor{rcwtdec}
+@section rcwt
+
+RCWT (Raw Captions With Time) is a format native to ccextractor, a commonly
+used open source tool for processing 608/708 Closed Captions (CC) sources.
+For more information on the format, see @ref{rcwtenc,,,ffmpeg-formats}.
+
+This demuxer implements the specification as of March 2024, which has
+been stable and unchanged since April 2014.
+
+@subsection Examples
+
+@itemize
+@item
+Render CC to ASS using the built-in decoder:
+@example
+ffmpeg -i CC.rcwt.bin CC.ass
+@end example
+Note that if your output appears to be empty, you may have to manually
+set the decoder's @option{data_field} option to pick the desired CC substream.
+
+@item
+Convert an RCWT backup to Scenarist (SCC) format:
+@example
+ffmpeg -i CC.rcwt.bin -c:s copy CC.scc
+@end example
+Note that the SCC format does not support all of the possible CC extensions
+that can be stored in RCWT (such as EIA-708).
+@end itemize
+
 @section sbg
 
 SBaGen script demuxer.
diff --git a/libavformat/Makefile b/libavformat/Makefile
index 44aa485029..5d77cba7f1 100644
--- a/libavformat/Makefile
+++ b/libavformat/Makefile
@@ -493,6 +493,7 @@  OBJS-$(CONFIG_QOA_DEMUXER)               += qoadec.o
 OBJS-$(CONFIG_R3D_DEMUXER)               += r3d.o
 OBJS-$(CONFIG_RAWVIDEO_DEMUXER)          += rawvideodec.o
 OBJS-$(CONFIG_RAWVIDEO_MUXER)            += rawenc.o
+OBJS-$(CONFIG_RCWT_DEMUXER)              += rcwtdec.o subtitles.o
 OBJS-$(CONFIG_RCWT_MUXER)                += rcwtenc.o subtitles.o
 OBJS-$(CONFIG_REALTEXT_DEMUXER)          += realtextdec.o subtitles.o
 OBJS-$(CONFIG_REDSPARK_DEMUXER)          += redspark.o
diff --git a/libavformat/allformats.c b/libavformat/allformats.c
index 9df42bb87a..ae925dcf60 100644
--- a/libavformat/allformats.c
+++ b/libavformat/allformats.c
@@ -391,6 +391,7 @@  extern const FFInputFormat  ff_qoa_demuxer;
 extern const FFInputFormat  ff_r3d_demuxer;
 extern const FFInputFormat  ff_rawvideo_demuxer;
 extern const FFOutputFormat ff_rawvideo_muxer;
+extern const FFInputFormat  ff_rcwt_demuxer;
 extern const FFOutputFormat ff_rcwt_muxer;
 extern const FFInputFormat  ff_realtext_demuxer;
 extern const FFInputFormat  ff_redspark_demuxer;
diff --git a/libavformat/rcwtdec.c b/libavformat/rcwtdec.c
new file mode 100644
index 0000000000..91f994c3ab
--- /dev/null
+++ b/libavformat/rcwtdec.c
@@ -0,0 +1,123 @@ 
+/*
+ * RCWT (Raw Captions With Time) demuxer
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/*
+ * RCWT (Raw Captions With Time) is a format native to ccextractor, a commonly
+ * used open source tool for processing 608/708 Closed Captions (CC) sources.
+ *
+ * This demuxer implements the specification as of March 2024, which has
+ * been stable and unchanged since April 2014.
+ *
+ * A free specification of RCWT can be found here:
+ * @url{https://github.com/CCExtractor/ccextractor/blob/master/docs/BINARY_FILE_FORMAT.TXT}
+ */
+
+#include "avformat.h"
+#include "demux.h"
+#include "internal.h"
+#include "subtitles.h"
+#include "libavutil/intreadwrite.h"
+
+#define RCWT_HEADER_SIZE                    11
+
+typedef struct RCWTContext {
+    FFDemuxSubtitlesQueue q;
+} RCWTContext;
+
+static int rcwt_read_header(AVFormatContext *avf)
+{
+    RCWTContext *rcwt = avf->priv_data;
+
+    AVStream      *st;
+    uint8_t       header[RCWT_HEADER_SIZE];
+    int           ret;
+
+    /* read header */
+    ret = ffio_read_size(avf->pb, header, RCWT_HEADER_SIZE);
+    if (ret < 0)
+        return ret;
+
+    if (AV_RB16(header + 6) != 0x0001) {
+        av_log(avf, AV_LOG_ERROR, "RCWT format version is not compatible "
+                                  "(only version 0.001 is known)\n");
+        return AVERROR_INVALIDDATA;
+    }
+
+    av_log(avf, AV_LOG_DEBUG, "RCWT writer application: %02X version: %02x\n",
+                              header[3], header[5]);
+
+    /* setup stream */
+    st = avformat_new_stream(avf, NULL);
+    if (!st)
+        return AVERROR(ENOMEM);
+
+    st->codecpar->codec_type = AVMEDIA_TYPE_SUBTITLE;
+    st->codecpar->codec_id   = AV_CODEC_ID_EIA_608;
+
+    avpriv_set_pts_info(st, 64, 1, 1000);
+
+    /* demux */
+    while (!avio_feof(avf->pb)) {
+        AVPacket      *sub;
+        int64_t       cluster_pos       = avio_tell(avf->pb);
+        int64_t       cluster_pts       = avio_rl64(avf->pb);
+        int           cluster_nb_blocks = avio_rl16(avf->pb);
+
+        if (cluster_nb_blocks == 0)
+            continue;
+
+        sub = ff_subtitles_queue_insert(&rcwt->q, NULL, 0, 0);
+        if (!sub)
+            return AVERROR(ENOMEM);
+
+        ret = av_get_packet(avf->pb, sub, cluster_nb_blocks * 3);
+        if (ret < 0)
+            return ret;
+
+        sub->pos = cluster_pos;
+        sub->pts = cluster_pts;
+    }
+
+    ff_subtitles_queue_finalize(avf, &rcwt->q);
+
+    return 0;
+}
+
+static int rcwt_probe(const AVProbeData *p)
+{
+    return p->buf_size > RCWT_HEADER_SIZE   &&
+           AV_RB16(p->buf) == 0xCCCC        &&
+           AV_RB8(p->buf + 2) == 0xED       &&
+           AV_RB16(p->buf + 6) == 0x0001    ? 50 : 0;
+}
+
+const FFInputFormat ff_rcwt_demuxer = {
+    .p.name         = "rcwt",
+    .p.long_name    = NULL_IF_CONFIG_SMALL("RCWT (Raw Captions With Time)"),
+    .p.extensions   = "bin",
+    .p.flags        = AVFMT_TS_DISCONT,
+    .priv_data_size = sizeof(RCWTContext),
+    .flags_internal = FF_INFMT_FLAG_INIT_CLEANUP,
+    .read_probe     = rcwt_probe,
+    .read_header    = rcwt_read_header,
+    .read_packet    = ff_subtitles_read_packet,
+    .read_seek2     = ff_subtitles_read_seek,
+    .read_close     = ff_subtitles_read_close
+};