[FFmpeg-devel] libavcodec/htmlsubtitles: skip newline characters at the beginning of the subtitle

Submitted by Evgeny Shulgina on June 16, 2017, 6:23 p.m.

Details

Message ID 1497637426-4209-1-git-send-email-izaronplatz@gmail.com
State New
Headers show

Commit Message

Evgeny Shulgina June 16, 2017, 6:23 p.m.
We can have a certain number of newline characters at the very beginning
of the subtitles. We must skip them to get the correct result.
---
 libavcodec/htmlsubtitles.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Evgeny Shulgina June 16, 2017, 6:30 p.m.
Info about this patch:

Working with one video, I noticed that ffmpeg incorrectly extracts out
subtitles. Some subtitles have disappeared.
I have a video file as Matroska (.mkv) container, subtitles are there in
the SubRip (.srt) format.

Output of `ffmpeg`: https://pastebin.com/raw/vyiMNFbe
Output of my Matroska parser:  https://pastebin.com/raw/8XsXkm0C
Output of `mkvextract`: https://pastebin.com/raw/7K7nkwsd

When I open the .mkv file in VLC Player, these magical subtitles are
displayed correctly.
But of all the extracted subtitle files, only the second file works
correctly (from my parser) in VLC, in other 2 files these subtitle lines
have dissapeared.

Therefore, I suggest deleting the newline characters at the beginning of
the line, and working on. FATE tests works without error.

2017-06-16 21:23 GMT+03:00 Evgeny Shulgin <izaronplatz@gmail.com>:

> We can have a certain number of newline characters at the very beginning
> of the subtitles. We must skip them to get the correct result.
> ---
>  libavcodec/htmlsubtitles.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
> index 16295da..b144aaa 100644
> --- a/libavcodec/htmlsubtitles.c
> +++ b/libavcodec/htmlsubtitles.c
> @@ -62,6 +62,9 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst,
> const char *in)
>      strcpy(stack[0].param[PARAM_COLOR], "{\\c}");
>      strcpy(stack[0].param[PARAM_FACE],  "{\\fn}");
>
> +    while (*in && (*in == '\r' || *in == '\n'))
> +        in++;
> +
>      for (; !end && *in; in++) {
>          switch (*in) {
>          case '\r':
> --
> 2.7.4
>
>
wm4 June 16, 2017, 7:57 p.m.
On Fri, 16 Jun 2017 21:30:44 +0300
Evgeny Shulgin <izaronplatz@gmail.com> wrote:

> Info about this patch:
> 
> Working with one video, I noticed that ffmpeg incorrectly extracts out
> subtitles. Some subtitles have disappeared.
> I have a video file as Matroska (.mkv) container, subtitles are there in
> the SubRip (.srt) format.
> 
> Output of `ffmpeg`: https://pastebin.com/raw/vyiMNFbe
> Output of my Matroska parser:  https://pastebin.com/raw/8XsXkm0C
> Output of `mkvextract`: https://pastebin.com/raw/7K7nkwsd
> 
> When I open the .mkv file in VLC Player, these magical subtitles are
> displayed correctly.
> But of all the extracted subtitle files, only the second file works
> correctly (from my parser) in VLC, in other 2 files these subtitle lines
> have dissapeared.
> 
> Therefore, I suggest deleting the newline characters at the beginning of
> the line, and working on. FATE tests works without error.

Most of this should probably go into the commit message (without
pastebin links, and instead with excerpts of interesting subtitle
lines).

Anyway, I don't understand why this makes subtitle lines actually
disappear?
Evgeny Shulgina June 16, 2017, 8:40 p.m.
> Anyway, I don't understand why this makes subtitle lines actually
> disappear?

Because SubRip format does not allow to have empty lines, that is:

   1. Subtitle text itself on one or more lines
   2. A blank line containing no text, indicating the end of this subtitle

So, when we have '\n' at the start or two '\n'-s in a row, the code
interrupts the recording of sub. But in this video we could get subtitles
as "\r\n\r\n\r\nСкандинавия", and this video is played perfectly in VLC
player like it's "Скандинавия". So we had to deal with it. Anyway, this
.mkv was recorded successfully a couple of years ago:

  Metadata:
    encoder         : libebml v1.3.0 + libmatroska v1.4.1
    creation_time   : 2015-05-30T16:56:01.000000Z

_STATISTICS_WRITING_APP: mkvmerge v7.0.0 ('Where We Going') 32bit built on
Jun  9 2014 15:08:34


> Most of this should probably go into the commit message

How can I edit the patch? Send the updated patch as attachment in the emal?
I don't see how can I do it on the patchwork site. Thanks!
wm4 June 20, 2017, 8:07 a.m.
On Fri, 16 Jun 2017 23:40:32 +0300
Evgeny Shulgin <izaronplatz@gmail.com> wrote:

> > Anyway, I don't understand why this makes subtitle lines actually
> > disappear?  
> 
> Because SubRip format does not allow to have empty lines, that is:
> 
>    1. Subtitle text itself on one or more lines
>    2. A blank line containing no text, indicating the end of this subtitle
> 
> So, when we have '\n' at the start or two '\n'-s in a row, the code
> interrupts the recording of sub. But in this video we could get subtitles
> as "\r\n\r\n\r\nСкандинавия", and this video is played perfectly in VLC
> player like it's "Скандинавия". So we had to deal with it. Anyway, this
> .mkv was recorded successfully a couple of years ago:
> 
>   Metadata:
>     encoder         : libebml v1.3.0 + libmatroska v1.4.1
>     creation_time   : 2015-05-30T16:56:01.000000Z
> 
> _STATISTICS_WRITING_APP: mkvmerge v7.0.0 ('Where We Going') 32bit built on
> Jun  9 2014 15:08:34

I could understand if this was in the .srt parser, but this is just the
srt to ass converter. Does the latter really drop anything after the
first newline?

> 
> > Most of this should probably go into the commit message  
> 
> How can I edit the patch? Send the updated patch as attachment in the emal?
> I don't see how can I do it on the patchwork site. Thanks!

Just use -v2 and maybe --in-reply-to= with the message ID.

Patch hide | download patch | download mbox

diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
index 16295da..b144aaa 100644
--- a/libavcodec/htmlsubtitles.c
+++ b/libavcodec/htmlsubtitles.c
@@ -62,6 +62,9 @@  int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
     strcpy(stack[0].param[PARAM_COLOR], "{\\c}");
     strcpy(stack[0].param[PARAM_FACE],  "{\\fn}");
 
+    while (*in && (*in == '\r' || *in == '\n'))
+        in++;
+
     for (; !end && *in; in++) {
         switch (*in) {
         case '\r':