Message ID | 1497637426-4209-1-git-send-email-izaronplatz@gmail.com |
---|---|
State | New |
Headers | show |
Info about this patch: Working with one video, I noticed that ffmpeg incorrectly extracts out subtitles. Some subtitles have disappeared. I have a video file as Matroska (.mkv) container, subtitles are there in the SubRip (.srt) format. Output of `ffmpeg`: https://pastebin.com/raw/vyiMNFbe Output of my Matroska parser: https://pastebin.com/raw/8XsXkm0C Output of `mkvextract`: https://pastebin.com/raw/7K7nkwsd When I open the .mkv file in VLC Player, these magical subtitles are displayed correctly. But of all the extracted subtitle files, only the second file works correctly (from my parser) in VLC, in other 2 files these subtitle lines have dissapeared. Therefore, I suggest deleting the newline characters at the beginning of the line, and working on. FATE tests works without error. 2017-06-16 21:23 GMT+03:00 Evgeny Shulgin <izaronplatz@gmail.com>: > We can have a certain number of newline characters at the very beginning > of the subtitles. We must skip them to get the correct result. > --- > libavcodec/htmlsubtitles.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c > index 16295da..b144aaa 100644 > --- a/libavcodec/htmlsubtitles.c > +++ b/libavcodec/htmlsubtitles.c > @@ -62,6 +62,9 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, > const char *in) > strcpy(stack[0].param[PARAM_COLOR], "{\\c}"); > strcpy(stack[0].param[PARAM_FACE], "{\\fn}"); > > + while (*in && (*in == '\r' || *in == '\n')) > + in++; > + > for (; !end && *in; in++) { > switch (*in) { > case '\r': > -- > 2.7.4 > >
On Fri, 16 Jun 2017 21:30:44 +0300 Evgeny Shulgin <izaronplatz@gmail.com> wrote: > Info about this patch: > > Working with one video, I noticed that ffmpeg incorrectly extracts out > subtitles. Some subtitles have disappeared. > I have a video file as Matroska (.mkv) container, subtitles are there in > the SubRip (.srt) format. > > Output of `ffmpeg`: https://pastebin.com/raw/vyiMNFbe > Output of my Matroska parser: https://pastebin.com/raw/8XsXkm0C > Output of `mkvextract`: https://pastebin.com/raw/7K7nkwsd > > When I open the .mkv file in VLC Player, these magical subtitles are > displayed correctly. > But of all the extracted subtitle files, only the second file works > correctly (from my parser) in VLC, in other 2 files these subtitle lines > have dissapeared. > > Therefore, I suggest deleting the newline characters at the beginning of > the line, and working on. FATE tests works without error. Most of this should probably go into the commit message (without pastebin links, and instead with excerpts of interesting subtitle lines). Anyway, I don't understand why this makes subtitle lines actually disappear?
> Anyway, I don't understand why this makes subtitle lines actually > disappear? Because SubRip format does not allow to have empty lines, that is: 1. Subtitle text itself on one or more lines 2. A blank line containing no text, indicating the end of this subtitle So, when we have '\n' at the start or two '\n'-s in a row, the code interrupts the recording of sub. But in this video we could get subtitles as "\r\n\r\n\r\nСкандинавия", and this video is played perfectly in VLC player like it's "Скандинавия". So we had to deal with it. Anyway, this .mkv was recorded successfully a couple of years ago: Metadata: encoder : libebml v1.3.0 + libmatroska v1.4.1 creation_time : 2015-05-30T16:56:01.000000Z _STATISTICS_WRITING_APP: mkvmerge v7.0.0 ('Where We Going') 32bit built on Jun 9 2014 15:08:34 > Most of this should probably go into the commit message How can I edit the patch? Send the updated patch as attachment in the emal? I don't see how can I do it on the patchwork site. Thanks!
On Fri, 16 Jun 2017 23:40:32 +0300 Evgeny Shulgin <izaronplatz@gmail.com> wrote: > > Anyway, I don't understand why this makes subtitle lines actually > > disappear? > > Because SubRip format does not allow to have empty lines, that is: > > 1. Subtitle text itself on one or more lines > 2. A blank line containing no text, indicating the end of this subtitle > > So, when we have '\n' at the start or two '\n'-s in a row, the code > interrupts the recording of sub. But in this video we could get subtitles > as "\r\n\r\n\r\nСкандинавия", and this video is played perfectly in VLC > player like it's "Скандинавия". So we had to deal with it. Anyway, this > .mkv was recorded successfully a couple of years ago: > > Metadata: > encoder : libebml v1.3.0 + libmatroska v1.4.1 > creation_time : 2015-05-30T16:56:01.000000Z > > _STATISTICS_WRITING_APP: mkvmerge v7.0.0 ('Where We Going') 32bit built on > Jun 9 2014 15:08:34 I could understand if this was in the .srt parser, but this is just the srt to ass converter. Does the latter really drop anything after the first newline? > > > Most of this should probably go into the commit message > > How can I edit the patch? Send the updated patch as attachment in the emal? > I don't see how can I do it on the patchwork site. Thanks! Just use -v2 and maybe --in-reply-to= with the message ID.
diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c index 16295da..b144aaa 100644 --- a/libavcodec/htmlsubtitles.c +++ b/libavcodec/htmlsubtitles.c @@ -62,6 +62,9 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in) strcpy(stack[0].param[PARAM_COLOR], "{\\c}"); strcpy(stack[0].param[PARAM_FACE], "{\\fn}"); + while (*in && (*in == '\r' || *in == '\n')) + in++; + for (; !end && *in; in++) { switch (*in) { case '\r':