diff mbox series

[FFmpeg-devel,11/23] lavc/ass_split: fix parsing utf8 scripts

Message ID 20200406175218.1299994-12-jstebbins@jetheaddev.com
State Superseded
Headers show
Series [FFmpeg-devel,01/23] lavc/movtextdec: fix ass header colors | expand

Checks

Context Check Description
andriy/ffmpeg-patchwork warning Failed to apply patch

Commit Message

John Stebbins April 6, 2020, 5:52 p.m. UTC
The [Script Info] section was skipped if starts with UTF8 BOM
---
 libavcodec/ass_split.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Nicolas George April 6, 2020, 6:08 p.m. UTC | #1
John Stebbins (12020-04-06):
> The [Script Info] section was skipped if starts with UTF8 BOM
> ---
>  libavcodec/ass_split.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/libavcodec/ass_split.c b/libavcodec/ass_split.c
> index 67da7c6d84..94c32667af 100644
> --- a/libavcodec/ass_split.c
> +++ b/libavcodec/ass_split.c
> @@ -354,6 +354,9 @@ static int ass_split(ASSSplitContext *ctx, const char *buf)
>      if (ctx->current_section >= 0)
>          buf = ass_split_section(ctx, buf);
>  

> +    if(!memcmp(buf, "\xef\xbb\xbf", 3)) { // Skip UTF-8 BOM header
> +        buf += 3;
> +    }

This doesn't look correct: BOM should be skipped only at the very
beginning of the file. And the braces could be skipped.

>      while (buf && *buf) {
>          if (sscanf(buf, "[%15[0-9A-Za-z+ ]]%c", section, &c) == 2) {
>              buf += strcspn(buf, "\n");

Regards,
John Stebbins April 6, 2020, 6:27 p.m. UTC | #2
On Mon, 2020-04-06 at 20:08 +0200, Nicolas George wrote:
> John Stebbins (12020-04-06):
> > The [Script Info] section was skipped if starts with UTF8 BOM
> > ---
> >  libavcodec/ass_split.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/libavcodec/ass_split.c b/libavcodec/ass_split.c
> > index 67da7c6d84..94c32667af 100644
> > --- a/libavcodec/ass_split.c
> > +++ b/libavcodec/ass_split.c
> > @@ -354,6 +354,9 @@ static int ass_split(ASSSplitContext *ctx,
> > const char *buf)
> >      if (ctx->current_section >= 0)
> >          buf = ass_split_section(ctx, buf);
> >  
> > +    if(!memcmp(buf, "\xef\xbb\xbf", 3)) { // Skip UTF-8 BOM header
> > +        buf += 3;
> > +    }
> 
> This doesn't look correct: BOM should be skipped only at the very
> beginning of the file. And the braces could be skipped.
> 
> >      while (buf && *buf) {
> >          if (sscanf(buf, "[%15[0-9A-Za-z+ ]]%c", section, &c) == 2)
> > {
> >              buf += strcspn(buf, "\n");
> 
> 

Oh, whoops, I missed that ass_split gets called for a number of things.
This belongs at the beginning of ff_ass_split() I believe?

In the sample I ran into this with, there's a BOM at the beginning of
the mkv private data for the track.

I'll remove the braces...
Nicolas George April 6, 2020, 6:32 p.m. UTC | #3
John Stebbins (12020-04-06):
> Oh, whoops, I missed that ass_split gets called for a number of things.
> This belongs at the beginning of ff_ass_split() I believe?

I think so too.

> In the sample I ran into this with, there's a BOM at the beginning of
> the mkv private data for the track.

I think the private data is supposed to be the beginning of the file.
Also, I think the BOM should not be there when embedding, but we can
support slightly invalid files.

Regards,
diff mbox series

Patch

diff --git a/libavcodec/ass_split.c b/libavcodec/ass_split.c
index 67da7c6d84..94c32667af 100644
--- a/libavcodec/ass_split.c
+++ b/libavcodec/ass_split.c
@@ -354,6 +354,9 @@  static int ass_split(ASSSplitContext *ctx, const char *buf)
     if (ctx->current_section >= 0)
         buf = ass_split_section(ctx, buf);
 
+    if(!memcmp(buf, "\xef\xbb\xbf", 3)) { // Skip UTF-8 BOM header
+        buf += 3;
+    }
     while (buf && *buf) {
         if (sscanf(buf, "[%15[0-9A-Za-z+ ]]%c", section, &c) == 2) {
             buf += strcspn(buf, "\n");