diff mbox

[FFmpeg-devel] avcodec/htmlsubtitles: Replace very slow redundant sscanf() calls by cleaner and faster code

Message ID 20170611155845.27939-1-michael@niedermayer.cc
State Accepted
Commit 4132218b87cd6fb13abd162e3037ef4563286baa
Headers show

Commit Message

Michael Niedermayer June 11, 2017, 3:58 p.m. UTC
This reduces the worst case from O(n²) to O(n) time

Fixes Timeout
Fixes: 2127/clusterfuzz-testcase-minimized-6595787859427328

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/htmlsubtitles.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

Comments

wm4 June 12, 2017, 9:35 a.m. UTC | #1
On Sun, 11 Jun 2017 17:58:45 +0200
Michael Niedermayer <michael@niedermayer.cc> wrote:

> This reduces the worst case from O(n²) to O(n) time
> 
> Fixes Timeout
> Fixes: 2127/clusterfuzz-testcase-minimized-6595787859427328
> 
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/htmlsubtitles.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
> index 16295daa0c..70311c66d5 100644
> --- a/libavcodec/htmlsubtitles.c
> +++ b/libavcodec/htmlsubtitles.c
> @@ -56,6 +56,7 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
>      char *param, buffer[128], tmp[128];
>      int len, tag_close, sptr = 1, line_start = 1, an = 0, end = 0;
>      SrtStack stack[16];
> +    int closing_brace_missing = 0;
>  
>      stack[0].tag[0] = 0;
>      strcpy(stack[0].param[PARAM_SIZE],  "{\\fs}");
> @@ -83,11 +84,20 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
>                          and all microdvd like styles such as {Y:xxx} */
>              len = 0;
>              an += sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0;
> -            if ((an != 1 && (len = 0, sscanf(in, "{\\%*[^}]}%n", &len) >= 0 && len > 0)) ||
> -                (len = 0, sscanf(in, "{%*1[CcFfoPSsYy]:%*[^}]}%n", &len) >= 0 && len > 0)) {
> -                in += len - 1;
> -            } else
> -                av_bprint_chars(dst, *in, 1);
> +
> +            if (!closing_brace_missing) {
> +                if (   (an != 1 && in[1] == '\\')
> +                    || (in[1] && strchr("CcFfoPSsYy", in[1]) && in[2] == ':')) {
> +                    char *bracep = strchr(in+2, '}');
> +                    if (bracep) {
> +                        in = bracep;
> +                        break;
> +                    } else
> +                        closing_brace_missing = 1;
> +                }
> +            }
> +
> +            av_bprint_chars(dst, *in, 1);
>              break;
>          case '<':
>              tag_close = in[1] == '/';

IMO better than before - now anyone can understand this code, and it's
faster. I'm not maintainer of this though.

Would it be possible to move this switch case to a separate function?
ff_htmlmarkup_to_ass is a bit too big.

Btw. as far as ASS tag stripping is concerned, this seems to ignore
drawings, but maybe they're typically not used in SRT.
Michael Niedermayer June 12, 2017, 10:02 p.m. UTC | #2
On Mon, Jun 12, 2017 at 11:35:14AM +0200, wm4 wrote:
> On Sun, 11 Jun 2017 17:58:45 +0200
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> 
> > This reduces the worst case from O(n²) to O(n) time
> > 
> > Fixes Timeout
> > Fixes: 2127/clusterfuzz-testcase-minimized-6595787859427328
> > 
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> >  libavcodec/htmlsubtitles.c | 20 +++++++++++++++-----
> >  1 file changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
> > index 16295daa0c..70311c66d5 100644
> > --- a/libavcodec/htmlsubtitles.c
> > +++ b/libavcodec/htmlsubtitles.c
> > @@ -56,6 +56,7 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
> >      char *param, buffer[128], tmp[128];
> >      int len, tag_close, sptr = 1, line_start = 1, an = 0, end = 0;
> >      SrtStack stack[16];
> > +    int closing_brace_missing = 0;
> >  
> >      stack[0].tag[0] = 0;
> >      strcpy(stack[0].param[PARAM_SIZE],  "{\\fs}");
> > @@ -83,11 +84,20 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
> >                          and all microdvd like styles such as {Y:xxx} */
> >              len = 0;
> >              an += sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0;
> > -            if ((an != 1 && (len = 0, sscanf(in, "{\\%*[^}]}%n", &len) >= 0 && len > 0)) ||
> > -                (len = 0, sscanf(in, "{%*1[CcFfoPSsYy]:%*[^}]}%n", &len) >= 0 && len > 0)) {
> > -                in += len - 1;
> > -            } else
> > -                av_bprint_chars(dst, *in, 1);
> > +
> > +            if (!closing_brace_missing) {
> > +                if (   (an != 1 && in[1] == '\\')
> > +                    || (in[1] && strchr("CcFfoPSsYy", in[1]) && in[2] == ':')) {
> > +                    char *bracep = strchr(in+2, '}');
> > +                    if (bracep) {
> > +                        in = bracep;
> > +                        break;
> > +                    } else
> > +                        closing_brace_missing = 1;
> > +                }
> > +            }
> > +
> > +            av_bprint_chars(dst, *in, 1);
> >              break;
> >          case '<':
> >              tag_close = in[1] == '/';
> 
> IMO better than before - now anyone can understand this code, and it's
> faster. I'm not maintainer of this though.
> 

> Would it be possible to move this switch case to a separate function?
> ff_htmlmarkup_to_ass is a bit too big.

patch posted which moves the code on top of this patch

about the "an +=" stuff, i tried to find some description of the
subtitle format that trigers it but i had no luck. So i tried not to
change its behavior  ...


[...]
diff mbox

Patch

diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
index 16295daa0c..70311c66d5 100644
--- a/libavcodec/htmlsubtitles.c
+++ b/libavcodec/htmlsubtitles.c
@@ -56,6 +56,7 @@  int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
     char *param, buffer[128], tmp[128];
     int len, tag_close, sptr = 1, line_start = 1, an = 0, end = 0;
     SrtStack stack[16];
+    int closing_brace_missing = 0;
 
     stack[0].tag[0] = 0;
     strcpy(stack[0].param[PARAM_SIZE],  "{\\fs}");
@@ -83,11 +84,20 @@  int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
                         and all microdvd like styles such as {Y:xxx} */
             len = 0;
             an += sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0;
-            if ((an != 1 && (len = 0, sscanf(in, "{\\%*[^}]}%n", &len) >= 0 && len > 0)) ||
-                (len = 0, sscanf(in, "{%*1[CcFfoPSsYy]:%*[^}]}%n", &len) >= 0 && len > 0)) {
-                in += len - 1;
-            } else
-                av_bprint_chars(dst, *in, 1);
+
+            if (!closing_brace_missing) {
+                if (   (an != 1 && in[1] == '\\')
+                    || (in[1] && strchr("CcFfoPSsYy", in[1]) && in[2] == ':')) {
+                    char *bracep = strchr(in+2, '}');
+                    if (bracep) {
+                        in = bracep;
+                        break;
+                    } else
+                        closing_brace_missing = 1;
+                }
+            }
+
+            av_bprint_chars(dst, *in, 1);
             break;
         case '<':
             tag_close = in[1] == '/';