Message ID: 20170608215356.23864-1-michael@niedermayer.cc
State: New
On Thu, 8 Jun 2017 23:53:55 +0200, Michael Niedermayer <michael@niedermayer.cc> wrote:

> Fixes Timeout
> Fixes: 2127/clusterfuzz-testcase-minimized-6595787859427328
>
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/htmlsubtitles.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
> index 16295daa0c..ba4f269b3f 100644
> --- a/libavcodec/htmlsubtitles.c
> +++ b/libavcodec/htmlsubtitles.c
> @@ -56,6 +56,7 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
>      char *param, buffer[128], tmp[128];
>      int len, tag_close, sptr = 1, line_start = 1, an = 0, end = 0;
>      SrtStack stack[16];
> +    const char *next_closep = NULL;
>
>      stack[0].tag[0] = 0;
>      strcpy(stack[0].param[PARAM_SIZE], "{\\fs}");
> @@ -83,8 +84,15 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
>                 and all microdvd like styles such as {Y:xxx} */
>              len = 0;
>              an += sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0;
> -            if ((an != 1 && (len = 0, sscanf(in, "{\\%*[^}]}%n", &len) >= 0 && len > 0)) ||
> -                (len = 0, sscanf(in, "{%*1[CcFfoPSsYy]:%*[^}]}%n", &len) >= 0 && len > 0)) {
> +
> +            if (!next_closep || next_closep <= in) {
> +                next_closep = strchr(in+1, '}');
> +                if (!next_closep)
> +                    next_closep = in + strlen(in);
> +            }
> +
> +            if (*next_closep == '}' && (an != 1 && (len = 0, sscanf(in, "{\\%*[^}]}%n", &len) >= 0 && len > 0)) ||
> +                *next_closep == '}' && (len = 0, sscanf(in, "{%*1[CcFfoPSsYy]:%*[^}]}%n", &len) >= 0 && len > 0)) {
>                  in += len - 1;
>              } else
>                  av_bprint_chars(dst, *in, 1);

I'm not convinced that bad performance with an obscure fuzzed sample is worth the complexity increase here.

I'd rather ask: why the heck is it using sscanf in the first place? The existing code is certainly unreadable already. (Could you tell what it does without staring at the scanf manpage for a while? And then guarantee correctness/performance/security?)
On Fri, Jun 09, 2017 at 04:32:41PM +0200, wm4 wrote:
> On Thu, 8 Jun 2017 23:53:55 +0200
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> [...]
>
> I'm not convinced that bad performance with an obscure fuzzed sample is
> worth the complexity increase here.

If we want to make ff_htmlmarkup_to_ass() not go into near-infinite loops with crafted data, then this or an equivalent change is likely needed.

> I'd rather ask, why the heck is it using sscanf in the first place?

I suspect the use of scanf is fewer lines of code than the alternative, but I am not the author of this; I don't know for sure why it was written as it was.

> The existing code is certainly unreadable already. (Could you tell what
> it does without staring at the scanf manpage for a while? And then
> guarantee correctness/performance/security?)

I know the scanf syntax well enough to not need the manpage much, but I was already toying with the idea of replacing the scanf. I can do that if people want, but it will be more code than the scanf-based version.

Also, the next_closep based code is needed as it is, even without scanf(). What that code does is prevent searching for the closing '}' repeatedly for each byte, which is very slow if it's done for nearly every single byte in a long string.

[...]
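The effect of the next_closep cache can be shown with a stand-alone sketch (hypothetical illustration only; `scan_markup` and its lookup counter are made up for this example and are not part of the FFmpeg patch):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-alone sketch of the next_closep idea: remember where
 * the next '}' is, so the tail of the string is not rescanned for every
 * input byte.  Returns how many '}' searches were actually performed. */
size_t scan_markup(const char *text)
{
    const char *in = text;
    const char *next_closep = NULL;   /* cached position of the next '}' */
    size_t lookups = 0;

    for (; *in; in++) {
        /* search again only once we have walked past the cached result */
        if (!next_closep || next_closep <= in) {
            lookups++;
            next_closep = strchr(in + 1, '}');
            if (!next_closep)
                next_closep = in + strlen(in);  /* points at the final NUL */
        }
        /* *next_closep == '}' now answers, in O(1), whether a closing
         * brace still lies ahead, so the expensive sscanf calls can be
         * skipped when they cannot possibly match. */
    }
    return lookups;
}
```

On a long brace-free string this performs a single search instead of one per byte, which is the quadratic case the fuzzed sample triggers.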
On Fri, 9 Jun 2017 19:39:36 +0200, Michael Niedermayer <michael@niedermayer.cc> wrote:
> On Fri, Jun 09, 2017 at 04:32:41PM +0200, wm4 wrote:
> [...]
> > I'm not convinced that bad performance with an obscure fuzzed sample is
> > worth the complexity increase here.
>
> If we want to make ff_htmlmarkup_to_ass() not go into near-infinite
> loops with crafted data, then this or an equivalent change is likely
> needed.

Maybe if someone starts abusing this case.

> > I'd rather ask, why the heck is it using sscanf in the first place?
>
> I suspect the use of scanf is fewer lines of code than the alternative,
> but I am not the author of this; I don't know for sure why it was
> written as it was.

I'd probably prefer explicit code here (as long as it's not typical "tricky C string processing" style), but it's hard to tell.
On Sat, Jun 10, 2017 at 01:36:29PM +0200, wm4 wrote:
> On Fri, 9 Jun 2017 19:39:36 +0200
> Michael Niedermayer <michael@niedermayer.cc> wrote:
> [...]
> > If we want to make ff_htmlmarkup_to_ass() not go into near-infinite
> > loops with crafted data, then this or an equivalent change is likely
> > needed.
>
> Maybe if someone starts abusing this case.

If I am not mistaken, all the issues found by oss-fuzz are made public 90 days or so after they are found. Anyone who wants to exploit a timeout/infinite loop just needs to search for "timeout" and "ffmpeg" and use the testcase.

So we are guaranteed that anyone who wants to exploit this has the ability to do so, as long as they can use the search mask and are able to remux the data into whatever format they need. And I believe this publication of issues is the right thing to do.

Do you have a better idea than the next_closep code to fix this? If not, do you agree that we push this fix?

> > > I'd rather ask, why the heck is it using sscanf in the first place?
> >
> > I suspect the use of scanf is fewer lines of code than the alternative,
> > but I am not the author of this; I don't know for sure why it was
> > written as it was.
>
> I'd probably prefer explicit code here (as long as it's not typical
> "tricky C string processing" style), but it's hard to tell.

It's a few calls to C standard library string functions and some straight in[1] == this && in[2] == that stuff, I think.

[...]
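For reference, the kind of explicit matching Michael describes might look roughly like this (a hypothetical sketch, not the patch that was later sent; the function names are invented for illustration):

```c
#include <string.h>

/* Hypothetical explicit replacement for sscanf(in, "{\\an%*1u}%n", ...):
 * plain character comparisons, returning the tag length on a match. */
int match_an_tag(const char *in)
{
    if (in[0] == '{' && in[1] == '\\' && in[2] == 'a' && in[3] == 'n' &&
        in[4] >= '0' && in[4] <= '9' && in[5] == '}')
        return 6;
    return 0;
}

/* Hypothetical explicit replacement for the "{\\%*[^}]}" scanset: a prefix
 * check plus one strchr call instead of a scanf format string.  (Unlike
 * the scanset, this also accepts an empty tag body.) */
int match_backslash_tag(const char *in)
{
    const char *close;

    if (in[0] != '{' || in[1] != '\\')
        return 0;
    close = strchr(in + 2, '}');
    return close ? (int)(close - in + 1) : 0;
}
```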
Hi,

On Fri, Jun 9, 2017 at 1:39 PM, Michael Niedermayer <michael@niedermayer.cc> wrote:

> If we want to make ff_htmlmarkup_to_ass() not go into near infinite
> loops

What's a near-infinite loop? Is it a non-infinite loop, i.e. just a loop? Or is there something special about it? :-) "Fix [..] near-infinite loop" just means "fix loop", and I think you meant "optimize [..] loop", right?

Ronald
On Sat, 2017-06-10 at 16:18 +0200, Michael Niedermayer wrote:
> So we are guaranteed that anyone who wants to exploit this has the
> ability to do so, as long as they can use the search mask and are able
> to remux the data into whatever format they need.
> And I believe this publication of issues is the right thing to do.

It looks like the current code has quadratic time requirements. Is everything else in FFmpeg actually guaranteed to not need quadratic time? Can anything really rely on FFmpeg decoding of hostile data not taking a long time? To the degree that it would be reasonable to have a system using FFmpeg on untrusted data without timeouts or the capacity to kill the process? To me, being slow on malicious data doesn't seem like a real security issue.

> Do you have a better idea than the next_closep code to fix this?
> If not, do you agree that we push this fix?

Since the slow behavior seems unlikely to occur except with intentionally malicious or completely corrupted data, I'm not sure it's worth actually fixing. But I think it could be done somewhat more cleanly as follows: instead of using scanf to find matches for "{Y:xxx}" or "{\xxx}", where "xxx" is arbitrarily long, match for "{Y:" or "{\", and then do the "skip until next }" as a separate step after you've confirmed a match. Then you can optimize that to avoid quadratic behavior by setting a flag when you find no "}", and not searching again if it's already set.

Also, the current code seems buggy, or at least I don't see why you'd want it to behave the way it now does. It skips "{\xxx}" tags when the "an" variable is not set to 1. I assume the intent was to only allow the first "\an" tag through. But as implemented now, it seems to allow ANY "{\xxx}" tags through after the first \an and before a possible second one. I don't think that's intentional?
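The suggestion above, i.e. confirm the short prefix first, then search for '}' once, remembering a failed search in a flag, could be sketched like this (hypothetical code, not the patch that was eventually committed; `skip_tag` is an invented name):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of the suggested approach: cheap prefix test first,
 * then one search for the closing '}'.  A caller-owned flag records a
 * failed search so the tail is never scanned twice (linear overall). */
int skip_tag(const char *in, bool *no_close_ahead)
{
    const char *close;

    /* prefix check: "{\..." or "{X:..." with X in the microdvd set */
    if (in[0] != '{')
        return 0;
    if (!(in[1] == '\\' ||
          (in[1] && strchr("CcFfoPSsYy", in[1]) && in[2] == ':')))
        return 0;

    if (*no_close_ahead)            /* already known: no '}' remains */
        return 0;

    close = strchr(in + 1, '}');
    if (!close) {
        *no_close_ahead = true;     /* remember, so we search only once */
        return 0;
    }
    return (int)(close - in + 1);   /* bytes to skip, '}' included */
}

/* small self-check exercising both tag forms and the flag */
int skip_tag_demo(void)
{
    bool no_close = false;
    int ok = 1;

    ok &= skip_tag("{\\b1}rest", &no_close) == 5;   /* "{\" form  */
    ok &= skip_tag("{Y:i}x", &no_close) == 5;       /* "{Y:" form */
    ok &= skip_tag("no tag here", &no_close) == 0;  /* prefix fails fast */
    ok &= skip_tag("{\\never closed", &no_close) == 0 && no_close;
    return ok;
}
```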
On Sat, Jun 10, 2017 at 11:14:45AM -0400, Ronald S. Bultje wrote:
> Hi,
>
> On Fri, Jun 9, 2017 at 1:39 PM, Michael Niedermayer <michael@niedermayer.cc>
> wrote:
>
> > If we want to make ff_htmlmarkup_to_ass() not go into near infinite
> > loops
>
> What's a near-infinite loop? Is it a non-infinite loop, i.e. just a
> loop? Or is there something special about it? :-) "Fix [..]
> near-infinite loop" just means "fix loop", and I think you meant
> "optimize [..] loop", right?

It's a loop that takes quadratic time, and it should take linear time after this or an alternative fix. "Near infinite" was meant in the sense that it is not infinite, but in practical terms behaves similarly to infinite.

[...]
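The quadratic-versus-linear difference can be made concrete with two hypothetical work counters (illustrative only; neither function is FFmpeg code). The first re-scans for '}' from every input byte, as the repeated scanset matching effectively does; the second reuses the previous search result the way the next_closep patch does.

```c
#include <stddef.h>
#include <string.h>

/* bytes examined when every input position triggers a fresh scan for '}' */
size_t naive_steps(const char *s)
{
    size_t steps = 0;
    const char *in, *p;

    for (in = s; *in; in++)
        for (p = in + 1; ; p++) {
            steps++;
            if (*p == '}' || *p == '\0')
                break;
        }
    return steps;
}

/* bytes examined when the previous search result is cached */
size_t cached_steps(const char *s)
{
    size_t steps = 0;
    const char *in, *next_closep = NULL;

    for (in = s; *in; in++)
        if (!next_closep || next_closep <= in)
            for (next_closep = in + 1; ; next_closep++) {
                steps++;
                if (*next_closep == '}' || *next_closep == '\0')
                    break;
            }
    return steps;
}

/* on 100 brace-less bytes: n*(n+1)/2 steps naively, exactly n cached */
int steps_demo(void)
{
    char buf[101];

    memset(buf, '{', 100);
    buf[100] = '\0';
    return naive_steps(buf) == 5050 && cached_steps(buf) == 100;
}
```

Even at 100 bytes the naive version does ~50x the work; on a megabyte-sized crafted subtitle line the gap is what turns decoding into a near-timeout.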
On Sat, Jun 10, 2017 at 11:31:42PM +0300, Uoti Urpala wrote:
> On Sat, 2017-06-10 at 16:18 +0200, Michael Niedermayer wrote:
> [...]
> It looks like the current code has quadratic time requirements. Is
> everything else in FFmpeg actually guaranteed to not need quadratic
> time? Can anything really rely on FFmpeg decoding of hostile data not
> taking a long time?

There are currently 16 open timeout bugs in ffmpeg oss-fuzz. I am interested in fixing these. I don't know if there are more, but that's what oss-fuzz found so far. Some of these may be true infinite loops.

I admit fully that my interest is in fixing these issues and not in asking how many more there are. I am interested even if there are codecs for which it cannot be fixed; some users only need a subset of codecs.

> To the degree that it would be reasonable to have a
> system using FFmpeg on untrusted data without timeouts or the capacity
> to kill the process? To me, being slow on malicious data doesn't seem
> like a real security issue.

If possible, external limits on time and memory are always a good idea.

> > Do you have a better idea than the next_closep code to fix this?
> > If not, do you agree that we push this fix?
>
> Since the slow behavior seems unlikely to occur except with
> intentionally malicious or completely corrupted data, I'm not sure it's
> worth actually fixing. But I think it could be done somewhat more
> cleanly as follows: instead of using scanf to find matches for
> "{Y:xxx}" or "{\xxx}", where "xxx" is arbitrarily long, match for
> "{Y:" or "{\", and then do the "skip until next }" as a separate step
> after you've confirmed a match. Then you can optimize that to avoid
> quadratic behavior by setting a flag when you find no "}", and not
> searching again if it's already set.

I'll send a patch that does that.

> Also, the current code seems buggy, or at least I don't see why you'd
> want it to behave the way it now does. [...] I don't think that's
> intentional?

I am not the author of the original code.

[...]
diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
index 16295daa0c..ba4f269b3f 100644
--- a/libavcodec/htmlsubtitles.c
+++ b/libavcodec/htmlsubtitles.c
@@ -56,6 +56,7 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
     char *param, buffer[128], tmp[128];
     int len, tag_close, sptr = 1, line_start = 1, an = 0, end = 0;
     SrtStack stack[16];
+    const char *next_closep = NULL;
 
     stack[0].tag[0] = 0;
     strcpy(stack[0].param[PARAM_SIZE], "{\\fs}");
@@ -83,8 +84,15 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
                and all microdvd like styles such as {Y:xxx} */
             len = 0;
             an += sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0;
-            if ((an != 1 && (len = 0, sscanf(in, "{\\%*[^}]}%n", &len) >= 0 && len > 0)) ||
-                (len = 0, sscanf(in, "{%*1[CcFfoPSsYy]:%*[^}]}%n", &len) >= 0 && len > 0)) {
+
+            if (!next_closep || next_closep <= in) {
+                next_closep = strchr(in+1, '}');
+                if (!next_closep)
+                    next_closep = in + strlen(in);
+            }
+
+            if (*next_closep == '}' && (an != 1 && (len = 0, sscanf(in, "{\\%*[^}]}%n", &len) >= 0 && len > 0)) ||
+                *next_closep == '}' && (len = 0, sscanf(in, "{%*1[CcFfoPSsYy]:%*[^}]}%n", &len) >= 0 && len > 0)) {
                 in += len - 1;
             } else
                 av_bprint_chars(dst, *in, 1);
Fixes Timeout
Fixes: 2127/clusterfuzz-testcase-minimized-6595787859427328

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/htmlsubtitles.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)