Message ID | 20170701220942.20818-1-michael@niedermayer.cc |
---|---|
State | New |
Headers | show |
On Sun, 2 Jul 2017 00:09:42 +0200 Michael Niedermayer <michael@niedermayer.cc> wrote: > This reduces the number of strstr() calls per byte > This diasalows empty tags like '< >' as well as '<' in tags like '<ab<cd<<ef>' > > Fixes timeout > Fixes: 1817/clusterfuzz-testcase-minimized-5104230530547712 > > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > --- > libavcodec/htmlsubtitles.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c > index be5c9316ca..67abc94085 100644 > --- a/libavcodec/htmlsubtitles.c > +++ b/libavcodec/htmlsubtitles.c > @@ -110,13 +110,13 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in) > case '<': > tag_close = in[1] == '/'; > len = 0; > - if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >= 1 && len > 0) { > + if (sscanf(in+tag_close+1, "%127[^<>]>%n", buffer, &len) >= 1 && len > 0) { > const char *tagname = buffer; > while (*tagname == ' ') > tagname++; > if ((param = strchr(tagname, ' '))) > *param++ = 0; > - if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack)) || > + if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack) && *tagname) || > ( tag_close && sptr > 0 && !strcmp(stack[sptr-1].tag, tagname))) { > int i, j, unknown = 0; > in += len + tag_close; Invalid syntax is not unusual in SRT files. Are you sure this doesn't make the output worse in files that do not use the syntax correctly?
On Sun, Jul 02, 2017 at 01:14:00PM +0200, wm4 wrote: > On Sun, 2 Jul 2017 00:09:42 +0200 > Michael Niedermayer <michael@niedermayer.cc> wrote: > > > This reduces the number of strstr() calls per byte > > This diasalows empty tags like '< >' as well as '<' in tags like '<ab<cd<<ef>' > > > > Fixes timeout > > Fixes: 1817/clusterfuzz-testcase-minimized-5104230530547712 > > > > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > > --- > > libavcodec/htmlsubtitles.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c > > index be5c9316ca..67abc94085 100644 > > --- a/libavcodec/htmlsubtitles.c > > +++ b/libavcodec/htmlsubtitles.c > > @@ -110,13 +110,13 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in) > > case '<': > > tag_close = in[1] == '/'; > > len = 0; > > - if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >= 1 && len > 0) { > > + if (sscanf(in+tag_close+1, "%127[^<>]>%n", buffer, &len) >= 1 && len > 0) { > > const char *tagname = buffer; > > while (*tagname == ' ') > > tagname++; > > if ((param = strchr(tagname, ' '))) > > *param++ = 0; > > - if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack)) || > > + if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack) && *tagname) || > > ( tag_close && sptr > 0 && !strcmp(stack[sptr-1].tag, tagname))) { > > int i, j, unknown = 0; > > in += len + tag_close; > > Invalid syntax is not unusual in SRT files. Are you sure this doesn't > make the output worse in files that do not use the syntax correctly? I do not know if this makes the output worse for files with invalid syntax I also do not know if this makes the output better for files with invalid syntax i dont have a large library of (real world invalid syntax) srt files whith which to find cases where it makes a difference. You seem to know alot about srt, maybe you can pick some srt files and add fate tests, so we have better coverage of odd and nasty cases about this patch specifically, what do you suggest ? should i try to fix this while maintaining existing behavior for invalid syntax exactly? (i dont know how that code would look) [...]
On 7/2/17, Michael Niedermayer <michael@niedermayer.cc> wrote: > On Sun, Jul 02, 2017 at 01:14:00PM +0200, wm4 wrote: >> On Sun, 2 Jul 2017 00:09:42 +0200 >> Michael Niedermayer <michael@niedermayer.cc> wrote: >> >> > This reduces the number of strstr() calls per byte >> > This diasalows empty tags like '< >' as well as '<' in tags like >> > '<ab<cd<<ef>' >> > >> > Fixes timeout >> > Fixes: 1817/clusterfuzz-testcase-minimized-5104230530547712 >> > >> > Found-by: continuous fuzzing process >> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg >> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> >> > --- >> > libavcodec/htmlsubtitles.c | 4 ++-- >> > 1 file changed, 2 insertions(+), 2 deletions(-) >> > >> > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c >> > index be5c9316ca..67abc94085 100644 >> > --- a/libavcodec/htmlsubtitles.c >> > +++ b/libavcodec/htmlsubtitles.c >> > @@ -110,13 +110,13 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint >> > *dst, const char *in) >> > case '<': >> > tag_close = in[1] == '/'; >> > len = 0; >> > - if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >= >> > 1 && len > 0) { >> > + if (sscanf(in+tag_close+1, "%127[^<>]>%n", buffer, &len) >= >> > 1 && len > 0) { >> > const char *tagname = buffer; >> > while (*tagname == ' ') >> > tagname++; >> > if ((param = strchr(tagname, ' '))) >> > *param++ = 0; >> > - if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack)) || >> > + if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack) && >> > *tagname) || >> > ( tag_close && sptr > 0 && >> > !strcmp(stack[sptr-1].tag, tagname))) { >> > int i, j, unknown = 0; >> > in += len + tag_close; >> >> Invalid syntax is not unusual in SRT files. Are you sure this doesn't >> make the output worse in files that do not use the syntax correctly? > > I do not know if this makes the output worse for files with invalid syntax > I also do not know if this makes the output better for files with invalid > syntax > > i dont have a large library of (real world invalid syntax) srt files > whith which to find cases where it makes a difference. > > You seem to know alot about srt, maybe you can pick some srt files > and add fate tests, so we have better coverage of odd and nasty cases > > about this patch specifically, what do you suggest ? > should i try to fix this while maintaining existing behavior for > invalid syntax exactly? (i dont know how that code would look) WHat's so wrong with code that you want it changed in bad way?
On Sun, 2 Jul 2017 13:43:57 +0200 Michael Niedermayer <michael@niedermayer.cc> wrote: > On Sun, Jul 02, 2017 at 01:14:00PM +0200, wm4 wrote: > > On Sun, 2 Jul 2017 00:09:42 +0200 > > Michael Niedermayer <michael@niedermayer.cc> wrote: > > > > > This reduces the number of strstr() calls per byte > > > This diasalows empty tags like '< >' as well as '<' in tags like '<ab<cd<<ef>' > > > > > > Fixes timeout > > > Fixes: 1817/clusterfuzz-testcase-minimized-5104230530547712 > > > > > > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > > > --- > > > libavcodec/htmlsubtitles.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c > > > index be5c9316ca..67abc94085 100644 > > > --- a/libavcodec/htmlsubtitles.c > > > +++ b/libavcodec/htmlsubtitles.c > > > @@ -110,13 +110,13 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in) > > > case '<': > > > tag_close = in[1] == '/'; > > > len = 0; > > > - if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >= 1 && len > 0) { > > > + if (sscanf(in+tag_close+1, "%127[^<>]>%n", buffer, &len) >= 1 && len > 0) { > > > const char *tagname = buffer; > > > while (*tagname == ' ') > > > tagname++; > > > if ((param = strchr(tagname, ' '))) > > > *param++ = 0; > > > - if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack)) || > > > + if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack) && *tagname) || > > > ( tag_close && sptr > 0 && !strcmp(stack[sptr-1].tag, tagname))) { > > > int i, j, unknown = 0; > > > in += len + tag_close; > > > > Invalid syntax is not unusual in SRT files. Are you sure this doesn't > > make the output worse in files that do not use the syntax correctly? > > I do not know if this makes the output worse for files with invalid syntax > I also do not know if this makes the output better for files with invalid > syntax > > i dont have a large library of (real world invalid syntax) srt files > whith which to find cases where it makes a difference. > > You seem to know alot about srt, maybe you can pick some srt files > and add fate tests, so we have better coverage of odd and nasty cases > > about this patch specifically, what do you suggest ? > should i try to fix this while maintaining existing behavior for > invalid syntax exactly? (i dont know how that code would look) > > [...] I don't know either, but due to the messy syntax situation, I'd rather not change the code in unpredictable ways just to get faster fuzzing.
On Mon, Jul 03, 2017 at 11:44:26AM +0200, Paul B Mahol wrote: > On 7/2/17, Michael Niedermayer <michael@niedermayer.cc> wrote: > > On Sun, Jul 02, 2017 at 01:14:00PM +0200, wm4 wrote: > >> On Sun, 2 Jul 2017 00:09:42 +0200 > >> Michael Niedermayer <michael@niedermayer.cc> wrote: > >> > >> > This reduces the number of strstr() calls per byte > >> > This diasalows empty tags like '< >' as well as '<' in tags like > >> > '<ab<cd<<ef>' > >> > > >> > Fixes timeout > >> > Fixes: 1817/clusterfuzz-testcase-minimized-5104230530547712 > >> > > >> > Found-by: continuous fuzzing process > >> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > >> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > >> > --- > >> > libavcodec/htmlsubtitles.c | 4 ++-- > >> > 1 file changed, 2 insertions(+), 2 deletions(-) > >> > > >> > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c > >> > index be5c9316ca..67abc94085 100644 > >> > --- a/libavcodec/htmlsubtitles.c > >> > +++ b/libavcodec/htmlsubtitles.c > >> > @@ -110,13 +110,13 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint > >> > *dst, const char *in) > >> > case '<': > >> > tag_close = in[1] == '/'; > >> > len = 0; > >> > - if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >= > >> > 1 && len > 0) { > >> > + if (sscanf(in+tag_close+1, "%127[^<>]>%n", buffer, &len) >= > >> > 1 && len > 0) { > >> > const char *tagname = buffer; > >> > while (*tagname == ' ') > >> > tagname++; > >> > if ((param = strchr(tagname, ' '))) > >> > *param++ = 0; > >> > - if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack)) || > >> > + if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack) && > >> > *tagname) || > >> > ( tag_close && sptr > 0 && > >> > !strcmp(stack[sptr-1].tag, tagname))) { > >> > int i, j, unknown = 0; > >> > in += len + tag_close; > >> > >> Invalid syntax is not unusual in SRT files. Are you sure this doesn't > >> make the output worse in files that do not use the syntax correctly? > > > > I do not know if this makes the output worse for files with invalid syntax > > I also do not know if this makes the output better for files with invalid > > syntax > > > > i dont have a large library of (real world invalid syntax) srt files > > whith which to find cases where it makes a difference. > > > > You seem to know alot about srt, maybe you can pick some srt files > > and add fate tests, so we have better coverage of odd and nasty cases > > > > about this patch specifically, what do you suggest ? > > should i try to fix this while maintaining existing behavior for > > invalid syntax exactly? (i dont know how that code would look) > > WHat's so wrong with code that you want it changed in bad way? IIRC If you have tags like <<<<<<<<<<<<<<<<<<<<<<a> you end up checking <<<<<<<<<<<<<<<<<<<<<<a> <<<<<<<<<<<<<<<<<<<<<a> <<<<<<<<<<<<<<<<<<<<a> ... then form the closing tag </<<<<<<<<<<<<<<<<<<<<<a> </<<<<<<<<<<<<<<<<<<<<a> </<<<<<<<<<<<<<<<<<<<a> ... and then search the whole input for each closing tag On top of this there are special cases and limitations like fixed size buffers and a fixed size stack, so it will not always do all the steps above. But in practice this is slow. The same issue should happen on real files if they contain syntax issues and conatin a lot of unescaped '<' [...]
On Mon, Jul 03, 2017 at 12:40:13PM +0200, wm4 wrote: > On Sun, 2 Jul 2017 13:43:57 +0200 > Michael Niedermayer <michael@niedermayer.cc> wrote: > > > On Sun, Jul 02, 2017 at 01:14:00PM +0200, wm4 wrote: > > > On Sun, 2 Jul 2017 00:09:42 +0200 > > > Michael Niedermayer <michael@niedermayer.cc> wrote: > > > > > > > This reduces the number of strstr() calls per byte > > > > This diasalows empty tags like '< >' as well as '<' in tags like '<ab<cd<<ef>' > > > > > > > > Fixes timeout > > > > Fixes: 1817/clusterfuzz-testcase-minimized-5104230530547712 > > > > > > > > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > > > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > > > > --- > > > > libavcodec/htmlsubtitles.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c > > > > index be5c9316ca..67abc94085 100644 > > > > --- a/libavcodec/htmlsubtitles.c > > > > +++ b/libavcodec/htmlsubtitles.c > > > > @@ -110,13 +110,13 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in) > > > > case '<': > > > > tag_close = in[1] == '/'; > > > > len = 0; > > > > - if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >= 1 && len > 0) { > > > > + if (sscanf(in+tag_close+1, "%127[^<>]>%n", buffer, &len) >= 1 && len > 0) { > > > > const char *tagname = buffer; > > > > while (*tagname == ' ') > > > > tagname++; > > > > if ((param = strchr(tagname, ' '))) > > > > *param++ = 0; > > > > - if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack)) || > > > > + if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack) && *tagname) || > > > > ( tag_close && sptr > 0 && !strcmp(stack[sptr-1].tag, tagname))) { > > > > int i, j, unknown = 0; > > > > in += len + tag_close; > > > > > > Invalid syntax is not unusual in SRT files. Are you sure this doesn't > > > make the output worse in files that do not use the syntax correctly? > > > > I do not know if this makes the output worse for files with invalid syntax > > I also do not know if this makes the output better for files with invalid > > syntax > > > > i dont have a large library of (real world invalid syntax) srt files > > whith which to find cases where it makes a difference. > > > > You seem to know alot about srt, maybe you can pick some srt files > > and add fate tests, so we have better coverage of odd and nasty cases > > > > about this patch specifically, what do you suggest ? > > should i try to fix this while maintaining existing behavior for > > invalid syntax exactly? (i dont know how that code would look) > > > > [...] > > I don't know either, but due to the messy syntax situation, I'd rather > not change the code in unpredictable ways just to get faster fuzzing. Its primarly intended to fix the potential denial of service. That is moderate sized files with short duration, few streams, low resolution, low channel count should not consume huge amounts of cpu time. [...]
On Mon, Jul 03, 2017 at 05:50:08PM +0200, Michael Niedermayer wrote: > On Mon, Jul 03, 2017 at 12:40:13PM +0200, wm4 wrote: > > On Sun, 2 Jul 2017 13:43:57 +0200 > > Michael Niedermayer <michael@niedermayer.cc> wrote: > > > > > On Sun, Jul 02, 2017 at 01:14:00PM +0200, wm4 wrote: > > > > On Sun, 2 Jul 2017 00:09:42 +0200 > > > > Michael Niedermayer <michael@niedermayer.cc> wrote: > > > > > > > > > This reduces the number of strstr() calls per byte > > > > > This diasalows empty tags like '< >' as well as '<' in tags like '<ab<cd<<ef>' > > > > > > > > > > Fixes timeout > > > > > Fixes: 1817/clusterfuzz-testcase-minimized-5104230530547712 > > > > > > > > > > Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > > > > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> > > > > > --- > > > > > libavcodec/htmlsubtitles.c | 4 ++-- > > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c > > > > > index be5c9316ca..67abc94085 100644 > > > > > --- a/libavcodec/htmlsubtitles.c > > > > > +++ b/libavcodec/htmlsubtitles.c > > > > > @@ -110,13 +110,13 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in) > > > > > case '<': > > > > > tag_close = in[1] == '/'; > > > > > len = 0; > > > > > - if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >= 1 && len > 0) { > > > > > + if (sscanf(in+tag_close+1, "%127[^<>]>%n", buffer, &len) >= 1 && len > 0) { > > > > > const char *tagname = buffer; > > > > > while (*tagname == ' ') > > > > > tagname++; > > > > > if ((param = strchr(tagname, ' '))) > > > > > *param++ = 0; > > > > > - if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack)) || > > > > > + if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack) && *tagname) || > > > > > ( tag_close && sptr > 0 && !strcmp(stack[sptr-1].tag, tagname))) { > > > > > int i, j, unknown = 0; > > > > > in += len + tag_close; > > > > > > > > Invalid syntax is not unusual in SRT files. Are you sure this doesn't > > > > make the output worse in files that do not use the syntax correctly? > > > > > > I do not know if this makes the output worse for files with invalid syntax > > > I also do not know if this makes the output better for files with invalid > > > syntax > > > > > > i dont have a large library of (real world invalid syntax) srt files > > > whith which to find cases where it makes a difference. > > > > > > You seem to know alot about srt, maybe you can pick some srt files > > > and add fate tests, so we have better coverage of odd and nasty cases > > > > > > about this patch specifically, what do you suggest ? > > > should i try to fix this while maintaining existing behavior for > > > invalid syntax exactly? (i dont know how that code would look) > > > > > > [...] > > > > I don't know either, but due to the messy syntax situation, I'd rather > > not change the code in unpredictable ways just to get faster fuzzing. > > Its primarly intended to fix the potential denial of service. > That is moderate sized files with short duration, few streams, low > resolution, low channel count should not consume huge amounts of > cpu time. 5 days passed with no comments and no suggestions on how to implement this better. does anyone object to the fix as in the patch to be applied ? thx [...]
On Sat, Jul 08, 2017 at 11:07:48PM +0200, Michael Niedermayer wrote: [...] > 5 days passed with no comments and no suggestions on how to implement > this better. > does anyone object to the fix as in the patch to be applied ? misc patterns I found you may want to give a try (note that these are completely broken garbage you may want to ignore): 82 00:04:50,430 --> 00:05:01,030 <<flash gordon <E9>pisode 4 saison 1> www.SeriesSub.com> 57 00:03:38,318 --> 00:03:42,778 <<THE HIGH ROLLERS>> <<Grandes Jogadores>> 2 00:01:10,000 --> 00:01:14,500 >>> RebelSubTeam <<< 306 00:20:31,849 --> 00:20:56,839 <font color="##FFFF00"> <<<<www.egfire.com>>>></font> 37 00:02:37,750 --> 00:02:43,700 <b>~ASUKO MARCH!~ >>:<< translation by: cangii Retiming by: furransu</b> 1214 02:11:00,738 --> 02:11:05,538 <b><<< ÓÍÀÊÑ ÒÈÉÌ < 2015 > UNACS TEAM >>></b> 1 00:00:01,000 --> 00:00:02,308 <<<<<< Antes <<<<<< Ela é hum... 544 00:53:43,780 --> 00:53:45,941 <<That's OK, >> - he calmed himself. 545 00:53:46,216 --> 00:53:49,083 <<lt's not a long way to the hotel, the hotel is within easy reach. 894 00:37:59,693 --> 00:38:01,591 mint asztalt foglaltatni a <<>Le Cirque-ben. I really have a LOT of matches, so these are very common patterns. Feel free to apply your patch if it generally doesn't make these cases worse. Regards,
On Sun, Jul 09, 2017 at 02:28:28PM +0200, Clément Bœsch wrote: [...] > 544 > 00:53:43,780 --> 00:53:45,941 > <<That's OK, >> - he calmed himself. > Side note here: "<<" and ">>" are used for « and ». It's important to not break this case IMO. > > 545 > 00:53:46,216 --> 00:53:49,083 > <<lt's not a long way to the hotel, > the hotel is within easy reach. > Same here, the confusion between 'l' and 'I' makes me think it's the result of an OCR, so the use of these ascii quotation should be pretty common.
On Sun, Jul 09, 2017 at 02:28:28PM +0200, Clément Bœsch wrote: > On Sat, Jul 08, 2017 at 11:07:48PM +0200, Michael Niedermayer wrote: > [...] > > 5 days passed with no comments and no suggestions on how to implement > > this better. > > does anyone object to the fix as in the patch to be applied ? > > misc patterns I found you may want to give a try (note that these are > completely broken garbage you may want to ignore): > > 82 > 00:04:50,430 --> 00:05:01,030 > <<flash gordon > <E9>pisode 4 > saison 1> > www.SeriesSub.com> > > > 57 > 00:03:38,318 --> 00:03:42,778 > <<THE HIGH ROLLERS>> > <<Grandes Jogadores>> > > > 2 > 00:01:10,000 --> 00:01:14,500 > >>> RebelSubTeam <<< > > > 306 > 00:20:31,849 --> 00:20:56,839 > <font color="##FFFF00"> > <<<<www.egfire.com>>>></font> > > > 37 > 00:02:37,750 --> 00:02:43,700 > <b>~ASUKO MARCH!~ > >>:<< > translation by: cangii > Retiming by: furransu</b> > > > 1214 > 02:11:00,738 --> 02:11:05,538 > <b><<< ÓÍÀÊÑ ÒÈÉÌ < 2015 > UNACS TEAM >>></b> > > > 1 > 00:00:01,000 --> 00:00:02,308 > <<<<<< Antes <<<<<< > Ela é hum... > > > 544 > 00:53:43,780 --> 00:53:45,941 > <<That's OK, >> - he calmed himself. > > > 545 > 00:53:46,216 --> 00:53:49,083 > <<lt's not a long way to the hotel, > the hotel is within easy reach. > > > 894 > 00:37:59,693 --> 00:38:01,591 > mint asztalt foglaltatni > a <<>Le Cirque-ben. > > > I really have a LOT of matches, so these are very common patterns. > > Feel free to apply your patch if it generally doesn't make these cases > worse. patch posted that adds this to fate ill upload the srt file (copy and paste of above) the patch seems not to change the output for this i will apply the fate test before the patch here [...]
On Sun, Jul 09, 2017 at 02:28:28PM +0200, Clément Bœsch wrote: > On Sat, Jul 08, 2017 at 11:07:48PM +0200, Michael Niedermayer wrote: > [...] > > 5 days passed with no comments and no suggestions on how to implement > > this better. > > does anyone object to the fix as in the patch to be applied ? > > misc patterns I found you may want to give a try (note that these are > completely broken garbage you may want to ignore): > > 82 > 00:04:50,430 --> 00:05:01,030 > <<flash gordon > <E9>pisode 4 > saison 1> > www.SeriesSub.com> > > > 57 > 00:03:38,318 --> 00:03:42,778 > <<THE HIGH ROLLERS>> > <<Grandes Jogadores>> > > > 2 > 00:01:10,000 --> 00:01:14,500 > >>> RebelSubTeam <<< > > > 306 > 00:20:31,849 --> 00:20:56,839 > <font color="##FFFF00"> > <<<<www.egfire.com>>>></font> > > > 37 > 00:02:37,750 --> 00:02:43,700 > <b>~ASUKO MARCH!~ > >>:<< > translation by: cangii > Retiming by: furransu</b> > > > 1214 > 02:11:00,738 --> 02:11:05,538 > <b><<< ÓÍÀÊÑ ÒÈÉÌ < 2015 > UNACS TEAM >>></b> > > > 1 > 00:00:01,000 --> 00:00:02,308 > <<<<<< Antes <<<<<< > Ela é hum... > > > 544 > 00:53:43,780 --> 00:53:45,941 > <<That's OK, >> - he calmed himself. > > > 545 > 00:53:46,216 --> 00:53:49,083 > <<lt's not a long way to the hotel, > the hotel is within easy reach. > > > 894 > 00:37:59,693 --> 00:38:01,591 > mint asztalt foglaltatni > a <<>Le Cirque-ben. > > > I really have a LOT of matches, so these are very common patterns. > > Feel free to apply your patch if it generally doesn't make these cases > worse. the patch doesnt change the fate test, applied thanks [...]
diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c index be5c9316ca..67abc94085 100644 --- a/libavcodec/htmlsubtitles.c +++ b/libavcodec/htmlsubtitles.c @@ -110,13 +110,13 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in) case '<': tag_close = in[1] == '/'; len = 0; - if (sscanf(in+tag_close+1, "%127[^>]>%n", buffer, &len) >= 1 && len > 0) { + if (sscanf(in+tag_close+1, "%127[^<>]>%n", buffer, &len) >= 1 && len > 0) { const char *tagname = buffer; while (*tagname == ' ') tagname++; if ((param = strchr(tagname, ' '))) *param++ = 0; - if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack)) || + if ((!tag_close && sptr < FF_ARRAY_ELEMS(stack) && *tagname) || ( tag_close && sptr > 0 && !strcmp(stack[sptr-1].tag, tagname))) { int i, j, unknown = 0; in += len + tag_close;
This reduces the number of strstr() calls per byte This diasalows empty tags like '< >' as well as '<' in tags like '<ab<cd<<ef>' Fixes timeout Fixes: 1817/clusterfuzz-testcase-minimized-5104230530547712 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> --- libavcodec/htmlsubtitles.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)