Message ID | 20180328030705.22815-1-philipl@overt.org |
---|---|
State | Superseded |
Headers | show |
On Tue, 27 Mar 2018 20:07:05 -0700 Philip Langdale <philipl@overt.org> wrote: > See the earlier fix for movtextdec for details. The equivalent bug is > present on the encoder side as well. > > We need to track the text length in 'characters' (which seems to really > mean codepoints) to ensure that styles are applied across the correct > ranges. > > Signed-off-by: Philip Langdale <philipl@overt.org> > --- > libavcodec/movtextenc.c | 24 +++++++++++++++++++++++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/libavcodec/movtextenc.c b/libavcodec/movtextenc.c > index d795e317c3..fd0743f752 100644 > --- a/libavcodec/movtextenc.c > +++ b/libavcodec/movtextenc.c > @@ -304,11 +304,33 @@ static void mov_text_color_cb(void *priv, unsigned int color, unsigned int color > */ > } > > +static uint16_t utf8_strlen(const char *text, int len) > +{ > + uint16_t i = 0, ret = 0; > + while (i < len) { > + char c = text[i]; > + if (c >= 0) > + i += 1; > + else if ((c & 0xE0) == 0xC0) > + i += 2; > + else if ((c & 0xF0) == 0xE0) > + i += 3; > + else if ((c & 0xF8) == 0xF0) > + i += 4; > + else > + return 0; > + ret++; > + } > + return ret; > +} > + > static void mov_text_text_cb(void *priv, const char *text, int len) > { > + uint16_t utf8_len = utf8_strlen(text, len); > MovTextContext *s = priv; > av_bprint_append_data(&s->buffer, text, len); > - s->text_pos += len; > + // If it's not utf-8, just use the byte length > + s->text_pos += utf8_len ? utf8_len : len; > } > > static void mov_text_new_line_cb(void *priv, int forced) LGTM. Maybe a later patch that adds the UTF-8 length as lavu function (or macro for SPEEED?) eould be nice, since we seem to need that often.
I think first check (ascii case) should be explicit bit-test, or (if there is some reason to use sign-bit approach) at least "c" should be signed char instead of simply char. 2018-03-28 6:07 GMT+03:00 Philip Langdale <philipl@overt.org>: > See the earlier fix for movtextdec for details. The equivalent bug is > present on the encoder side as well. > > We need to track the text length in 'characters' (which seems to really > mean codepoints) to ensure that styles are applied across the correct > ranges. > > Signed-off-by: Philip Langdale <philipl@overt.org> > --- > libavcodec/movtextenc.c | 24 +++++++++++++++++++++++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/libavcodec/movtextenc.c b/libavcodec/movtextenc.c > index d795e317c3..fd0743f752 100644 > --- a/libavcodec/movtextenc.c > +++ b/libavcodec/movtextenc.c > @@ -304,11 +304,33 @@ static void mov_text_color_cb(void *priv, unsigned > int color, unsigned int color > */ > } > > +static uint16_t utf8_strlen(const char *text, int len) > +{ > + uint16_t i = 0, ret = 0; > + while (i < len) { > + char c = text[i]; > + if (c >= 0) > + i += 1; > + else if ((c & 0xE0) == 0xC0) > + i += 2; > + else if ((c & 0xF0) == 0xE0) > + i += 3; > + else if ((c & 0xF8) == 0xF0) > + i += 4; > + else > + return 0; > + ret++; > + } > + return ret; > +} > + > static void mov_text_text_cb(void *priv, const char *text, int len) > { > + uint16_t utf8_len = utf8_strlen(text, len); > MovTextContext *s = priv; > av_bprint_append_data(&s->buffer, text, len); > - s->text_pos += len; > + // If it's not utf-8, just use the byte length > + s->text_pos += utf8_len ? utf8_len : len; > } > > static void mov_text_new_line_cb(void *priv, int forced) > -- > 2.14.1 > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel >
On Wed, 28 Mar 2018 14:02:04 +0300 Andrey Turkin <andrey.turkin@gmail.com> wrote: > I think first check (ascii case) should be explicit bit-test, or (if there > is some reason to use sign-bit approach) at least "c" should be signed char > instead of simply char. Good point actually. It relies on char signedness, which is not guaranteed.
On Wed, 28 Mar 2018 13:14:57 +0200 wm4 <nfxjfg@googlemail.com> wrote: > On Wed, 28 Mar 2018 14:02:04 +0300 > Andrey Turkin <andrey.turkin@gmail.com> wrote: > > > I think first check (ascii case) should be explicit bit-test, or > > (if there is some reason to use sign-bit approach) at least "c" > > should be signed char instead of simply char. > > Good point actually. It relies on char signedness, which is not > guaranteed. Fair point. I'm updating it. --phil
On Wed, 28 Mar 2018 12:01:47 +0200 wm4 <nfxjfg@googlemail.com> wrote: > > LGTM. > > Maybe a later patch that adds the UTF-8 length as lavu function (or > macro for SPEEED?) eould be nice, since we seem to need that often. I can do that. Although, I'm not sure the SPEEED version is readable enough to justify using it. :-) http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html --phil
diff --git a/libavcodec/movtextenc.c b/libavcodec/movtextenc.c index d795e317c3..fd0743f752 100644 --- a/libavcodec/movtextenc.c +++ b/libavcodec/movtextenc.c @@ -304,11 +304,33 @@ static void mov_text_color_cb(void *priv, unsigned int color, unsigned int color */ } +static uint16_t utf8_strlen(const char *text, int len) +{ + uint16_t i = 0, ret = 0; + while (i < len) { + char c = text[i]; + if (c >= 0) + i += 1; + else if ((c & 0xE0) == 0xC0) + i += 2; + else if ((c & 0xF0) == 0xE0) + i += 3; + else if ((c & 0xF8) == 0xF0) + i += 4; + else + return 0; + ret++; + } + return ret; +} + static void mov_text_text_cb(void *priv, const char *text, int len) { + uint16_t utf8_len = utf8_strlen(text, len); MovTextContext *s = priv; av_bprint_append_data(&s->buffer, text, len); - s->text_pos += len; + // If it's not utf-8, just use the byte length + s->text_pos += utf8_len ? utf8_len : len; } static void mov_text_new_line_cb(void *priv, int forced)
See the earlier fix for movtextdec for details. The equivalent bug is present on the encoder side as well. We need to track the text length in 'characters' (which seems to really mean codepoints) to ensure that styles are applied across the correct ranges. Signed-off-by: Philip Langdale <philipl@overt.org> --- libavcodec/movtextenc.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)