Message ID | VI1PR01MB132737EACFF8FAB98B44EFF2C09D0@VI1PR01MB1327.eurprd01.prod.exchangelabs.com |
---|---|
State | Superseded |
Headers | show |
On Thu, Dec 15, 2016 at 10:43:15PM +0000, Erik Bråthen Solem wrote: > According to the format specification (3GPP TS 26.245, section 5.2) "storage > lengths are specified as byte-counts, wheras highlighting is specified using > character offsets." This patch replaces byte counting with character counting > for highlighting. See the following page for a link to the specification: > https://gpac.wp.mines-telecom.fr/mp4box/ttxt-format-documentation/ > --- > libavcodec/movtextenc.c | 24 +++++++++++++++--------- > 1 file changed, 15 insertions(+), 9 deletions(-) > > diff --git a/libavcodec/movtextenc.c b/libavcodec/movtextenc.c > index 20e01e2..3ae015a 100644 > --- a/libavcodec/movtextenc.c > +++ b/libavcodec/movtextenc.c > @@ -70,6 +70,7 @@ typedef struct { > uint8_t style_fontsize; > uint32_t style_color; > uint16_t text_pos; > + uint16_t text_pos_chars; > } MovTextContext; [...] > @@ -302,7 +303,10 @@ static void mov_text_text_cb(void *priv, const char *text, int len) > { > MovTextContext *s = priv; > av_bprint_append_data(&s->buffer, text, len); > - s->text_pos += len; > + s->text_pos += len; // length of text in bytes > + for (int i = 0; i < len; i++) // length of text in UTF-8 characters > + if ((text[i] & 0xC0) != 0x80) > + s->text_pos_chars++; > } > > static void mov_text_new_line_cb(void *priv, int forced) > @@ -310,6 +314,7 @@ static void mov_text_new_line_cb(void *priv, int forced) > MovTextContext *s = priv; > av_bprint_append_data(&s->buffer, "\n", 1); > s->text_pos += 1; > + s->text_pos_chars += 1; > } The code isnt really my area but is there a check to prevent text_pos and text_pos_chars from overflowing the 16bit range ? [...]
Good question. Since text_pos_chars never exceeds the existing variable text_pos, I did not think about this. No, there are no checks. The spec says that "Authors should limit the string in each text sample to not more than 2048 bytes, for maximum terminal interoperability", but the code does not enforce this limit (or the maximum uint16_t value of 65535 for that matter). The likeli- hood of exceeding this limit is very small, but it does not hurt to add a check. In any case text_pos >= text_pos_chars, so it should be sufficient to check just text_pos. In mov_text_new_line_cb we only increment by 1, so checking if s->text_pos == 0 after that is enough. In mov_text_text_cb this check can be used instead, placed before the length len is added to text_pos: if (len > UINT16_MAX || (s->text_pos > UINT16_MAX - len)) // Overflow I am new to the project's source code and do not know how errors and warnings should be handled, but could it be an idea to print a warning if text_pos > 2048, and print an error message and abort in case of overflow? Or should the rest of the text just be truncated? PS. Please excuse the duplicate patch that was sent an hour or two ago. It is identical to the one I submitted a couple of days ago and I have no idea why or how that happened.
On Sun, Dec 18, 2016 at 10:09:54PM +0000, Erik Bråthen Solem wrote: > Good question. Since text_pos_chars never exceeds the existing > variable text_pos, I did not think about this. > > No, there are no checks. The spec says that "Authors should limit the > string in each text sample to not more than 2048 bytes, for maximum > terminal interoperability", but the code does not enforce this limit > (or the maximum uint16_t value of 65535 for that matter). The likeli- > hood of exceeding this limit is very small, but it does not hurt to > add a check. In any case text_pos >= text_pos_chars, so it should be > sufficient to check just text_pos. In mov_text_new_line_cb we only > increment by 1, so checking if s->text_pos == 0 after that is enough. > In mov_text_text_cb this check can be used instead, placed before the > length len is added to text_pos: > if (len > UINT16_MAX || (s->text_pos > UINT16_MAX - len)) // Overflow > > I am new to the project's source code and do not know how errors and > warnings should be handled, but could it be an idea to print a > warning if text_pos > 2048, and print an error message and abort in > case of overflow? Or should the rest of the text just be truncated? i dont know about the 2048 case, but yes at the point were it would overflow error and returning an error (with cleanup if needed) is what we normally do. (something else would make sense if it occurs in real world files and isnt just an error/damagd unsalvagable blob) [...]
diff --git a/libavcodec/movtextenc.c b/libavcodec/movtextenc.c index 20e01e2..3ae015a 100644 --- a/libavcodec/movtextenc.c +++ b/libavcodec/movtextenc.c @@ -70,6 +70,7 @@ typedef struct { uint8_t style_fontsize; uint32_t style_color; uint16_t text_pos; + uint16_t text_pos_chars; } MovTextContext; typedef struct { @@ -216,10 +217,10 @@ static void mov_text_style_cb(void *priv, const char style, int close) } s->style_attributes_temp->style_flag = 0; - s->style_attributes_temp->style_start = AV_RB16(&s->text_pos); + s->style_attributes_temp->style_start = AV_RB16(&s->text_pos_chars); } else { if (s->style_attributes_temp->style_flag) { //break the style record here and start a new one - s->style_attributes_temp->style_end = AV_RB16(&s->text_pos); + s->style_attributes_temp->style_end = AV_RB16(&s->text_pos_chars); av_dynarray_add(&s->style_attributes, &s->count, s->style_attributes_temp); s->style_attributes_temp = av_malloc(sizeof(*s->style_attributes_temp)); if (!s->style_attributes_temp) { @@ -230,10 +231,10 @@ static void mov_text_style_cb(void *priv, const char style, int close) } s->style_attributes_temp->style_flag = s->style_attributes[s->count - 1]->style_flag; - s->style_attributes_temp->style_start = AV_RB16(&s->text_pos); + s->style_attributes_temp->style_start = AV_RB16(&s->text_pos_chars); } else { s->style_attributes_temp->style_flag = 0; - s->style_attributes_temp->style_start = AV_RB16(&s->text_pos); + s->style_attributes_temp->style_start = AV_RB16(&s->text_pos_chars); } } switch (style){ @@ -248,7 +249,7 @@ static void mov_text_style_cb(void *priv, const char style, int close) break; } } else { - s->style_attributes_temp->style_end = AV_RB16(&s->text_pos); + s->style_attributes_temp->style_end = AV_RB16(&s->text_pos_chars); av_dynarray_add(&s->style_attributes, &s->count, s->style_attributes_temp); s->style_attributes_temp = av_malloc(sizeof(*s->style_attributes_temp)); @@ -273,7 +274,7 @@ static void mov_text_style_cb(void *priv, const char style, int close) break; } if (s->style_attributes_temp->style_flag) { //start of new style record - s->style_attributes_temp->style_start = AV_RB16(&s->text_pos); + s->style_attributes_temp->style_start = AV_RB16(&s->text_pos_chars); } } s->box_flags |= STYL_BOX; @@ -284,11 +285,11 @@ static void mov_text_color_cb(void *priv, unsigned int color, unsigned int color MovTextContext *s = priv; if (color_id == 2) { //secondary color changes if (s->box_flags & HLIT_BOX) { //close tag - s->hlit.end = AV_RB16(&s->text_pos); + s->hlit.end = AV_RB16(&s->text_pos_chars); } else { s->box_flags |= HCLR_BOX; s->box_flags |= HLIT_BOX; - s->hlit.start = AV_RB16(&s->text_pos); + s->hlit.start = AV_RB16(&s->text_pos_chars); s->hclr.color = color | (0xFF << 24); //set alpha value to FF } } @@ -302,7 +303,10 @@ static void mov_text_text_cb(void *priv, const char *text, int len) { MovTextContext *s = priv; av_bprint_append_data(&s->buffer, text, len); - s->text_pos += len; + s->text_pos += len; // length of text in bytes + for (int i = 0; i < len; i++) // length of text in UTF-8 characters + if ((text[i] & 0xC0) != 0x80) + s->text_pos_chars++; } static void mov_text_new_line_cb(void *priv, int forced) @@ -310,6 +314,7 @@ static void mov_text_new_line_cb(void *priv, int forced) MovTextContext *s = priv; av_bprint_append_data(&s->buffer, "\n", 1); s->text_pos += 1; + s->text_pos_chars += 1; } static const ASSCodesCallbacks mov_text_callbacks = { @@ -328,6 +333,7 @@ static int mov_text_encode_frame(AVCodecContext *avctx, unsigned char *buf, size_t j; s->text_pos = 0; + s->text_pos_chars = 0; s->count = 0; s->box_flags = 0; s->style_entries = 0;