[FFmpeg-devel] lavc/movtextdec: fix incorrect offset calculation for UTF-8 characters

Submitted by Erik Bråthen Solem on March 8, 2017, 1:37 a.m.

Details

Message ID VI1P194MB02559AFE902B6EF4363C57B1C02E0@VI1P194MB0255.EURP194.PROD.OUTLOOK.COM
State New

Commit Message

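Erik Bråthen Solem March 8, 2017, 1:37 a.m.
The 3GPP Timed Text (TTXT / tx3g / mov_text) specification counts each multibyte UTF-8 character as a single character, whereas ffmpeg currently counts bytes, so style offsets drift as soon as the text contains non-ASCII characters. This patch wraps the style box handling and the text_pos increment in an if test such that: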
1. continuation bytes are not counted during decoding
2. style boxes will not split these characters

Fixes trac #6021 (decoding part).
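
For illustration only, the counting rule behind the if test can be sketched as a standalone function. The helper name utf8_char_count is hypothetical and not part of libavcodec; the sketch assumes well-formed UTF-8 input and relies on the fact that continuation bytes have the bit pattern 10xxxxxx, so every byte that is not a continuation byte starts a new character.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libavcodec: count the characters in the
 * UTF-8 byte range [text, text_end) by skipping continuation bytes.
 * Assumes the input is well-formed UTF-8.
 */
static size_t utf8_char_count(const char *text, const char *text_end)
{
    size_t count = 0;

    while (text < text_end) {
        /* Continuation bytes look like 10xxxxxx; every other byte starts
         * a new character (ASCII or the lead byte of a sequence). */
        if (((unsigned char)*text & 0xC0) != 0x80)
            count++;
        text++;
    }
    return count;
}
```

With this rule, the 8-byte string "Br\xC3\xA5then" counts as 7 characters, which is the unit in which the style box offsets are expressed.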

---
 libavcodec/movtextdec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Patch

diff --git a/libavcodec/movtextdec.c b/libavcodec/movtextdec.c
index 6de1500..2c7a204 100644
--- a/libavcodec/movtextdec.c
+++ b/libavcodec/movtextdec.c
@@ -342,6 +342,7 @@  static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
     }
 
     while (text < text_end) {
+        if ((*text & 0xC0) != 0x80) { /* Boxes never split multibyte chars */
         if (m->box_flags & STYL_BOX) {
             for (i = 0; i < m->style_entries; i++) {
                 if (m->s[i]->style_flag && text_pos == m->s[i]->style_end) {
@@ -387,6 +388,8 @@  static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
                 }
             }
         }
+        text_pos++;
+        }
 
         switch (*text) {
         case '\r':
@@ -399,7 +402,6 @@  static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
             break;
         }
         text++;
-        text_pos++;
     }
 
     return 0;