From patchwork Sat Dec 17 03:39:02 2016
X-Patchwork-Submitter: Erik Bråthen Solem
X-Patchwork-Id: 1840
From: Erik Bråthen Solem
To: "ffmpeg-devel@ffmpeg.org"
Date: Sat, 17 Dec 2016 03:39:02 +0000
Subject: [FFmpeg-devel] [PATCH 1/1] libavcodec/movtextdec.c: fixing decoding
 for UTF-8 (ticket 6021)
Reply-To: FFmpeg development discussions and patches
Cc: Erik Bråthen Solem

Character offsets were interpreted as byte offsets, resulting in misplaced
styling tags where multibyte characters were involved. The entire subtitle
stream would even be rendered invalid if such a misplaced tag happened to
split a multibyte character. This patch fixes this for UTF-8; UTF-16 was and
still is broken. These are the only supported encodings according to the spec.
---
 libavcodec/movtextdec.c | 95 +++++++++++++++++++++++++++----------------------
 1 file changed, 53 insertions(+), 42 deletions(-)

diff --git a/libavcodec/movtextdec.c b/libavcodec/movtextdec.c
index 7b5b161..6e1ff73 100644
--- a/libavcodec/movtextdec.c
+++ b/libavcodec/movtextdec.c
@@ -328,6 +328,7 @@ static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
     int i = 0;
     int j = 0;
     int text_pos = 0;
+    int text_pos_chars = 0;
 
     if (text < text_end && m->box_flags & TWRP_BOX) {
         if (m->w.wrap_flag == 1) {
@@ -338,50 +339,59 @@ static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
     }
 
     while (text < text_end) {
-        if (m->box_flags & STYL_BOX) {
-            for (i = 0; i < m->style_entries; i++) {
-                if (m->s[i]->style_flag && text_pos == m->s[i]->style_end) {
-                    av_bprintf(buf, "{\\r}");
+        if ((*text & 0xC0) != 0x80) { // Boxes never split multibyte characters
+            if (m->box_flags & STYL_BOX) {
+                for (i = 0; i < m->style_entries; i++) {
+                    if (m->s[i]->style_flag &&
+                        text_pos_chars == m->s[i]->style_end)
+                    {
+                        av_bprintf(buf, "{\\r}");
+                    }
                 }
-            }
-            for (i = 0; i < m->style_entries; i++) {
-                if (m->s[i]->style_flag && text_pos == m->s[i]->style_start) {
-                    if (m->s[i]->style_flag & STYLE_FLAG_BOLD)
-                        av_bprintf(buf, "{\\b1}");
-                    if (m->s[i]->style_flag & STYLE_FLAG_ITALIC)
-                        av_bprintf(buf, "{\\i1}");
-                    if (m->s[i]->style_flag & STYLE_FLAG_UNDERLINE)
-                        av_bprintf(buf, "{\\u1}");
-                    av_bprintf(buf, "{\\fs%d}", m->s[i]->fontsize);
-                    for (j = 0; j < m->ftab_entries; j++) {
-                        if (m->s[i]->style_fontID == m->ftab[j]->fontID)
-                            av_bprintf(buf, "{\\fn%s}", m->ftab[j]->font);
+                for (i = 0; i < m->style_entries; i++) {
+                    if (m->s[i]->style_flag
+                        && text_pos_chars == m->s[i]->style_start)
+                    {
+                        if (m->s[i]->style_flag & STYLE_FLAG_BOLD)
+                            av_bprintf(buf, "{\\b1}");
+                        if (m->s[i]->style_flag & STYLE_FLAG_ITALIC)
+                            av_bprintf(buf, "{\\i1}");
+                        if (m->s[i]->style_flag & STYLE_FLAG_UNDERLINE)
+                            av_bprintf(buf, "{\\u1}");
+                        /* (No need to print font style if equal to default?) */
+                        av_bprintf(buf, "{\\fs%d}", m->s[i]->fontsize);
+                        for (j = 0; j < m->ftab_entries; j++) {
+                            if (m->s[i]->style_fontID == m->ftab[j]->fontID)
+                                av_bprintf(buf, "{\\fn%s}", m->ftab[j]->font);
+                        }
                     }
                 }
             }
-        }
-        if (m->box_flags & HLIT_BOX) {
-            if (text_pos == m->h.hlit_start) {
-                /* If hclr box is present, set the secondary color to the color
-                 * specified. Otherwise, set primary color to white and secondary
-                 * color to black. These colors will come from TextSampleModifier
-                 * boxes in future and inverse video technique for highlight will
-                 * be implemented.
-                 */
-                if (m->box_flags & HCLR_BOX) {
-                    av_bprintf(buf, "{\\2c&H%02x%02x%02x&}", m->c.hlit_color[2],
-                               m->c.hlit_color[1], m->c.hlit_color[0]);
-                } else {
-                    av_bprintf(buf, "{\\1c&H000000&}{\\2c&HFFFFFF&}");
+            if (m->box_flags & HLIT_BOX) {
+                if (text_pos_chars == m->h.hlit_start) {
+                    /* If hclr box is present, set the secondary color to the
+                     * color specified. Otherwise, set primary color to white
+                     * and secondary color to black. These colors will come from
+                     * TextSampleModifier boxes in future and inverse video
+                     * technique for highlight will be implemented.
+                     */
+                    if (m->box_flags & HCLR_BOX) {
+                        av_bprintf(buf, "{\\2c&H%02x%02x%02x&}",
+                                   m->c.hlit_color[2], m->c.hlit_color[1],
+                                   m->c.hlit_color[0]);
+                    } else {
+                        av_bprintf(buf, "{\\1c&H000000&}{\\2c&HFFFFFF&}");
+                    }
                 }
-            }
-            if (text_pos == m->h.hlit_end) {
-                if (m->box_flags & HCLR_BOX) {
-                    av_bprintf(buf, "{\\2c&H000000&}");
-                } else {
-                    av_bprintf(buf, "{\\1c&HFFFFFF&}{\\2c&H000000&}");
+                if (text_pos_chars == m->h.hlit_end) {
+                    if (m->box_flags & HCLR_BOX) {
+                        av_bprintf(buf, "{\\2c&H000000&}");
+                    } else {
+                        av_bprintf(buf, "{\\1c&HFFFFFF&}{\\2c&H000000&}");
+                    }
                 }
             }
+            text_pos_chars++;
         }
 
         switch (*text) {
@@ -412,10 +422,10 @@ static int mov_text_init(AVCodecContext *avctx) {
     MovTextContext *m = avctx->priv_data;
     ret = mov_text_tx3g(avctx, m);
     if (ret == 0) {
-        return ff_ass_subtitle_header(avctx, m->d.font, m->d.fontsize, m->d.color,
-                                      m->d.back_color, m->d.bold, m->d.italic,
-                                      m->d.underline, ASS_DEFAULT_BORDERSTYLE,
-                                      m->d.alignment);
+        return ff_ass_subtitle_header(avctx, m->d.font, m->d.fontsize,
+                                      m->d.color, m->d.back_color, m->d.bold,
+                                      m->d.italic, m->d.underline,
+                                      ASS_DEFAULT_BORDERSTYLE, m->d.alignment);
     } else
         return ff_ass_subtitle_header_default(avctx);
 }
@@ -491,7 +501,8 @@ static int mov_text_decode_frame(AVCodecContext *avctx,
 
             for (size_t i = 0; i < box_count; i++) {
                 if (tsmb_type == box_types[i].type) {
-                    if (m->tracksize + m->size_var + box_types[i].base_size > avpkt->size)
+                    if (m->tracksize + m->size_var +
+                        box_types[i].base_size > avpkt->size)
                         break;
                     ret_tsmb = box_types[i].decode(tsmb, m, avpkt);
                     if (ret_tsmb == -1)
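
Illustration only, not part of the patch: the text_to_ass() hunk above advances
text_pos_chars only on bytes that are not UTF-8 continuation bytes, so box
offsets are compared against a character index rather than a byte index. A
minimal standalone sketch of that idea follows. The helpers
is_utf8_continuation(), append_bytes() and mark_char_range() are hypothetical
names invented for this example and do not exist in FFmpeg; the sketch also
assumes well-formed, NUL-terminated UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_utf8_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Append n bytes of s to out, keeping it NUL-terminated; -1 if it won't fit. */
static int append_bytes(char *out, size_t out_size, size_t *used,
                        const char *s, size_t n)
{
    if (*used + n + 1 > out_size)
        return -1;
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/*
 * Hypothetical helper (not an FFmpeg API): copy UTF-8 `text` into `out`,
 * inserting `open` before character index `start` and `close` before
 * character index `end`. Both indices count characters, not bytes.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * `out` is too small.
 */
static int mark_char_range(const char *text, size_t start, size_t end,
                           const char *open, const char *close,
                           char *out, size_t out_size)
{
    size_t pos_chars = 0; /* plays the role of text_pos_chars in the patch */
    size_t used = 0;

    assert(out && out_size > 0);
    out[0] = '\0';

    for (const char *p = text; *p; p++) {
        /* Only the first byte of a character may trigger a tag, so a tag
         * can never land in the middle of a multibyte sequence. */
        if (!is_utf8_continuation((unsigned char)*p)) {
            if (pos_chars == end &&
                append_bytes(out, out_size, &used, close, strlen(close)) < 0)
                return -1;
            if (pos_chars == start &&
                append_bytes(out, out_size, &used, open, strlen(open)) < 0)
                return -1;
            pos_chars++;
        }
        if (append_bytes(out, out_size, &used, p, 1) < 0)
            return -1;
    }
    /* A range that ends exactly at the end of the text is closed here. */
    if (pos_chars == end &&
        append_bytes(out, out_size, &used, close, strlen(close)) < 0)
        return -1;
    return (int)used;
}
```

For the two-character input "\xC3\xA5" "b" (the letter å followed by b), a
range of start 1 and end 2 brackets only the b. A byte-based position counter
would instead hit offset 1 between the two bytes of å and split the character,
which is the failure described in the commit message.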