From patchwork Wed Mar 8 01:37:24 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Erik Bråthen Solem
X-Patchwork-Id: 2796
From: Erik Bråthen Solem
To: "ffmpeg-devel@ffmpeg.org"
Date: Wed, 8 Mar 2017 01:37:24 +0000
Subject: [FFmpeg-devel] [PATCH] lavc/movtextdec: fix incorrect offset calculation for UTF-8 characters
List-Id: FFmpeg development discussions and patches
Cc: Erik Bråthen Solem

The 3GPP Timed Text (TTXT / tx3g / mov_text) specification counts a
multibyte UTF-8 character as a single character, whereas ffmpeg currently
counts bytes. This patch adds an if test so that:

1. continuation bytes are not counted during decoding
2. style boxes do not split these characters

Fixes trac #6021 (decoding part).
---
 libavcodec/movtextdec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavcodec/movtextdec.c b/libavcodec/movtextdec.c
index 6de1500..2c7a204 100644
--- a/libavcodec/movtextdec.c
+++ b/libavcodec/movtextdec.c
@@ -342,6 +342,7 @@ static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
     }
 
     while (text < text_end) {
+        if ((*text & 0xC0) != 0x80) { /* Boxes never split multibyte chars */
         if (m->box_flags & STYL_BOX) {
             for (i = 0; i < m->style_entries; i++) {
                 if (m->s[i]->style_flag && text_pos == m->s[i]->style_end) {
@@ -387,6 +388,8 @@ static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
                 }
             }
         }
+        text_pos++;
+        }
 
         switch (*text) {
         case '\r':
@@ -399,7 +402,6 @@ static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
             break;
         }
         text++;
-        text_pos++;
     }
 
     return 0;
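
For reference, the counting rule the patch relies on can be shown in isolation. The sketch below is not part of the patch or of FFmpeg: utf8_char_count is a hypothetical helper name, and it assumes the input is already valid UTF-8. It uses the same (byte & 0xC0) != 0x80 test as the diff, so continuation bytes (bit pattern 10xxxxxx) are skipped and only the first byte of each character is counted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an FFmpeg function): returns the number of
 * UTF-8 characters in the byte range [text, text_end).
 * Assumes the input is valid UTF-8; no validation is performed.
 */
static size_t utf8_char_count(const char *text, const char *text_end)
{
    size_t count = 0;

    assert(text <= text_end);

    for (; text < text_end; text++) {
        /* ASCII bytes and multibyte lead bytes start a new character;
         * continuation bytes (top two bits == 10) belong to the
         * previous one and are not counted. */
        if (((unsigned char)*text & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

As an example, the string "Br\xC3\xA5then" ("Bråthen", where "å" is the two bytes 0xC3 0xA5) is 8 bytes long, but the helper counts 7 characters. That is the unit in which, per the description above, the style box offsets are expressed.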