From patchwork Wed Mar 8 01:36:42 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Erik_Br=C3=A5then_Solem?=
X-Patchwork-Id: 2795
From: =?iso-8859-1?Q?Erik_Br=E5then_Solem?=
To: "ffmpeg-devel@ffmpeg.org"
Date: Wed, 8 Mar 2017 01:36:42 +0000
Subject: [FFmpeg-devel] [PATCH] lavc/movtextenc: fix incorrect offset calculation for UTF-8 characters
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: =?iso-8859-1?Q?Erik_Br=E5then_Solem?=
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

The 3GPP Timed Text (TTXT / tx3g / mov_text) specification counts each
multibyte UTF-8 character as a single character, whereas ffmpeg currently
counts bytes. This produces files where style boxes have incorrect offsets.

This patch introduces:
1. a separate variable that keeps track of the byte count
2. a for loop that excludes continuation bytes from the character counting

Fixes trac #6021 (encoding part).
---
 libavcodec/movtextenc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libavcodec/movtextenc.c b/libavcodec/movtextenc.c
index 20e01e2..8d09ff4 100644
--- a/libavcodec/movtextenc.c
+++ b/libavcodec/movtextenc.c
@@ -70,6 +70,7 @@ typedef struct {
     uint8_t style_fontsize;
     uint32_t style_color;
     uint16_t text_pos;
+    uint16_t byte_size;
 } MovTextContext;
 
 typedef struct {
@@ -302,7 +303,10 @@ static void mov_text_text_cb(void *priv, const char *text, int len)
 {
     MovTextContext *s = priv;
     av_bprint_append_data(&s->buffer, text, len);
-    s->text_pos += len;
+    for (int i = 0; i < len; i++)
+        if ((text[i] & 0xC0) != 0x80)
+            s->text_pos++; /* increase character count */
+    s->byte_size += len; /* increase byte count */
 }
 
 static void mov_text_new_line_cb(void *priv, int forced)
@@ -310,6 +314,7 @@ static void mov_text_new_line_cb(void *priv, int forced)
     MovTextContext *s = priv;
     av_bprint_append_data(&s->buffer, "\n", 1);
     s->text_pos += 1;
+    s->byte_size += 1;
 }
 
 static const ASSCodesCallbacks mov_text_callbacks = {
@@ -328,6 +333,7 @@ static int mov_text_encode_frame(AVCodecContext *avctx, unsigned char *buf,
     size_t j;
 
     s->text_pos = 0;
+    s->byte_size = 0;
     s->count = 0;
     s->box_flags = 0;
     s->style_entries = 0;
@@ -362,7 +368,7 @@ static int mov_text_encode_frame(AVCodecContext *avctx, unsigned char *buf,
         }
     }
 
-    AV_WB16(buf, s->text_pos);
+    AV_WB16(buf, s->byte_size);
     buf += 2;
 
     if (!av_bprint_is_complete(&s->buffer)) {
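
Note on the counting rule: the character count in this patch relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so a byte is counted only when (byte & 0xC0) != 0x80. The standalone sketch below isolates that rule for illustration. The helper name utf8_char_count is hypothetical and is not part of libavcodec, and the sketch assumes the input is valid UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an FFmpeg API): count the characters in a
 * UTF-8 buffer of len bytes by skipping continuation bytes.
 * Assumes the input is valid UTF-8; malformed sequences are not detected.
 */
static size_t utf8_char_count(const char *text, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        /* Continuation bytes look like 10xxxxxx; every other byte
         * starts a new character (ASCII or a multibyte lead byte). */
        if (((unsigned char)text[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the 8-byte string "Br\xC3\xA5then" ("Bråthen"), this yields 7 characters, which is the difference between the two counters that the patch tracks separately: text_pos for characters and byte_size for bytes.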