From patchwork Wed Mar 28 15:16:12 2018
X-Patchwork-Submitter: Philip Langdale
X-Patchwork-Id: 8210
From: Philip Langdale
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 28 Mar 2018 08:16:12 -0700
Subject: [FFmpeg-devel] [PATCH] movtextenc: fix handling of utf-8 subtitles

See the earlier fix for movtextdec for details. The equivalent bug is
present on the encoder side as well.
We need to track the text length in 'characters' (which seems to really
mean codepoints) to ensure that styles are applied across the correct
ranges.

Signed-off-by: Philip Langdale
---
 libavcodec/movtextenc.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/libavcodec/movtextenc.c b/libavcodec/movtextenc.c
index d795e317c3..e1d2ae446c 100644
--- a/libavcodec/movtextenc.c
+++ b/libavcodec/movtextenc.c
@@ -304,11 +304,33 @@ static void mov_text_color_cb(void *priv, unsigned int color, unsigned int color
      */
 }
 
+static uint16_t utf8_strlen(const char *text, int len)
+{
+    uint16_t i = 0, ret = 0;
+    while (i < len) {
+        char c = text[i];
+        if ((c & 0x80) == 0)
+            i += 1;
+        else if ((c & 0xE0) == 0xC0)
+            i += 2;
+        else if ((c & 0xF0) == 0xE0)
+            i += 3;
+        else if ((c & 0xF8) == 0xF0)
+            i += 4;
+        else
+            return 0;
+        ret++;
+    }
+    return ret;
+}
+
 static void mov_text_text_cb(void *priv, const char *text, int len)
 {
+    uint16_t utf8_len = utf8_strlen(text, len);
     MovTextContext *s = priv;
     av_bprint_append_data(&s->buffer, text, len);
-    s->text_pos += len;
+    // If it's not utf-8, just use the byte length
+    s->text_pos += utf8_len ? utf8_len : len;
 }
 
 static void mov_text_new_line_cb(void *priv, int forced)
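For illustration, here is a minimal standalone sketch of the same idea. It is not code from this patch or from FFmpeg. It counts UTF-8 code points rather than bytes, so that style offsets line up with characters. The names count_codepoints() and byte_to_char_offset() are hypothetical.

Like utf8_strlen() in the patch, it classifies each lead byte and does not check continuation bytes. It differs from the patch in four ways:
- it reads bytes as unsigned char;
- it uses a size_t index rather than uint16_t;
- it rejects a multi-byte sequence cut off by len;
- it signals failure with -1 instead of 0.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): count the UTF-8 code points
 * in the first `len` bytes of `text` by classifying each lead byte.
 * Continuation bytes are skipped, not validated.
 * Returns -1 on an invalid lead byte or on a sequence truncated by `len`,
 * so a caller can fall back to the byte length.
 */
static long count_codepoints(const char *text, size_t len)
{
    size_t i = 0;
    long count = 0;

    while (i < len) {
        unsigned char c = (unsigned char)text[i];
        size_t seq_len;

        if ((c & 0x80) == 0)          /* 0xxxxxxx: ASCII, 1 byte */
            seq_len = 1;
        else if ((c & 0xE0) == 0xC0)  /* 110xxxxx: 2-byte sequence */
            seq_len = 2;
        else if ((c & 0xF0) == 0xE0)  /* 1110xxxx: 3-byte sequence */
            seq_len = 3;
        else if ((c & 0xF8) == 0xF0)  /* 11110xxx: 4-byte sequence */
            seq_len = 4;
        else                          /* stray continuation or invalid byte */
            return -1;

        if (seq_len > len - i)        /* sequence runs past the buffer */
            return -1;

        i += seq_len;
        count++;
    }
    return count;
}

/*
 * Hypothetical helper: convert a byte offset into `text` into the
 * character (code point) offset that a style range should use, i.e. the
 * number of code points that precede `byte_off`.
 */
static long byte_to_char_offset(const char *text, size_t byte_off)
{
    return count_codepoints(text, byte_off);
}
```

For example, "h\xC3\xA9llo" ("héllo") is 6 bytes but 5 code points. The trailing "llo" begins at byte offset 3 but at character offset 2. Recording the byte offset, as the encoder did before this fix, would start a style on "llo" one character too late.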