From patchwork Sun Jan 16 18:16:54 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oneric
X-Patchwork-Id: 33597
From: Oneric
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 16 Jan 2022 19:16:54 +0100
Message-Id: <20220116181655.6407-1-oneric@oneric.de>
Subject: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes

No ASS renderer allows a backslash to be escaped by another backslash.
However, unless a backslash is followed by one of a few specific
characters, it is simply printed as a regular character. Insert a
word-joiner character (U+2060) after each backslash to break up such
active sequences (e.g. \N) without changing the visual output.

Also note that the existing \{ and \} escapes are specific to libass.
---
The patch assumes UTF-8 encoding in ff_ass_bprint_text_event (WebVTT
requires UTF-8 per spec). If we cannot assume a particular encoding,
please advise how best to insert a word-joiner character in the correct
encoding.
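The escaping rule from this patch can be sketched as a small standalone C function. This is a hypothetical illustration, not FFmpeg code. `escape_ass_text` is an invented name. It writes into a plain `char` buffer instead of an `AVBPrint`, and it models only the `{`, `}` and `\` branches of ff_ass_bprint_text_event. Line breaks and packet-end handling are omitted. Like the patch, it assumes the output is UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch (not FFmpeg code): mirrors only the escaping branches
 * of the patched ff_ass_bprint_text_event(), writing into a plain buffer.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if out_size is too small. */
static int escape_ass_text(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && out_size > 0);

    for (; *in; in++) {
        char plain[2] = { *in, '\0' };
        char braced[3] = { '\\', *in, '\0' };
        const char *piece;

        if (*in == '{' || *in == '}')
            piece = braced;              /* libass-specific \{ and \} */
        else if (*in == '\\')
            piece = "\\\xe2\x81\xa0";    /* backslash + U+2060 WORD JOINER (UTF-8) */
        else
            piece = plain;               /* everything else is copied as-is */

        size_t len = strlen(piece);
        if (n + len >= out_size)
            return -1;                   /* no room for this piece plus the NUL */
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return (int)n;
}
```

With this sketch, the input `a\Nb{c}` becomes `a\`, U+2060, `Nb\{c\}` (12 bytes). The word joiner separates the backslash from the `N`, so no `\N` line-break sequence reaches the renderer.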
---
 libavcodec/ass.c       | 5 ++++-
 libavcodec/webvttdec.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libavcodec/ass.c b/libavcodec/ass.c
index 725e4d42ba..461e110ca4 100644
--- a/libavcodec/ass.c
+++ b/libavcodec/ass.c
@@ -157,8 +157,11 @@ void ff_ass_bprint_text_event(AVBPrint *buf, const char *p, int size,
 
         /* standard ASS escaping so random characters don't get mis-interpreted
          * as ASS */
-        } else if (!keep_ass_markup && strchr("{}\\", *p)) {
+        } else if (!keep_ass_markup && strchr("{}", *p)) {
             av_bprintf(buf, "\\%c", *p);
+        } else if (!keep_ass_markup && *p == '\\') {
+            // append word-joiner U+2060 as UTF-8 to break up sequences like \N
+            av_bprintf(buf, "\\\xe2\x81\xa0");
 
         /* some packets might end abruptly (no \0 at the end, like for example
          * in some cases of demuxing from a classic video container), some
diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
index 0093f328fa..8cb739697a 100644
--- a/libavcodec/webvttdec.c
+++ b/libavcodec/webvttdec.c
@@ -37,7 +37,7 @@ static const struct {
     {"<i>", "{\\i1}"}, {"</i>", "{\\i0}"},
     {"<b>", "{\\b1}"}, {"</b>", "{\\b0}"},
     {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
-    {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts
+    {"{", "\\{"}, {"}", "\\}"}, {"\\", "\\\xe2\x81\xa0"}, // escape to avoid ASS markup conflicts
     {"&gt;", ">"}, {"&lt;", "<"},
     {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
     {"&amp;", "&"}, {"&nbsp;", "\\h"},

From patchwork Sun Jan 16 18:16:55 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Oneric
X-Patchwork-Id: 33598
From: Oneric
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 16 Jan 2022 19:16:55 +0100
Message-Id: <20220116181655.6407-2-oneric@oneric.de>
In-Reply-To: <20220116181655.6407-1-oneric@oneric.de>
References: <20220116181655.6407-1-oneric@oneric.de>
Subject: [FFmpeg-devel] [PATCH 2/2] avcodec/webvttdec: honour bidi marks

The WebVTT specification requires files to be encoded as UTF-8, so simply
insert the UTF-8 byte sequences for the bidi marks (U+200E LEFT-TO-RIGHT
MARK and U+200F RIGHT-TO-LEFT MARK) instead of dropping them.
---
 libavcodec/webvttdec.c     | 2 +-
 tests/ref/fate/sub-webvtt2 | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c
index 8cb739697a..7d996928eb 100644
--- a/libavcodec/webvttdec.c
+++ b/libavcodec/webvttdec.c
@@ -39,7 +39,7 @@ static const struct {
     {"<u>", "{\\u1}"}, {"</u>", "{\\u0}"},
     {"{", "\\{"}, {"}", "\\}"}, {"\\", "\\\xe2\x81\xa0"}, // escape to avoid ASS markup conflicts
     {"&gt;", ">"}, {"&lt;", "<"},
-    {"&lrm;", ""}, {"&rlm;", ""}, // FIXME: properly honor bidi marks
+    {"&lrm;", "\xe2\x80\x8e"}, {"&rlm;", "\xe2\x80\x8f"},
     {"&amp;", "&"}, {"&nbsp;", "\\h"},
 };
 
diff --git a/tests/ref/fate/sub-webvtt2 b/tests/ref/fate/sub-webvtt2
index 357b8178ea..4cd1d86a9a 100644
--- a/tests/ref/fate/sub-webvtt2
+++ b/tests/ref/fate/sub-webvtt2
@@ -20,6 +20,6 @@ Dialogue: 0,0:00:12.50,0:00:32.50,Default,,0,0,0,,OK, let’s go.
 Dialogue: 0,0:00:38.00,0:00:43.00,Default,,0,0,0,,I want to 愛あい love you\NThat's not proper English!
 Dialogue: 0,0:00:43.00,0:00:46.00,Default,,0,0,0,,{\i1}キツネ{\i0}じゃない キツネじゃない\N乙女おとめは
 Dialogue: 0,0:00:50.00,0:00:55.00,Default,,0,0,0,,Some time ago in a rather distant place....
-Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: 123456\NAscending: 123456
+Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: ‏123456‎\NAscending: 123456
 Dialogue: 0,0:01:00.00,0:01:05.00,Default,,0,0,0,,>> Never gonna give you up Never gonna let you down\NNever\hgonna\hrun\haround & desert\hyou
 Dialogue: 0,0:55:00.00,1:00:00.00,Default,,0,0,0,,Transcrit par Célestes™
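Both patches hard-code three byte strings. These are the UTF-8 encodings of U+2060 (word joiner), U+200E (left-to-right mark) and U+200F (right-to-left mark). The hypothetical helper below can double-check them, or generate such sequences from code points instead of hand-writing escapes. `utf8_encode` is an invented name, not an FFmpeg API. The helper only produces UTF-8, so it does not answer the question about other encodings raised in the notes of patch 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a Unicode scalar value as UTF-8.
 * Returns the number of bytes written to out (1-4), or 0 for
 * UTF-16 surrogates and values above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not scalar values */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For example, encoding 0x2060 yields the three bytes E2 81 A0, which matches the `\xe2\x81\xa0` escape used in patch 1. Encoding 0x200E and 0x200F yields E2 80 8E and E2 80 8F, as used in patch 2.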