From patchwork Sun Dec 10 16:37:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Oneric X-Patchwork-Id: 45024 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:1225:b0:181:818d:5e7f with SMTP id v37csp2468525pzf; Sun, 10 Dec 2023 08:37:54 -0800 (PST) X-Google-Smtp-Source: AGHT+IE0RRy9ZB5DYjqoP+0D3XqmBmtB437JK10KX0sYecv8IzRrEcLeNVewDwYLCUWKv8qcT8zS X-Received: by 2002:a17:907:1602:b0:a01:ee03:37ec with SMTP id cw2-20020a170907160200b00a01ee0337ecmr2917303ejd.3.1702226274586; Sun, 10 Dec 2023 08:37:54 -0800 (PST) Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id dk24-20020a170906f0d800b00a1db8d8e75csi2937573ejb.31.2023.12.10.08.37.54; Sun, 10 Dec 2023 08:37:54 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@oneric.de header.s=strato-dkim-0002 header.b=mY+pIBEN; dkim=neutral (no key) header.i=@ffmpeg.org header.s=strato-dkim-0003 header.b=qe7qvWNj; arc=fail (body hash mismatch); spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id BD28168D0A9; Sun, 10 Dec 2023 18:37:42 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mo4-p00-ob.smtp.rzone.de (mo4-p00-ob.smtp.rzone.de [85.215.255.22]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E338A68D048 for ; Sun, 10 Dec 2023 18:37:34 +0200 (EET) ARC-Seal: i=1; a=rsa-sha256; t=1702226254; cv=none; d=strato.com; s=strato-dkim-0002; b=bF1VHX10JDGPhcAHkMjWJD9ZUc0HV4Tn+pPaZJhNsHrZ+8KEKgw16j2vrTu89XehVK 
uelWgjKu3XqF8DljwpdrC9ItC4HIxXZwXpaXUbikkcce2Lx6R6AphYguJIHssVGf8vhW x33YKHQANmLuJ+4v7SlOeqQ5X4AKiZkPC5StzT0EN1F2QxlhIbVpqhdwmqVhajh4wKE6 UogJ+jYN2vfxi26y1WdvNGpT2e4WtXNBRKNW1AJ+Tra/+Yz0DwniV6z3zMeFMLVN+giN C8fIqFiV42ZUtAMBDMdAOfdUC6DOVE49Zzwe39c6hFq5dMhhPQhZ1Kb4VL6tLo3oZfuG xaQA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; t=1702226254; s=strato-dkim-0002; d=strato.com; h=References:In-Reply-To:Message-Id:Date:Subject:To:From:Cc:Date:From: Subject:Sender; bh=V3BflC+g4Ay2sUArV3tt2vBnlfa8LIxVWX58xd1WzwM=; b=flfL4A4D/NRctsf1YRarBSfJmAsQ+LoRe1xxOG3GeRb3pR7hMb/E43Gc3wVGyHouF0 5bu3G6UcnzTWFbZN7FQBHRybA/hl9aTsUZJod6W2P0Y2id24sHvawnovWwTib+xhibpS abcY1z3N3RJVJqOZrjFZz6tpCmjuntocNNJUuk07UD6HhcUBNpBGN1YqnnfgLaG4RfM7 V0382XSTjKTfu/ist6gjPfezCAEarlrBVt71ORjv8dx2Ia4t0nRNPSzhp+CpJ4YlEZ/p et0ePVql8IcEpc1QWX0wRP892/wUlcCfafAk7aRrzVY1DAtwSgyMsSkcdKasKDdFn0Qu UrMA== ARC-Authentication-Results: i=1; strato.com; arc=none; dkim=none X-RZG-CLASS-ID: mo00 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1702226254; s=strato-dkim-0002; d=oneric.de; h=References:In-Reply-To:Message-Id:Date:Subject:To:From:Cc:Date:From: Subject:Sender; bh=V3BflC+g4Ay2sUArV3tt2vBnlfa8LIxVWX58xd1WzwM=; b=mY+pIBENeLRlyfG4oFJBEStF9YebADsd3fj8fD8UJKhFPlsDUgu9kOyMBWf0dpqVuK EG4c8VguIFkBuHU6rr5SfoKbNOAjLzNlww/0l7HxpK5gHIrvhd0xCv01jiNhsAARTjCm mIudJwUjtVatwO3z8qoRRsOHIvn3kMMOdR3vaR159COLjM9pZ2wyhFZTlWaks6jUBwBy KMD+jqJWIY7+E+Y8EODakKYTXr/VASWFGRtIrV+RYt8iECQfEBuxXDXTnKwkX7Cf+9Zm MwgaKbf6p8OaT9sK2DetrItJ98mk9EtlvRfGr71WD57exG95B8x3h8THKbke9jT/RoDN NTtA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; t=1702226254; s=strato-dkim-0003; d=oneric.de; h=References:In-Reply-To:Message-Id:Date:Subject:To:From:Cc:Date:From: Subject:Sender; bh=V3BflC+g4Ay2sUArV3tt2vBnlfa8LIxVWX58xd1WzwM=; b=qe7qvWNjHhueFg8WJFVH/muvYudAhR4MA4FKHkQV1NHnFJ2qZdg0nPBE8diEbOqa91 rjZVSCYNs5pxj5mSWWDA== X-RZG-AUTH: 
":I2IBZ0mrW/AWQXwgB4oxKM1YsW1lFUznrLvi/XReWqAAlWwZ8wlvfXmGs4jUQ0oz8ZbhHexs8fhgUyAHJ90htHJwb5tQKk/WXOwm3hdv" Received: from abhoth.workgroup by smtp.strato.de (RZmta 49.10.0 AUTH) with ESMTPSA id g26a92zBAGbYWJj (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256 bits)) (Client did not present a certificate); Sun, 10 Dec 2023 17:37:34 +0100 (CET) From: Oneric To: ffmpeg-devel@ffmpeg.org Date: Sun, 10 Dec 2023 17:37:13 +0100 Message-Id: <20231210163715.4016-2-oneric@oneric.de> In-Reply-To: <20231210163715.4016-1-oneric@oneric.de> References: <20231210163715.4016-1-oneric@oneric.de> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 1/3] avcodec/webvttdec: honour bidi marks X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: mFo0d5QRBgCO --- “” --- libavcodec/webvttdec.c | 2 +- tests/ref/fate/sub-webvtt2 | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c index 690f00dc47..990d150f16 100644 --- a/libavcodec/webvttdec.c +++ b/libavcodec/webvttdec.c @@ -39,7 +39,7 @@ static const struct { {"", "{\\u1}"}, {"", "{\\u0}"}, {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts {">", ">"}, {"<", "<"}, - {"‎", ""}, {"‏", ""}, // FIXME: properly honor bidi marks + {"‎", "\xe2\x80\x8e"}, {"‏", "\xe2\x80\x8f"}, {"&", "&"}, {" ", "\\h"}, }; diff --git a/tests/ref/fate/sub-webvtt2 b/tests/ref/fate/sub-webvtt2 index 1d236eabdc..31fb5f83a7 100644 --- a/tests/ref/fate/sub-webvtt2 +++ b/tests/ref/fate/sub-webvtt2 @@ -21,6 +21,6 @@ Dialogue: 0,0:00:12.50,0:00:32.50,Default,,0,0,0,,OK, let’s go. Dialogue: 0,0:00:38.00,0:00:43.00,Default,,0,0,0,,I want to 愛あい love you\NThat's not proper English! 
Dialogue: 0,0:00:43.00,0:00:46.00,Default,,0,0,0,,{\i1}キツネ{\i0}じゃない キツネじゃない\N乙女おとめは Dialogue: 0,0:00:50.00,0:00:55.00,Default,,0,0,0,,Some time ago in a rather distant place.... -Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: 123456\NAscending: 123456 +Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0,0,0,,Descending: ‏123456‎\NAscending: 123456 Dialogue: 0,0:01:00.00,0:01:05.00,Default,,0,0,0,,>> Never gonna give you up Never gonna let you down\NNever\hgonna\hrun\haround & desert\hyou Dialogue: 0,0:55:00.00,1:00:00.00,Default,,0,0,0,,Transcrit par Célestes™ From patchwork Sun Dec 10 16:37:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Oneric X-Patchwork-Id: 45025 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:1225:b0:181:818d:5e7f with SMTP id v37csp2468591pzf; Sun, 10 Dec 2023 08:38:03 -0800 (PST) X-Google-Smtp-Source: AGHT+IHjLtzyy9SRj0//xZyCBMqltBJAFAEDLUJahS5DrmqtdfAc0nY2rF4hhks2Q/ZDwFPQf3jy X-Received: by 2002:a50:8d12:0:b0:54b:2894:d198 with SMTP id s18-20020a508d12000000b0054b2894d198mr2099588eds.4.1702226283114; Sun, 10 Dec 2023 08:38:03 -0800 (PST) Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id dz14-20020a0564021d4e00b0054c9c352ecesi2745523edb.427.2023.12.10.08.38.02; Sun, 10 Dec 2023 08:38:03 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@oneric.de header.s=strato-dkim-0002 header.b=Y69Lyir0; dkim=neutral (no key) header.i=@ffmpeg.org; arc=fail (body hash mismatch); spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C6D3568D0E0; Sun, 10 Dec 2023 18:37:43 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mo4-p00-ob.smtp.rzone.de (mo4-p00-ob.smtp.rzone.de [85.215.255.24]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E6AF268D051 for ; Sun, 10 Dec 2023 18:37:34 +0200 (EET) ARC-Seal: i=1; a=rsa-sha256; t=1702226254; cv=none; d=strato.com; s=strato-dkim-0002; b=o2uXxepme+iia1yPzz1XmRcv0dbsZPEi7cr32ocwSwopG4iLTjICkPRtBEiH1d10t6 S7BzSFBiRh93BDmNK6DDC81w//OCtZ2H2GQpybpGazQXmDjrEYkkSJJZZSR3qmSipJaT 9J6tquCEZ7d7vf70evIeW9fM9Ad3j127vkEuMFHCxm0KC9rBcw2npUcGcRQQVo6HEkgb cTTrna+PXI4kkxYAw4P9jQ3RWlr/F/mWPLqWOogbwSGjN8OCS8vsDiwgiDFFc4jAzlOe wWxPW3pEew0/Mj/xdvUAi7+yza3dG8SvArH4ZuvjSSz2oMb3ec+Loh73IF4FyFQKCI5/ ecpg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; t=1702226254; s=strato-dkim-0002; d=strato.com; h=References:In-Reply-To:Message-Id:Date:Subject:To:From:Cc:Date:From: Subject:Sender; bh=K7Em598Wr0ZzDdwATb7axorye1MgOK23ZZLkBWtHEqA=; b=VepmeKHGCV9OWwnbvCiwmcmk2lTkkVBapFhXE3TENQwrL2di1qAjY9i4WIYg6zY/F7 pNEVZ4OmxxLV9uPeBtlh6n2il7lz6RVMbqTf2X0mUKF7K4uALrYDYfHWayE89Cio69mY q0zBU2XO0rf+HhXFHtUE31xqUbOePJma7wwEuc5hO+vHXyhiynycu6oCJ3NuoxWTksGv 
KRZdw5YuQ/qxI4L9bDKyo82BKP1/+k9PSv6016oCk0ECb9Ez8CLPe05Yh6m1HYpoLBYR ++c3BWjbKXbQKPfaXXP8pAVUB3l9DZHgQzQD5S1rjdY5O8KD7QpNVW3ebJMac7VoU+XK Pcmw== ARC-Authentication-Results: i=1; strato.com; arc=none; dkim=none X-RZG-CLASS-ID: mo00 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1702226254; s=strato-dkim-0002; d=oneric.de; h=References:In-Reply-To:Message-Id:Date:Subject:To:From:Cc:Date:From: Subject:Sender; bh=K7Em598Wr0ZzDdwATb7axorye1MgOK23ZZLkBWtHEqA=; b=Y69Lyir02b/nphwq8EFjGHiBNSIBDT6MvWgwuVtoyt2GgbP4oXSMD9Cdr+w4p8sP5P gwEfGoqG3mCJqULdrAxzw8pGwfJvNae2zAe+5XGHKmJZ4OYz7z8DkH3Q/Z2Pl5SqwvOB k9QSeZh0Zzz+f/y70LQhEwEvGu1UNEW7YO74l3DK/ONvDcwXkfv+kWAwXLoF2Z615gPV YB7c9kUTHTHO7oiW5afSBoeasqpsOB4p/meKTZrCdcg3zHWkJZKkrSUSMNqJ1gL6oZyb RGkGDrDZw2ukxzGvHvEXPGpsJd3mGMjy76dXX1c9mllFgy4yr/XVUytOOkyFwGHxyIPI in1A== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; t=1702226254; s=strato-dkim-0003; d=oneric.de; h=References:In-Reply-To:Message-Id:Date:Subject:To:From:Cc:Date:From: Subject:Sender; bh=K7Em598Wr0ZzDdwATb7axorye1MgOK23ZZLkBWtHEqA=; b=dIYU0rv89N4jmjyUrDJtN3vgzccAeSp+QjE284/dyTywmi6BYLmi9b5W7f20Mxzwmq ZJGxFqNNY3VhHrZ9IjAA== X-RZG-AUTH: ":I2IBZ0mrW/AWQXwgB4oxKM1YsW1lFUznrLvi/XReWqAAlWwZ8wlvfXmGs4jUQ0oz8ZbhHexs8fhgUyAHJ90htHJwb5tQKk/WXOwm3hdv" Received: from abhoth.workgroup by smtp.strato.de (RZmta 49.10.0 AUTH) with ESMTPSA id g26a92zBAGbYWJk (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256 bits)) (Client did not present a certificate); Sun, 10 Dec 2023 17:37:34 +0100 (CET) From: Oneric To: ffmpeg-devel@ffmpeg.org Date: Sun, 10 Dec 2023 17:37:14 +0100 Message-Id: <20231210163715.4016-3-oneric@oneric.de> In-Reply-To: <20231210163715.4016-1-oneric@oneric.de> References: <20231210163715.4016-1-oneric@oneric.de> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 2/3] avcodec/{ass, webvttdec}: fix handling of backslashes X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 2oWhymMKdvDf Backslashes cannot be escaped by a backslash in any ASS renderer, but unless followed by specific characters it is just printed out. Insert a word-joiner character after a backslash to break up active sequences without changing the visual output. --- “” --- libavcodec/ass.c | 9 ++++++++- libavcodec/webvttdec.c | 2 +- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/libavcodec/ass.c b/libavcodec/ass.c index 5058dc8337..a68d3568b4 100644 --- a/libavcodec/ass.c +++ b/libavcodec/ass.c @@ -183,9 +183,16 @@ void ff_ass_bprint_text_event(AVBPrint *buf, const char *p, int size, /* standard ASS escaping so random characters don't get mis-interpreted * as ASS */ - } else if (!keep_ass_markup && strchr("{}\\", *p)) { + } else if (!keep_ass_markup && strchr("{}", *p)) { av_bprintf(buf, "\\%c", *p); + /* append word-joiner U+2060 as UTF-8 to break up sequences like \N */ + } else if (!keep_ass_markup && *p == '\\') { + if (p_end - p <= 3 || strncmp(p + 1, "\xe2\x81\xa0", 3)) + av_bprintf(buf, "\\\xe2\x81\xa0"); + else + av_bprintf(buf, "\\"); + /* some packets might end abruptly (no \0 at the end, like for example * in some cases of demuxing from a classic video container), some * might be terminated with \n or \r\n which we have to remove (for diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c index 990d150f16..6e55bc5499 100644 --- a/libavcodec/webvttdec.c +++ b/libavcodec/webvttdec.c @@ -37,7 +37,7 @@ static const struct { {"", "{\\i1}"}, {"", "{\\i0}"}, {"", "{\\b1}"}, {"", "{\\b0}"}, {"", "{\\u1}"}, {"", "{\\u0}"}, - {"{", "\\{"}, {"}", "\\}"}, // escape to avoid ASS markup conflicts + {"{", "\\{"}, {"}", "\\}"}, {"\\", "\\\xe2\x81\xa0"}, // escape to avoid ASS markup conflicts {">", ">"}, {"<", "<"}, {"‎", "\xe2\x80\x8e"}, {"‏", 
"\xe2\x80\x8f"}, {"&", "&"}, {" ", "\\h"}, From patchwork Sun Dec 10 16:37:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Oneric X-Patchwork-Id: 45026 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:1225:b0:181:818d:5e7f with SMTP id v37csp2468646pzf; Sun, 10 Dec 2023 08:38:11 -0800 (PST) X-Google-Smtp-Source: AGHT+IFHaZ1AKz5Fppf4GMJNyaoDYkGVr+qKJm0UqDIz4zfAdcd3MLdJ9vpWZvoMdxw4Uzribq26 X-Received: by 2002:a05:6402:1346:b0:54c:4fec:f4 with SMTP id y6-20020a056402134600b0054c4fec00f4mr1060209edw.131.1702226291396; Sun, 10 Dec 2023 08:38:11 -0800 (PST) Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id y9-20020a50e609000000b0054c72455321si2913271edm.648.2023.12.10.08.38.10; Sun, 10 Dec 2023 08:38:11 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@oneric.de header.s=strato-dkim-0002 header.b=X7bKbxCU; dkim=neutral (no key) header.i=@ffmpeg.org header.s=strato-dkim-0003; arc=fail (body hash mismatch); spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DEFBC68D0FD; Sun, 10 Dec 2023 18:37:44 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mo4-p00-ob.smtp.rzone.de (mo4-p00-ob.smtp.rzone.de [81.169.146.220]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 0153268D059 for ; Sun, 10 Dec 2023 18:37:34 +0200 (EET) ARC-Seal: i=1; a=rsa-sha256; t=1702226254; cv=none; d=strato.com; s=strato-dkim-0002; 
b=rjEYBrz5lQmxNrfKZ1BG6Bm6q4YBsgdayPdG9wNwW2MnRiA28cGt7vV867Qptp9SOw SSyQ/jHelQ00i35efvGur/datuOm7tTBmUVuUUqGgsluzlvvwnlKdTDWHt5WinfrPQq6 sNAbPah+A22ykKPdrX/6v3Eeql/fhwvyHVumeIG0+FJVon1oA4t3TRdMMVxndcrIIVkA XRkNz4nT5pFBJIH3xQem3bFBLurdDS9C95eKoIJERf3snekH4CLFtmgIRlDO9YIZ1Dnm +KLp3a5q9lJBp9GyzQJjGaiiSKdd+lRYQBcvMbhYsT5AbYBsP+aErzigFpZgDWqyGevD OZFA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; t=1702226254; s=strato-dkim-0002; d=strato.com; h=References:In-Reply-To:Message-Id:Date:Subject:To:From:Cc:Date:From: Subject:Sender; bh=2rKGD9Wpd/X1AT8j+7Jdv6KBmdSz7NSzKaQ9Wy+vMMw=; b=ZZ1LBT/JsyLdPDMTrvv+yqpQRfuJXjalsz7I6lFaBduk64KAbBMlDCi9P5vfBnnT4R IUpIgq2WBhKSqZBFWjuMBcgHyQdQf88Hf8RTQkF03hdQRJMLA0yltj34ql6d0Qh1drnM EaOoX7TZKm9nKSxFP83GHq6iMSmnc1tT6KxYVJwBYH32i1d0LawWQBQ4bqgNmshFv5vJ XfAJpQG0P/XmSzbC8xboLT/0SOe+GMLg2TDKgHrHGQRflJWe3k/Z8cHnb5wuKLu3Drvb 6eF6ddQiQbWOwee9kCZjsVSUchGBowBRssDlNW2g2e0+g5BxD+WIJ3grdDHgaTjgToK0 hKYQ== ARC-Authentication-Results: i=1; strato.com; arc=none; dkim=none X-RZG-CLASS-ID: mo00 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1702226254; s=strato-dkim-0002; d=oneric.de; h=References:In-Reply-To:Message-Id:Date:Subject:To:From:Cc:Date:From: Subject:Sender; bh=2rKGD9Wpd/X1AT8j+7Jdv6KBmdSz7NSzKaQ9Wy+vMMw=; b=X7bKbxCUIwUiWbIYgGGAiTQH9cz9wqvtJ0HK9wlpfo44BydESbqmTJfDmo5OyjdzAX vBkoxQeKWFr6ve5iyr6PhWma2pulaJQN9ff3oa3+3qxV8eWntTu7o+oo4ZjLo5xPhg8e oSiBZaOH3wJcnMtfwQPREb59zsg83p2+HTdXiBaxRtcN5sTjwURmNMJv2RIZYrrO1k71 dNyus7FFejtgmafwC2ufRa8e5zDCYZfVMwLE9eqYXzYX4V0MvdRpY5J+AAMj0PT61LWM adkSrSZG1TBlY8hRjGqt9Rcw2fKqObiAv8eHGX26jnKJW3dSdwIF1Z4ZAo66w+FjnlT3 ZV9g== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; t=1702226254; s=strato-dkim-0003; d=oneric.de; h=References:In-Reply-To:Message-Id:Date:Subject:To:From:Cc:Date:From: Subject:Sender; bh=2rKGD9Wpd/X1AT8j+7Jdv6KBmdSz7NSzKaQ9Wy+vMMw=; b=h4t3/Xru57NmQCJEz5uihIAAOK9wTj+ATLFs05SqGvr842gYOZ+fC2uV+ZqRjXUQqT 1aBQ9nkeQvcav5ZoEQBA== X-RZG-AUTH: 
":I2IBZ0mrW/AWQXwgB4oxKM1YsW1lFUznrLvi/XReWqAAlWwZ8wlvfXmGs4jUQ0oz8ZbhHexs8fhgUyAHJ90htHJwb5tQKk/WXOwm3hdv" Received: from abhoth.workgroup by smtp.strato.de (RZmta 49.10.0 AUTH) with ESMTPSA id g26a92zBAGbYWJl (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256 bits)) (Client did not present a certificate); Sun, 10 Dec 2023 17:37:34 +0100 (CET) From: Oneric To: ffmpeg-devel@ffmpeg.org Date: Sun, 10 Dec 2023 17:37:15 +0100 Message-Id: <20231210163715.4016-4-oneric@oneric.de> In-Reply-To: <20231210163715.4016-1-oneric@oneric.de> References: <20231210163715.4016-1-oneric@oneric.de> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v2 3/3] avcodec/{ass, webvttdec}: more portable curly brace escapes X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: Zgmp6CGqiG1n Unlike what the old comment suggested, standard ASS has no character escape mechanism, but a closing curly bracket doesn't even need one. For manually authored sub files, using a full-width variant from an appropriate font together with scaling and spacing modifiers is a common workaround. This is not an option here, but we can still make things much less bad. Now the desired opening bracket still shows up in libass, and standard renderers will merely display a backslash in its place instead of stripping the following text like before. 
--- “” --- libavcodec/ass.c | 12 ++++++++---- libavcodec/webvttdec.c | 2 +- tests/ref/fate/sub-webvtt | 2 +- 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/libavcodec/ass.c b/libavcodec/ass.c index a68d3568b4..e7a1ac0eb5 100644 --- a/libavcodec/ass.c +++ b/libavcodec/ass.c @@ -181,10 +181,14 @@ void ff_ass_bprint_text_event(AVBPrint *buf, const char *p, int size, if (linebreaks && strchr(linebreaks, *p)) { av_bprintf(buf, "\\N"); - /* standard ASS escaping so random characters don't get mis-interpreted - * as ASS */ - } else if (!keep_ass_markup && strchr("{}", *p)) { - av_bprintf(buf, "\\%c", *p); + /* cancel curly brackets to avoid bogus override tag blocks + * hiding text. Standard ASS has no character escapes, + * though (only) libass provides \{ and \}. + * Unpaired closing brackets don't need escaping at all though and + * to make the situation less bad in standard ASS insert an empty block + */ + } else if (!keep_ass_markup && *p == '{') { + av_bprintf(buf, "\\{{}"); /* append word-joiner U+2060 as UTF-8 to break up sequences like \N */ } else if (!keep_ass_markup && *p == '\\') { diff --git a/libavcodec/webvttdec.c b/libavcodec/webvttdec.c index 6e55bc5499..35bdbe805d 100644 --- a/libavcodec/webvttdec.c +++ b/libavcodec/webvttdec.c @@ -37,7 +37,7 @@ static const struct { {"", "{\\i1}"}, {"", "{\\i0}"}, {"", "{\\b1}"}, {"", "{\\b0}"}, {"", "{\\u1}"}, {"", "{\\u0}"}, - {"{", "\\{"}, {"}", "\\}"}, {"\\", "\\\xe2\x81\xa0"}, // escape to avoid ASS markup conflicts + {"{", "\\{{}"}, {"\\", "\\\xe2\x81\xa0"}, // escape to avoid ASS markup conflicts {">", ">"}, {"<", "<"}, {"‎", "\xe2\x80\x8e"}, {"‏", "\xe2\x80\x8f"}, {"&", "&"}, {" ", "\\h"}, diff --git a/tests/ref/fate/sub-webvtt b/tests/ref/fate/sub-webvtt index 2317c7d5a0..7f0a306361 100644 --- a/tests/ref/fate/sub-webvtt +++ b/tests/ref/fate/sub-webvtt @@ -21,7 +21,7 @@ Dialogue: 0,0:00:22.00,0:00:24.00,Default,,0,0,0,,at the AMNH. 
Dialogue: 0,0:00:24.00,0:00:26.00,Default,,0,0,0,,Thank you for walking down here. Dialogue: 0,0:00:27.00,0:00:30.00,Default,,0,0,0,,And I want to do a follow-up on the last conversation we did.\Nmultiple lines\Nagain Dialogue: 0,0:00:30.00,0:00:31.50,Default,,0,0,0,,When we e-mailed— -Dialogue: 0,0:00:30.50,0:00:32.50,Default,,0,0,0,,Didn't we {\b1}talk {\i1}about\N{\i0} enough{\b0} in that conversation? \{I'm not an ASS comment\} +Dialogue: 0,0:00:30.50,0:00:32.50,Default,,0,0,0,,Didn't we {\b1}talk {\i1}about\N{\i0} enough{\b0} in that conversation? \{{}I'm not an ASS comment} Dialogue: 0,0:00:32.00,0:00:35.50,Default,,0,0,0,,No! No no no no; 'cos 'cos obviously 'cos Dialogue: 0,0:00:32.50,0:00:33.50,Default,,0,0,0,,{\i1}Laughs{\i0} Dialogue: 0,0:00:35.50,0:00:38.00,Default,,0,0,0,,You know I'm so excited my glasses are falling off here.
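Note for readers of the series: the escaping rules that patches 2/3 and 3/3 leave in place for plain text can be tried outside of FFmpeg with a small standalone sketch in C. The helper name escape_ass_text and its buffer-based signature are assumptions made for illustration only; the real code in ff_ass_bprint_text_event() appends to an AVBPrint and additionally handles linebreaks and keep_ass_markup, both omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WORD_JOINER "\xe2\x81\xa0" /* U+2060 encoded as UTF-8 */

/* Hypothetical helper, not part of FFmpeg. Writes the escaped form of
 * in[0..len) to out as a NUL-terminated string and returns the number
 * of bytes written (without the NUL), or (size_t)-1 if out is too small. */
static size_t escape_ass_text(const char *in, size_t len,
                              char *out, size_t out_size)
{
    const char *p = in, *end = in + len;
    size_t n = 0;

    assert(in && out);
    for (; p < end; p++) {
        const char *rep = p; /* default: copy the byte unchanged */
        size_t rep_len = 1;

        if (*p == '{') {
            /* libass shows "\{" as a brace; the empty block "{}" keeps
             * other renderers from swallowing the text that follows */
            rep = "\\{{}";
            rep_len = 4;
        } else if (*p == '\\') {
            /* add a word joiner unless one already follows, so that
             * sequences like \N are no longer active */
            if (end - p <= 3 || memcmp(p + 1, WORD_JOINER, 3)) {
                rep = "\\" WORD_JOINER;
                rep_len = 4;
            }
        }
        /* a closing '}' needs no escaping and is copied as-is */

        if (n + rep_len >= out_size)
            return (size_t)-1;
        memcpy(out + n, rep, rep_len);
        n += rep_len;
    }
    if (n >= out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

With this sketch, a{b} becomes a\{{}b}, and \N becomes a backslash followed by U+2060 and N. A backslash that is already followed by U+2060 is passed through unchanged, mirroring the strncmp check in patch 2/3.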