From patchwork Fri Jan 14 01:13:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aman Karmani
X-Patchwork-Id: 33583
From: ffmpegagent
Date: Fri, 14 Jan 2022 01:13:31 +0000
To: ffmpeg-devel@ffmpeg.org
Cc: softworkz
Subject: [FFmpeg-devel] [PATCH 22/24] avutil/ass_split: Add parsing of hard-space tags (\h)

From: softworkz

The \h tag in ASS/SSA indicates a non-breaking space. See
https://github.com/Aegisub/aegisite/blob/master/source/docs/3.2/ASS_Tags.html.md

The ass_split implementation is used by almost all text subtitle
encoders and it didn't handle this tag.
Interestingly, several tests exercise \h parsing but had incorrect
reference data. The \h tag is specific to ASS and has no meaning
outside of ASS. Still, the reference data for ttmlenc, textenc and
webvttenc was full of \h tags even though the tag has no meaning there.

Signed-off-by: softworkz
---
 libavutil/ass_split.c            |  7 +++++++
 tests/ref/fate/mov-mp4-ttml-dfxp |  8 ++++----
 tests/ref/fate/mov-mp4-ttml-stpp |  8 ++++----
 tests/ref/fate/sub-textenc       | 10 +++++-----
 tests/ref/fate/sub-ttmlenc       |  8 ++++----
 tests/ref/fate/sub-webvttenc     | 10 +++++-----
 6 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/libavutil/ass_split.c b/libavutil/ass_split.c
index c5963351fc..30512dfc74 100644
--- a/libavutil/ass_split.c
+++ b/libavutil/ass_split.c
@@ -484,6 +484,7 @@ int avpriv_ass_split_override_codes(const ASSCodesCallbacks *callbacks, void *pr
     while (buf && *buf) {
         if (text && callbacks->text &&
             (sscanf(buf, "\\%1[nN]", new_line) == 1 ||
+             sscanf(buf, "\\%1[hH]", new_line) == 1 ||
              !strncmp(buf, "{\\", 2))) {
             callbacks->text(priv, text, text_len);
             text = NULL;
@@ -492,6 +493,12 @@ int avpriv_ass_split_override_codes(const ASSCodesCallbacks *callbacks, void *pr
             if (callbacks->new_line)
                 callbacks->new_line(priv, new_line[0] == 'N');
             buf += 2;
+        } else if (sscanf(buf, "\\%1[hH]", new_line) == 1) {
+            if (callbacks->hard_space)
+                callbacks->hard_space(priv);
+            else if (callbacks->text)
+                callbacks->text(priv, " ", 1);
+            buf += 2;
         } else if (!strncmp(buf, "{\\", 2)) {
             buf++;
             while (*buf == '\\') {
diff --git a/tests/ref/fate/mov-mp4-ttml-dfxp b/tests/ref/fate/mov-mp4-ttml-dfxp
index e24b5d618b..e565ffa1f6 100644
--- a/tests/ref/fate/mov-mp4-ttml-dfxp
+++ b/tests/ref/fate/mov-mp4-ttml-dfxp
@@ -1,9 +1,9 @@
-2e7e01c821c111466e7a2844826b7f6d *tests/data/fate/mov-mp4-ttml-dfxp.mp4
-8519 tests/data/fate/mov-mp4-ttml-dfxp.mp4
+658884e1b789e75c454b25bdf71283c9 *tests/data/fate/mov-mp4-ttml-dfxp.mp4
+8486 tests/data/fate/mov-mp4-ttml-dfxp.mp4
 #tb 0: 1/1000
 #media_type 0: data
 #codec_id 0: none
-0, 0, 0, 68500, 7866, 0x456c36b7
+0, 0, 0, 68500, 7833, 0x31b22193
 {
     "packets": [
         {
@@ -15,7 +15,7 @@
             "dts_time": "0.000000",
             "duration": 68500,
             "duration_time": "68.500000",
-            "size": "7866",
+            "size": "7833",
             "pos": "44",
             "flags": "K_"
         }
diff --git a/tests/ref/fate/mov-mp4-ttml-stpp b/tests/ref/fate/mov-mp4-ttml-stpp
index 77bd23b7bf..f25b5b2d28 100644
--- a/tests/ref/fate/mov-mp4-ttml-stpp
+++ b/tests/ref/fate/mov-mp4-ttml-stpp
@@ -1,9 +1,9 @@
-cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-ttml-stpp.mp4
-8547 tests/data/fate/mov-mp4-ttml-stpp.mp4
+c9570de0ccebc858b0c662a7e449582c *tests/data/fate/mov-mp4-ttml-stpp.mp4
+8514 tests/data/fate/mov-mp4-ttml-stpp.mp4
 #tb 0: 1/1000
 #media_type 0: data
 #codec_id 0: none
-0, 0, 0, 68500, 7866, 0x456c36b7
+0, 0, 0, 68500, 7833, 0x31b22193
 {
     "packets": [
         {
@@ -15,7 +15,7 @@ cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-ttml-stpp.mp4
             "dts_time": "0.000000",
             "duration": 68500,
             "duration_time": "68.500000",
-            "size": "7866",
+            "size": "7833",
             "pos": "44",
             "flags": "K_"
         }
diff --git a/tests/ref/fate/sub-textenc b/tests/ref/fate/sub-textenc
index 3ea56b38f0..910ca3d6e3 100644
--- a/tests/ref/fate/sub-textenc
+++ b/tests/ref/fate/sub-textenc
@@ -160,18 +160,18 @@ but show this: {normal text}
 \ N is a forced line break
 \ h is a hard space
 Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
-The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
+The line will never break automatically right before or after a hard space. :-D
 
 31
 00:00:54,501 --> 00:00:56,500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+     A (05 hard spaces followed by a letter)
 A (Normal spaces followed by a letter)
 A (No hard spaces followed by a letter)
 
 32
 00:00:56,501 --> 00:00:58,500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+     A (05 hard spaces followed by a letter)
 A (Normal spaces followed by a letter)
 A (No hard spaces followed by a letter)
 Show this: \TEST and this: \-)
@@ -179,10 +179,10 @@ Show this: \TEST and this: \-)
 
 33
 00:00:58,501 --> 00:01:00,500
-A letter followed by 05 hard spaces: A\h\h\h\h\h
+A letter followed by 05 hard spaces: A     
 A letter followed by normal spaces: A
 A letter followed by no hard spaces: A
-05 hard spaces between letters: A\h\h\h\h\hA
+05 hard spaces between letters: A     A
 5 normal spaces between letters: A A
 
 ^--Forced line break
diff --git a/tests/ref/fate/sub-ttmlenc b/tests/ref/fate/sub-ttmlenc
index 4df8f8796f..aea09bb31e 100644
--- a/tests/ref/fate/sub-ttmlenc
+++ b/tests/ref/fate/sub-ttmlenc
@@ -109,16 +109,16 @@ end="00:00:54.500">Hide these tags:
also hide these tags:
but show this: {normal text}


\ N is a forced line break
\ h is a hard space
Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D

+ end="00:01:00.500">
\ N is a forced line break
\ h is a hard space
Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
The line will never break automatically right before or after a hard space. :-D


\h\h\h\h\hA (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)

+ end="00:00:56.500">
A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)

\h\h\h\h\hA (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
Show this: \TEST and this: \-)

+ end="00:00:58.500"> A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
Show this: \TEST and this: \-)


A letter followed by 05 hard spaces: A\h\h\h\h\h
A letter followed by normal spaces: A
A letter followed by no hard spaces: A
05 hard spaces between letters: A\h\h\h\h\hA
5 normal spaces between letters: A A

^--Forced line break

+ end="00:01:00.500">
A letter followed by 05 hard spaces: A
A letter followed by normal spaces: A
A letter followed by no hard spaces: A
05 hard spaces between letters: A A
5 normal spaces between letters: A A

^--Forced line break

Both line should be strikethrough,
yes.
Correctly closed tags
should be hidden.

diff --git a/tests/ref/fate/sub-webvttenc b/tests/ref/fate/sub-webvttenc
index 45ae0b6131..f4172dcc84 100644
--- a/tests/ref/fate/sub-webvttenc
+++ b/tests/ref/fate/sub-webvttenc
@@ -132,26 +132,26 @@ but show this: {normal text}
 \ N is a forced line break
 \ h is a hard space
 Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
-The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
+The line will never break automatically right before or after a hard space. :-D
 
 00:54.501 --> 00:56.500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+     A (05 hard spaces followed by a letter)
 A (Normal spaces followed by a letter)
 A (No hard spaces followed by a letter)
 
 00:56.501 --> 00:58.500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+     A (05 hard spaces followed by a letter)
 A (Normal spaces followed by a letter)
 A (No hard spaces followed by a letter)
 Show this: \TEST and this: \-)
 
 00:58.501 --> 01:00.500
-A letter followed by 05 hard spaces: A\h\h\h\h\h
+A letter followed by 05 hard spaces: A     
 A letter followed by normal spaces: A
 A letter followed by no hard spaces: A
-05 hard spaces between letters: A\h\h\h\h\hA
+05 hard spaces between letters: A     A
 5 normal spaces between letters: A A
 
 ^--Forced line break
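
For readers who want to try the fallback behaviour outside of FFmpeg, the
following is a minimal standalone sketch of what the ass_split.c hunk does
when an encoder provides no hard_space callback: \h (or \H) becomes a single
plain space. The function name flatten_ass_escapes and its buffer interface
are hypothetical and not part of this patch or of FFmpeg. Only the sscanf
patterns are taken from the diff. Override blocks ({\...}) are deliberately
left out and copied through unchanged.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of FFmpeg: copy an ASS event text into
 * out, turning \n / \N into a line feed and \h / \H into a plain space.
 * Everything else, including unknown escapes such as \TEST, is copied
 * verbatim. Returns the number of bytes written, excluding the NUL.
 */
static size_t flatten_ass_escapes(const char *buf, char *out, size_t out_size)
{
    char tag[2];
    size_t n = 0;

    if (!out_size)
        return 0;

    while (*buf && n + 1 < out_size) {
        if (sscanf(buf, "\\%1[nN]", tag) == 1) {
            /* same pattern the existing line-break branch uses */
            out[n++] = '\n';
            buf += 2;
        } else if (sscanf(buf, "\\%1[hH]", tag) == 1) {
            /* pattern added by this patch: hard space -> " " */
            out[n++] = ' ';
            buf += 2;
        } else {
            out[n++] = *buf++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Each replaced tag turns two bytes into one. The reference lines changed in
the diffs above contain 33 \h tags in total (13 in the "never break" line
and four groups of 5), which is consistent with the TTML packet size going
from 7866 to 7833 and the mp4 file sizes dropping by 33 bytes.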