From patchwork Thu Jan 20 02:48:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aman Karmani
X-Patchwork-Id: 33692
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a6b:cd86:0:0:0:0:0 with SMTP id d128csp5410674iog;
Wed, 19 Jan 2022 18:52:49 -0800 (PST)
X-Google-Smtp-Source:
ABdhPJxFz2hba4p0bdPKQTblwKD/DIIbku79fvcnbVXHlBUhm0IbWd9+3/HJNilqViBvwLcoeAwV
X-Received: by 2002:a05:6402:397:: with SMTP id
o23mr34323219edv.194.1642647169453;
Wed, 19 Jan 2022 18:52:49 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1642647169; cv=none;
d=google.com; s=arc-20160816;
b=yyCPpGT8yk6PQJZZv+q2fAhIHtXbg7hpC/uHPbxsl+SuvUviOjsdH+df6gcH0ReW6H
fk2mwQMI1KBjvsnswHX30DKHY795rIq3ZWM7NkoVFb5NHIM250y1b/vvdyt1gn20yu3n
cZnkpcFt5RAwNn8S+wDbqIr+kZuvgJtpfWkf8CZxtpSjwQ/e1egMCmSFa7u9RzCyIGLh
0GOko2p9/ZumjJKYFftZcbmgOWdQL3KWZTvEjKBEKNIAvCceRhrZOOazoVC8wH+cqa7P
1A1Z3yxqEIRA/aRoIcgSFC65KgetCHzrwpl6fUYGXLDLg5iPaBO7wzc9xPIKvhIMLTE9
88Yw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
s=arc-20160816;
h=sender:errors-to:content-transfer-encoding:cc:reply-to
:list-subscribe:list-help:list-post:list-archive:list-unsubscribe
:list-id:precedence:subject:to:mime-version:fcc:date:from:references
:in-reply-to:message-id:dkim-signature:delivered-to;
bh=rkP/Qrum9T3chQjUroGznsGKwaooTUABfMIJ57FZQE0=;
b=EfG0QLFWJqebBIpYbc9FL0QXMg8H7uOVNo1cSdZAT9K0VUkxkqqkVNgf2UrvC5eVfE
oUw+/58IiNX6NFc/7DgaZawRTVcyURANFThqDVvsXWpJtM8F5TMFxq2pVmUmTZZ5nyZ8
mzwjzpSelFvnNSRCnBjtMI0mhGRu+fKEeaspXhULmOF3AhfKfOcl5xA9mQ3AtzcM69vF
2/INizbW3/AbTmKKx8DStah+tJx9TMQok7lU7bvF+a02e1nVv3ZcyNGx3QTBzcPxoN6k
W/jN3QBP81VkIsfxAjxKimXJ5QTP8K0nfm23rZKZ1G/kBWe2IGq6JihRpXq3xDTfJOPu
hKTQ==
ARC-Authentication-Results: i=1; mx.google.com;
dkim=neutral (body hash did not verify) header.i=@gmail.com
header.s=20210112 header.b=dkzH5bsh;
spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
designates 79.124.17.100 as permitted sender)
smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
by mx.google.com with ESMTP id
dp19si1249148ejc.482.2022.01.19.18.52.49;
Wed, 19 Jan 2022 18:52:49 -0800 (PST)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
dkim=neutral (body hash did not verify) header.i=@gmail.com
header.s=20210112 header.b=dkzH5bsh;
spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
designates 79.124.17.100 as permitted sender)
smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Received: from [127.0.1.1] (localhost [127.0.0.1])
by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 807F068B278;
Thu, 20 Jan 2022 04:49:12 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com
[209.85.214.169])
by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E48AB68B23B
for ; Thu, 20 Jan 2022 04:49:02 +0200 (EET)
Received: by mail-pl1-f169.google.com with SMTP id e8so3989781plh.8
for ; Wed, 19 Jan 2022 18:49:02 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
h=message-id:in-reply-to:references:from:date:subject:fcc
:content-transfer-encoding:mime-version:to:cc;
bh=48JI/fyGWe3oVwgLzjA80sLhmJf2dB34oToKDrQ9NXU=;
b=dkzH5bshqiu7htPQJNVH5hmERO24EDqLriAy/Zz+9kufALc4JcpiKCQJXYXqo+kxn/
Yfoo0/XEc73MsZsTnPnrbv8P6jiOwM7AkyJHbLcynsH0aINWyA+LZbzaxBh7YYAhJyDs
mupI8lPnfj8UIvk6ZXaQfzYHv+DOxj1KHtB4FTKb92xRefhzZ6vf4S8k6xntXfoKepsm
N6uFyVabhUoCw9q1R8xJFP91kVaag2YaKgxupXGg8+a7ZFUa9pjtqflyUXyLfOpeCc3L
vKnOWlVx51v7DJzcMsgwDspYKabeFdcFoV02K4DIvGcS2GdQGpzdBIFkmJPjxd0/9TGF
M2fA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20210112;
h=x-gm-message-state:message-id:in-reply-to:references:from:date
:subject:fcc:content-transfer-encoding:mime-version:to:cc;
bh=48JI/fyGWe3oVwgLzjA80sLhmJf2dB34oToKDrQ9NXU=;
b=u6lIm9SFZXoOelqgWOT4m20e0S4xw2DIf2tBFPyIghLNimbKIDHJ60LUwANW+VvnqB
ALfd/lOc92yphEiVILBFmeL3I+hf9xUHY90FPzSa07pdsc5EnWjDAlvFgt3KSMfPNtVF
qpwVn5n4az6bOHA0G0a0tHkZBV7mdH/savafUajQTmzMKQWy+jdTvPJ0pRuibYFHgH3e
5zYU+U5C0k3dgQWdOritkajdUa36KyFQ6gtwtv6UqitjOqQmIi60Dluk6WHUm8cVPl/7
fvrZEdNi2OxtxnrWGnEDSBFOBZ1/GiKZz1+8s5Iy2/LKVYWI749v2ELnY9nCQJbj4/B2
rVqg==
X-Gm-Message-State: AOAM532VdFk4VrcAg3dAph2LeUriFdzYKrLPITXe+rw7g2Lzk/crRYzK
Vvu/0PS100aDj/Jle+smcViwo16IYJ8=
X-Received: by 2002:a17:903:41c1:b0:14a:6879:9333 with SMTP id
u1-20020a17090341c100b0014a68799333mr36627460ple.36.1642646940814;
Wed, 19 Jan 2022 18:49:00 -0800 (PST)
Received: from [127.0.0.1] (master.gitmailbox.com. [34.83.118.50])
by smtp.gmail.com with ESMTPSA id y8sm812641pgs.31.2022.01.19.18.49.00
(version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
Wed, 19 Jan 2022 18:49:00 -0800 (PST)
Message-Id:
<1a0c6e01f37c9d57f0e34743927af3f2a88df1ff.1642646916.git.ffmpegagent@gmail.com>
In-Reply-To:
References:
From: ffmpegagent
Date: Thu, 20 Jan 2022 02:48:32 +0000
Fcc: Sent
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH v2 22/26] avutil/ass_split: Add parsing of
hard-space tags (\h)
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: Michael Niedermayer ,
softworkz ,
Andreas Rheinhardt
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: rPIDLHPAjJ4q
From: softworkz
The \h tag in ASS/SSA indicates a non-breaking (hard) space. See
https://github.com/Aegisub/aegisite/blob/master/source/docs/3.2/ASS_Tags.html.md
The ass_split implementation is used by almost all text subtitle
encoders, but it did not handle this tag. Several tests exercise
\h parsing, and their reference data was incorrect as a result:
the tag was passed through to the output verbatim.
The \h tag is specific to ASS and has no meaning outside of ASS.
Still, the reference data for ttmlenc, textenc and webvttenc was
full of \h tags. With this change, an encoder that provides no
hard_space callback receives a regular space through its text
callback instead.
Signed-off-by: softworkz
---
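Note for reviewers (not part of the commit message): the dispatch added
to avpriv_ass_split_override_codes() can be sketched in isolation as
below. This is a simplified illustration, not the patched function.
DemoCallbacks, DemoOut and the demo_* helpers are hypothetical names
invented for this sketch; only the sscanf() pattern and the
"hard_space, else text with a single space" fallback are taken from the
diff. Emitting U+00A0 from the hard_space callback is an assumption
about what an encoder might choose to do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback table, loosely modelled on ASSCodesCallbacks. */
typedef struct DemoCallbacks {
    void (*text)(void *priv, const char *text, int len);
    void (*hard_space)(void *priv);
} DemoCallbacks;

/* Hypothetical output buffer used as the callbacks' private context. */
typedef struct DemoOut {
    char   buf[256];
    size_t len;
} DemoOut;

static void demo_append(DemoOut *o, const char *s, size_t n)
{
    size_t room = sizeof(o->buf) - 1 - o->len;
    if (n > room)
        n = room;                      /* truncate instead of overflowing */
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

static void demo_text(void *priv, const char *text, int len)
{
    demo_append(priv, text, (size_t)len);
}

static void demo_hard_space(void *priv)
{
    /* Assumption: map \h to U+00A0 NO-BREAK SPACE, UTF-8 encoded. */
    demo_append(priv, "\xC2\xA0", 2);
}

/* Split buf into plain-text runs and \h tags, mirroring the new branch. */
static void demo_split(const DemoCallbacks *cb, void *priv, const char *buf)
{
    const char *text = NULL;
    int text_len = 0;
    char tag[2];                       /* one matched character plus NUL */

    assert(cb && buf);
    while (*buf) {
        if (sscanf(buf, "\\%1[hH]", tag) == 1) {
            /* Flush pending text before handling the tag. */
            if (text && cb->text)
                cb->text(priv, text, text_len);
            text = NULL;
            if (cb->hard_space)
                cb->hard_space(priv);
            else if (cb->text)
                cb->text(priv, " ", 1); /* fallback: regular space */
            buf += 2;
        } else {
            if (!text) {
                text = buf;
                text_len = 1;
            } else {
                text_len++;
            }
            buf++;
        }
    }
    if (text && cb->text)
        cb->text(priv, text, text_len);
}
```

With a table of { demo_text, NULL }, the input "The\hline" comes out as
"The line", which matches the updated reference data. With
demo_hard_space set, the same position receives a no-break space
instead.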
libavutil/ass_split.c | 7 +++++++
tests/ref/fate/.gitattributes | 3 +++
tests/ref/fate/mov-mp4-ttml-dfxp | 8 ++++----
tests/ref/fate/mov-mp4-ttml-stpp | 8 ++++----
tests/ref/fate/sub-textenc | 10 +++++-----
tests/ref/fate/sub-ttmlenc | 8 ++++----
tests/ref/fate/sub-webvttenc | 10 +++++-----
7 files changed, 32 insertions(+), 22 deletions(-)
create mode 100644 tests/ref/fate/.gitattributes
diff --git a/libavutil/ass_split.c b/libavutil/ass_split.c
index c5963351fc..30512dfc74 100644
--- a/libavutil/ass_split.c
+++ b/libavutil/ass_split.c
@@ -484,6 +484,7 @@ int avpriv_ass_split_override_codes(const ASSCodesCallbacks *callbacks, void *pr
while (buf && *buf) {
if (text && callbacks->text &&
(sscanf(buf, "\\%1[nN]", new_line) == 1 ||
+ sscanf(buf, "\\%1[hH]", new_line) == 1 ||
!strncmp(buf, "{\\", 2))) {
callbacks->text(priv, text, text_len);
text = NULL;
@@ -492,6 +493,12 @@ int avpriv_ass_split_override_codes(const ASSCodesCallbacks *callbacks, void *pr
if (callbacks->new_line)
callbacks->new_line(priv, new_line[0] == 'N');
buf += 2;
+ } else if (sscanf(buf, "\\%1[hH]", new_line) == 1) {
+ if (callbacks->hard_space)
+ callbacks->hard_space(priv);
+ else if (callbacks->text)
+ callbacks->text(priv, " ", 1);
+ buf += 2;
} else if (!strncmp(buf, "{\\", 2)) {
buf++;
while (*buf == '\\') {
diff --git a/tests/ref/fate/.gitattributes b/tests/ref/fate/.gitattributes
new file mode 100644
index 0000000000..19be64d085
--- /dev/null
+++ b/tests/ref/fate/.gitattributes
@@ -0,0 +1,3 @@
+sub-textenc -diff
+sub-ttmlenc -diff
+sub-webvttenc -diff
diff --git a/tests/ref/fate/mov-mp4-ttml-dfxp b/tests/ref/fate/mov-mp4-ttml-dfxp
index e24b5d618b..e565ffa1f6 100644
--- a/tests/ref/fate/mov-mp4-ttml-dfxp
+++ b/tests/ref/fate/mov-mp4-ttml-dfxp
@@ -1,9 +1,9 @@
-2e7e01c821c111466e7a2844826b7f6d *tests/data/fate/mov-mp4-ttml-dfxp.mp4
-8519 tests/data/fate/mov-mp4-ttml-dfxp.mp4
+658884e1b789e75c454b25bdf71283c9 *tests/data/fate/mov-mp4-ttml-dfxp.mp4
+8486 tests/data/fate/mov-mp4-ttml-dfxp.mp4
#tb 0: 1/1000
#media_type 0: data
#codec_id 0: none
-0, 0, 0, 68500, 7866, 0x456c36b7
+0, 0, 0, 68500, 7833, 0x31b22193
{
"packets": [
{
@@ -15,7 +15,7 @@
"dts_time": "0.000000",
"duration": 68500,
"duration_time": "68.500000",
- "size": "7866",
+ "size": "7833",
"pos": "44",
"flags": "K_"
}
diff --git a/tests/ref/fate/mov-mp4-ttml-stpp b/tests/ref/fate/mov-mp4-ttml-stpp
index 77bd23b7bf..f25b5b2d28 100644
--- a/tests/ref/fate/mov-mp4-ttml-stpp
+++ b/tests/ref/fate/mov-mp4-ttml-stpp
@@ -1,9 +1,9 @@
-cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-ttml-stpp.mp4
-8547 tests/data/fate/mov-mp4-ttml-stpp.mp4
+c9570de0ccebc858b0c662a7e449582c *tests/data/fate/mov-mp4-ttml-stpp.mp4
+8514 tests/data/fate/mov-mp4-ttml-stpp.mp4
#tb 0: 1/1000
#media_type 0: data
#codec_id 0: none
-0, 0, 0, 68500, 7866, 0x456c36b7
+0, 0, 0, 68500, 7833, 0x31b22193
{
"packets": [
{
@@ -15,7 +15,7 @@ cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-ttml-stpp.mp4
"dts_time": "0.000000",
"duration": 68500,
"duration_time": "68.500000",
- "size": "7866",
+ "size": "7833",
"pos": "44",
"flags": "K_"
}
diff --git a/tests/ref/fate/sub-textenc b/tests/ref/fate/sub-textenc
index 3ea56b38f0..910ca3d6e3 100644
--- a/tests/ref/fate/sub-textenc
+++ b/tests/ref/fate/sub-textenc
@@ -160,18 +160,18 @@ but show this: {normal text}
\ N is a forced line break
\ h is a hard space
Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
-The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
+The line will never break automatically right before or after a hard space. :-D
31
00:00:54,501 --> 00:00:56,500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+ A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
32
00:00:56,501 --> 00:00:58,500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+ A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
Show this: \TEST and this: \-)
@@ -179,10 +179,10 @@ Show this: \TEST and this: \-)
33
00:00:58,501 --> 00:01:00,500
-A letter followed by 05 hard spaces: A\h\h\h\h\h
+A letter followed by 05 hard spaces: A
A letter followed by normal spaces: A
A letter followed by no hard spaces: A
-05 hard spaces between letters: A\h\h\h\h\hA
+05 hard spaces between letters: A A
5 normal spaces between letters: A A
^--Forced line break
diff --git a/tests/ref/fate/sub-ttmlenc b/tests/ref/fate/sub-ttmlenc
index 4df8f8796f..aea09bb31e 100644
--- a/tests/ref/fate/sub-ttmlenc
+++ b/tests/ref/fate/sub-ttmlenc
@@ -109,16 +109,16 @@
end="00:00:54.500">Hide these tags:
also hide these tags:
but show this: {normal text}
\ N is a forced line break
\ h is a hard space
Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
+ end="00:01:00.500">
\ N is a forced line break
\ h is a hard space
Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
The line will never break automatically right before or after a hard space. :-D
\h\h\h\h\hA (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
+ end="00:00:56.500">
A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
\h\h\h\h\hA (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
Show this: \TEST and this: \-)
+ end="00:00:58.500"> A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
Show this: \TEST and this: \-)
A letter followed by 05 hard spaces: A\h\h\h\h\h
A letter followed by normal spaces: A
A letter followed by no hard spaces: A
05 hard spaces between letters: A\h\h\h\h\hA
5 normal spaces between letters: A A
^--Forced line break
+ end="00:01:00.500">
A letter followed by 05 hard spaces: A
A letter followed by normal spaces: A
A letter followed by no hard spaces: A
05 hard spaces between letters: A A
5 normal spaces between letters: A A
^--Forced line break
Both line should be strikethrough,
yes.
Correctly closed tags
should be hidden.
diff --git a/tests/ref/fate/sub-webvttenc b/tests/ref/fate/sub-webvttenc
index 45ae0b6131..f4172dcc84 100644
--- a/tests/ref/fate/sub-webvttenc
+++ b/tests/ref/fate/sub-webvttenc
@@ -132,26 +132,26 @@ but show this: {normal text}
\ N is a forced line break
\ h is a hard space
Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
-The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
+The line will never break automatically right before or after a hard space. :-D
00:54.501 --> 00:56.500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+ A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
00:56.501 --> 00:58.500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+ A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
Show this: \TEST and this: \-)
00:58.501 --> 01:00.500
-A letter followed by 05 hard spaces: A\h\h\h\h\h
+A letter followed by 05 hard spaces: A
A letter followed by normal spaces: A
A letter followed by no hard spaces: A
-05 hard spaces between letters: A\h\h\h\h\hA
+05 hard spaces between letters: A A
5 normal spaces between letters: A A
^--Forced line break