From patchwork Thu Jan 20 03:25:30 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aman Karmani
X-Patchwork-Id: 33728
From: ffmpegagent
Date: Thu, 20 Jan 2022 03:25:30 +0000
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH v3 22/26] avutil/ass_split: Add parsing of hard-space tags (\h)
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Michael Niedermayer, softworkz, Andreas Rheinhardt

From: softworkz

The \h tag in ASS/SSA indicates a non-breaking (hard) space. See
https://github.com/Aegisub/aegisite/blob/master/source/docs/3.2/ASS_Tags.html.md

The ass_split implementation is used by almost all text subtitle
encoders, and it did not handle this tag.
Interestingly, several tests exercise \h parsing but had incorrect
reference data. The \h tag is specific to ASS and has no meaning outside
of ASS. Still, the reference data for ttmlenc, textenc and webvttenc was
full of \h tags, even though the tag has no meaning in those formats.

Signed-off-by: softworkz
---
 libavutil/ass_split.c            |  7 +++++++
 tests/ref/fate/.gitattributes    |  3 +++
 tests/ref/fate/mov-mp4-ttml-dfxp |  8 ++++----
 tests/ref/fate/mov-mp4-ttml-stpp |  8 ++++----
 tests/ref/fate/sub-textenc       | 10 +++++-----
 tests/ref/fate/sub-ttmlenc       |  8 ++++----
 tests/ref/fate/sub-webvttenc     | 10 +++++-----
 7 files changed, 32 insertions(+), 22 deletions(-)
 create mode 100644 tests/ref/fate/.gitattributes

diff --git a/libavutil/ass_split.c b/libavutil/ass_split.c
index c5963351fc..30512dfc74 100644
--- a/libavutil/ass_split.c
+++ b/libavutil/ass_split.c
@@ -484,6 +484,7 @@ int avpriv_ass_split_override_codes(const ASSCodesCallbacks *callbacks, void *pr
     while (buf && *buf) {
         if (text && callbacks->text &&
             (sscanf(buf, "\\%1[nN]", new_line) == 1 ||
+             sscanf(buf, "\\%1[hH]", new_line) == 1 ||
              !strncmp(buf, "{\\", 2))) {
             callbacks->text(priv, text, text_len);
             text = NULL;
@@ -492,6 +493,12 @@ int avpriv_ass_split_override_codes(const ASSCodesCallbacks *callbacks, void *pr
             if (callbacks->new_line)
                 callbacks->new_line(priv, new_line[0] == 'N');
             buf += 2;
+        } else if (sscanf(buf, "\\%1[hH]", new_line) == 1) {
+            if (callbacks->hard_space)
+                callbacks->hard_space(priv);
+            else if (callbacks->text)
+                callbacks->text(priv, " ", 1);
+            buf += 2;
         } else if (!strncmp(buf, "{\\", 2)) {
             buf++;
             while (*buf == '\\') {
diff --git a/tests/ref/fate/.gitattributes b/tests/ref/fate/.gitattributes
new file mode 100644
index 0000000000..19be64d085
--- /dev/null
+++ b/tests/ref/fate/.gitattributes
@@ -0,0 +1,3 @@
+sub-textenc -diff
+sub-ttmlenc -diff
+sub-webvttenc -diff
diff --git a/tests/ref/fate/mov-mp4-ttml-dfxp b/tests/ref/fate/mov-mp4-ttml-dfxp
index 
e24b5d618b..e565ffa1f6 100644
--- a/tests/ref/fate/mov-mp4-ttml-dfxp
+++ b/tests/ref/fate/mov-mp4-ttml-dfxp
@@ -1,9 +1,9 @@
-2e7e01c821c111466e7a2844826b7f6d *tests/data/fate/mov-mp4-ttml-dfxp.mp4
-8519 tests/data/fate/mov-mp4-ttml-dfxp.mp4
+658884e1b789e75c454b25bdf71283c9 *tests/data/fate/mov-mp4-ttml-dfxp.mp4
+8486 tests/data/fate/mov-mp4-ttml-dfxp.mp4
 #tb 0: 1/1000
 #media_type 0: data
 #codec_id 0: none
-0, 0, 0, 68500, 7866, 0x456c36b7
+0, 0, 0, 68500, 7833, 0x31b22193
 {
     "packets": [
         {
@@ -15,7 +15,7 @@
             "dts_time": "0.000000",
             "duration": 68500,
             "duration_time": "68.500000",
-            "size": "7866",
+            "size": "7833",
             "pos": "44",
             "flags": "K_"
         }
diff --git a/tests/ref/fate/mov-mp4-ttml-stpp b/tests/ref/fate/mov-mp4-ttml-stpp
index 77bd23b7bf..f25b5b2d28 100644
--- a/tests/ref/fate/mov-mp4-ttml-stpp
+++ b/tests/ref/fate/mov-mp4-ttml-stpp
@@ -1,9 +1,9 @@
-cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-ttml-stpp.mp4
-8547 tests/data/fate/mov-mp4-ttml-stpp.mp4
+c9570de0ccebc858b0c662a7e449582c *tests/data/fate/mov-mp4-ttml-stpp.mp4
+8514 tests/data/fate/mov-mp4-ttml-stpp.mp4
 #tb 0: 1/1000
 #media_type 0: data
 #codec_id 0: none
-0, 0, 0, 68500, 7866, 0x456c36b7
+0, 0, 0, 68500, 7833, 0x31b22193
 {
     "packets": [
         {
@@ -15,7 +15,7 @@ cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-ttml-stpp.mp4
             "dts_time": "0.000000",
             "duration": 68500,
             "duration_time": "68.500000",
-            "size": "7866",
+            "size": "7833",
             "pos": "44",
             "flags": "K_"
         }
diff --git a/tests/ref/fate/sub-textenc b/tests/ref/fate/sub-textenc
index 3ea56b38f0..910ca3d6e3 100644
--- a/tests/ref/fate/sub-textenc
+++ b/tests/ref/fate/sub-textenc
@@ -160,18 +160,18 @@ but show this: {normal text}
 \ N is a forced line break
 \ h is a hard space
 Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
-The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
+The line will never break automatically right before or after a hard space. :-D
 
 31
 00:00:54,501 --> 00:00:56,500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+     A (05 hard spaces followed by a letter)
 A (Normal spaces followed by a letter)
 A (No hard spaces followed by a letter)
 
 32
 00:00:56,501 --> 00:00:58,500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+     A (05 hard spaces followed by a letter)
 A (Normal spaces followed by a letter)
 A (No hard spaces followed by a letter)
 Show this: \TEST and this: \-)
@@ -179,10 +179,10 @@ Show this: \TEST and this: \-)
 
 33
 00:00:58,501 --> 00:01:00,500
-A letter followed by 05 hard spaces: A\h\h\h\h\h
+A letter followed by 05 hard spaces: A     
 A letter followed by normal spaces: A
 A letter followed by no hard spaces: A
-05 hard spaces between letters: A\h\h\h\h\hA
+05 hard spaces between letters: A     A
 5 normal spaces between letters: A     A
 
 ^--Forced line break
diff --git a/tests/ref/fate/sub-ttmlenc b/tests/ref/fate/sub-ttmlenc
index 4df8f8796f..aea09bb31e 100644
--- a/tests/ref/fate/sub-ttmlenc
+++ b/tests/ref/fate/sub-ttmlenc
@@ -109,16 +109,16 @@ end="00:00:54.500">Hide these tags:
also hide these tags:
but show this: {normal text}


\ N is a forced line break
\ h is a hard space
Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D

+ end="00:01:00.500">
\ N is a forced line break
\ h is a hard space
Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
The line will never break automatically right before or after a hard space. :-D


\h\h\h\h\hA (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)

+ end="00:00:56.500">
A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)

\h\h\h\h\hA (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
Show this: \TEST and this: \-)

+ end="00:00:58.500"> A (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
Show this: \TEST and this: \-)


A letter followed by 05 hard spaces: A\h\h\h\h\h
A letter followed by normal spaces: A
A letter followed by no hard spaces: A
05 hard spaces between letters: A\h\h\h\h\hA
5 normal spaces between letters: A A

^--Forced line break

+ end="00:01:00.500">
A letter followed by 05 hard spaces: A
A letter followed by normal spaces: A
A letter followed by no hard spaces: A
05 hard spaces between letters: A A
5 normal spaces between letters: A A

^--Forced line break

Both line should be strikethrough,
yes.
Correctly closed tags
should be hidden.

diff --git a/tests/ref/fate/sub-webvttenc b/tests/ref/fate/sub-webvttenc
index 45ae0b6131..f4172dcc84 100644
--- a/tests/ref/fate/sub-webvttenc
+++ b/tests/ref/fate/sub-webvttenc
@@ -132,26 +132,26 @@ but show this: {normal text}
 \ N is a forced line break
 \ h is a hard space
 Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
-The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
+The line will never break automatically right before or after a hard space. :-D
 
 00:54.501 --> 00:56.500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+     A (05 hard spaces followed by a letter)
 A (Normal spaces followed by a letter)
 A (No hard spaces followed by a letter)
 
 00:56.501 --> 00:58.500
-\h\h\h\h\hA (05 hard spaces followed by a letter)
+     A (05 hard spaces followed by a letter)
 A (Normal spaces followed by a letter)
 A (No hard spaces followed by a letter)
 Show this: \TEST and this: \-)
 
 00:58.501 --> 01:00.500
-A letter followed by 05 hard spaces: A\h\h\h\h\h
+A letter followed by 05 hard spaces: A     
 A letter followed by normal spaces: A
 A letter followed by no hard spaces: A
-05 hard spaces between letters: A\h\h\h\h\hA
+05 hard spaces between letters: A     A
 5 normal spaces between letters: A     A
 
 ^--Forced line break
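
For readers who want to try the fallback behaviour outside the tree, the
following is a minimal standalone sketch of the scanning logic in the
ass_split.c hunks, not the FFmpeg implementation itself. It assumes a
simplified, hypothetical callback table (DemoCallbacks) and helper names
(demo_split, demo_render) that do not exist in FFmpeg, and it ignores
{\...} override blocks entirely. It only shows how the sscanf() scanset
patterns from the patch recognise \n, \N and \h, and how a missing
hard_space callback falls back to emitting a plain space through the
text callback.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback table; only loosely modelled on the one in the patch. */
typedef struct DemoCallbacks {
    void (*text)(void *priv, const char *text, int len);
    void (*new_line)(void *priv, int forced);
    void (*hard_space)(void *priv);          /* optional, may be NULL */
} DemoCallbacks;

/* Fixed-size output buffer used as the private context of the callbacks. */
typedef struct DemoBuf {
    char   out[256];
    size_t len;
} DemoBuf;

static void demo_append(DemoBuf *b, const char *s, size_t n)
{
    size_t room = sizeof(b->out) - 1 - b->len;
    if (n > room)
        n = room;                            /* silently truncate in this sketch */
    memcpy(b->out + b->len, s, n);
    b->len += n;
    b->out[b->len] = '\0';
}

static void demo_text(void *priv, const char *text, int len)
{
    demo_append(priv, text, (size_t)len);
}

static void demo_new_line(void *priv, int forced)
{
    (void)forced;
    demo_append(priv, "\n", 1);
}

/* Split buf into plain text, line breaks (\n, \N) and hard spaces (\h). */
static void demo_split(const DemoCallbacks *cb, void *priv, const char *buf)
{
    const char *text = NULL;
    int text_len = 0;
    char tag[2];

    while (*buf) {
        int is_break = sscanf(buf, "\\%1[nN]", tag) == 1;
        int is_hard  = !is_break && sscanf(buf, "\\%1[hH]", tag) == 1;

        if (text && (is_break || is_hard)) { /* flush pending plain text */
            if (cb->text)
                cb->text(priv, text, text_len);
            text = NULL;
            text_len = 0;
        }
        if (is_break) {
            if (cb->new_line)
                cb->new_line(priv, tag[0] == 'N');
            buf += 2;
        } else if (is_hard) {
            if (cb->hard_space)
                cb->hard_space(priv);
            else if (cb->text)
                cb->text(priv, " ", 1);      /* fallback: plain space */
            buf += 2;
        } else {
            if (!text)
                text = buf;
            text_len++;
            buf++;
        }
    }
    if (text && cb->text)
        cb->text(priv, text, text_len);
}

/* Convenience wrapper: render src into dst using the fallback path. */
static void demo_render(const char *src, char *dst, size_t dst_size)
{
    const DemoCallbacks cb = { demo_text, demo_new_line, NULL };
    DemoBuf b = { "", 0 };

    assert(dst_size > 0);
    demo_split(&cb, &b, src);
    snprintf(dst, dst_size, "%s", b.out);
}
```

Calling demo_render() on a string that contains \h sequences replaces each
two-byte tag with a single space, which is consistent with the packet size
in the mov-mp4-ttml reference files dropping from 7866 to 7833 bytes.
Unknown escapes such as \TEST are passed through unchanged as plain text.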