[FFmpeg-devel,v3,3/6] avcodec/ccaption_dec: ignore leading non-breaking spaces

Message ID 20240312060005.2111135-4-marth64@proxyid.net
State New
Series Closed Captions improvements (phase 1)

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Marth64 March 12, 2024, 6 a.m. UTC
In Closed Captions (US), the non-breaking space (0xA0) can be used to
align text horizontally from the left when used as a leading character.
However, the CC decoder does not ignore it as a leading character
the way it does an ordinary space, so blank padding is rendered
on the black CC box. This is not the intended viewing experience.

Ignore the leading non-breaking spaces, thus creating the intended
transparency which aligns the text. Since all characters are
fixed-width in CC, it can be handled the same way as we currently
treat leading ordinary spaces.

Also, as a nit, lowercase the NBSP's hex code in the entry table to match
casing of the other hex codes.

Signed-off-by: Marth64 <marth64@proxyid.net>
---
 libavcodec/ccaption_dec.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
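
A minimal sketch of the leading-blank skip described in the commit message, written outside the decoder. The names `SET_BASIC`, `SET_SPECIAL`, `is_blank_cell` and `count_leading_blanks` are hypothetical stand-ins for illustration and are not identifiers from ccaption_dec.c; only the 0x39 code for the NBSP in the special American set comes from the patch itself. The sketch assumes a cell counts as blank when it is an ordinary space in the basic set or 0x39 in the special set, and that the scan must never run past a caller-supplied limit.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the decoder's charset identifiers. */
enum { SET_BASIC = 0, SET_SPECIAL = 1 };

/* A cell is treated as blank if it is an ordinary space in the basic
 * set, or code 0x39 (the non-breaking space) in the special set. */
static int is_blank_cell(char ch, char set)
{
    return (ch == ' '  && set == SET_BASIC) ||
           (ch == 0x39 && set == SET_SPECIAL);
}

/* Count leading blank cells in a row, never advancing past limit.
 * The bound is checked first so it applies to both kinds of blank. */
static int count_leading_blanks(const char *row, const char *sets, int limit)
{
    int j = 0;

    while (j < limit && is_blank_cell(row[j], sets[j]))
        j++;
    return j;
}
```

For a row holding a space, an NBSP and then "Hi", with a limit of 4, `count_leading_blanks` returns 2. Keeping the blank test in one helper also keeps the `j < limit` bound grouped with both alternatives, which is easy to get wrong when the `&&` and `||` terms are written inline.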

Comments

Stefano Sabatini March 12, 2024, 1:50 p.m. UTC | #1
On date Tuesday 2024-03-12 01:00:02 -0500, Marth64 wrote:
> In Closed Captions (US), the non-breaking space (0xA0) can be used to
> align text horizontally from the left when used as a leading character.
> However, the CC decoder does not ignore it as a leading character
> the way it does an ordinary space, so blank padding is rendered
> on the black CC box. This is not the intended viewing experience.
> 
> Ignore the leading non-breaking spaces, thus creating the intended
> transparency which aligns the text. Since all characters are
> fixed-width in CC, it can be handled the same way as we currently
> treat leading ordinary spaces.
> 
> Also, as a nit, lowercase the NBSP's hex code in the entry table to match
> casing of the other hex codes.
> 
> Signed-off-by: Marth64 <marth64@proxyid.net>
> ---
>  libavcodec/ccaption_dec.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Still LGTM.

Marth64 March 17, 2024, 4:27 a.m. UTC | #2
Hi Stefano,

I would like to withdraw this patch for now.
I have found content where this further breaks center-justified
text (center justification is already unsupported).

I will instead try to implement center justification
and incorporate this through that route.

Please ignore it.

Thank you for understanding,

Patch

diff --git a/libavcodec/ccaption_dec.c b/libavcodec/ccaption_dec.c
index 9d4a93647c..25b0f2e064 100644
--- a/libavcodec/ccaption_dec.c
+++ b/libavcodec/ccaption_dec.c
@@ -91,7 +91,7 @@  enum cc_charset {
         ENTRY(0x36, "\u00a3")                            \
         ENTRY(0x37, "\u266a")                            \
         ENTRY(0x38, "\u00e0")                            \
-        ENTRY(0x39, "\u00A0")                            \
+        ENTRY(0x39, "\u00a0")                            \
         ENTRY(0x3a, "\u00e8")                            \
         ENTRY(0x3b, "\u00e2")                            \
         ENTRY(0x3c, "\u00ea")                            \
@@ -471,7 +471,8 @@  static int capture_screen(CCaptionSubContext *ctx)
             const char *row = screen->characters[i];
             const char *charset = screen->charsets[i];
             j = 0;
-            while (row[j] == ' ' && charset[j] == CCSET_BASIC_AMERICAN)
+            while ((row[j] == ' '  && charset[j] == CCSET_BASIC_AMERICAN) ||
+                   (row[j] == 0x39 && charset[j] == CCSET_SPECIAL_AMERICAN))
                 j++;
             if (!tab || j < tab)
                 tab = j;
@@ -491,7 +492,9 @@  static int capture_screen(CCaptionSubContext *ctx)
             j = 0;
 
             /* skip leading space */
-            while (row[j] == ' ' && charset[j] == CCSET_BASIC_AMERICAN && j < tab)
+            while (j < tab &&
+                   ((row[j] == ' '  && charset[j] == CCSET_BASIC_AMERICAN) ||
+                    (row[j] == 0x39 && charset[j] == CCSET_SPECIAL_AMERICAN)))
                 j++;
 
             x = ASS_DEFAULT_PLAYRESX * (0.1 + 0.0250 * j);