diff mbox

[FFmpeg-devel] lavf/mpegts: Convert service name and service provider to utf-8

Message ID CAB0OVGqyDdLjmBNJ3f=Kfpjw0GR3zWxkwh5-L61_Yx_wA2sz4Q@mail.gmail.com
State Superseded
Headers show

Commit Message

Carl Eugen Hoyos Feb. 8, 2019, 12:38 a.m. UTC
Hi!

Attached patch fixes ticket #6320, tested with the sample from ticket #7069.

Please comment, Carl Eugen

Comments

Jan Ekström Feb. 8, 2019, 12:42 a.m. UTC | #1
On Fri, Feb 8, 2019 at 2:38 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> Hi!
>
> Attached patch fixes ticket #6320, tested with the sample from ticket #7069.
>
> Please comment, Carl Eugen

Nice!

I will try to test this tomorrow. To tell you the truth, I am quite
jealous that the text encodings you require are already in upstream
libiconv.

Jan
Carl Eugen Hoyos Feb. 8, 2019, 12:44 a.m. UTC | #2
2019-02-08 1:42 GMT+01:00, Jan Ekström <jeebjp@gmail.com>:
> On Fri, Feb 8, 2019 at 2:38 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>
>> Hi!
>>
>> Attached patch fixes ticket #6320, tested with the sample from ticket
>> #7069.
>>
>> Please comment, Carl Eugen
>
> Nice!
>
> I will try to test this tomorrow. To tell you the truth, I am quite
> jealous that the text encodings you require are already in upstream
> libiconv.

The sample you provided (for which it would have been
extra-nice to tell us where it came from, I found it meanwhile)
does not allow converting with iconv, or at least I couldn't
guess the encoding...

Carl Eugen
Jan Ekström Feb. 8, 2019, 1:07 a.m. UTC | #3
On Fri, Feb 8, 2019 at 2:44 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> 2019-02-08 1:42 GMT+01:00, Jan Ekström <jeebjp@gmail.com>:
> > On Fri, Feb 8, 2019 at 2:38 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> >>
> >> Hi!
> >>
> >> Attached patch fixes ticket #6320, tested with the sample from ticket
> >> #7069.
> >>
> >> Please comment, Carl Eugen
> >
> > Nice!
> >
> > I will try to test this tomorrow. To tell you the truth, I am quite
> > jealous that the text encodings you require are already in upstream
> > libiconv.
>
> The sample you provided (for which it would have been
> extra-nice to tell us where it came from, I found it meanwhile)
> does not allow converting with iconv, or at least I couldn't
> guess the encoding...
>
> Carl Eugen

The samples came from MBS (ARIB captions one) and Tokyo MX (the PID
switch one), if that's what you mean. The channels themselves
shouldn't matter too much as all broadcasts in Japan generally follow
the ARIB specs/recommendations as far as how they're multiplexed.
They're part of various samples I've been capturing in Japan along the
years.

The text coding is ARIB STD-24 and together with the EN300-468 Table00
one I linked both were nack'd at upstream due to not being encodings
utilized in general computing, so the iconv modules now live in the
linuxtv repository*. Which means that they most likely are not
packaged anywhere.
* https://git.linuxtv.org/v4l-utils.git/tree/contrib/gconv

Libaribb24 is another implementation of the ARIB STD-B24 text coding
(in addition to the subtitle part), and I've been so far utilizing it
in my WIP branch.

Jan
Carl Eugen Hoyos Feb. 8, 2019, 1:19 a.m. UTC | #4
2019-02-08 2:07 GMT+01:00, Jan Ekström <jeebjp@gmail.com>:
> On Fri, Feb 8, 2019 at 2:44 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>
>> 2019-02-08 1:42 GMT+01:00, Jan Ekström <jeebjp@gmail.com>:
>> > On Fri, Feb 8, 2019 at 2:38 AM Carl Eugen Hoyos <ceffmpeg@gmail.com>
>> > wrote:
>> >>
>> >> Hi!
>> >>
>> >> Attached patch fixes ticket #6320, tested with the sample from ticket
>> >> #7069.
>> >>
>> >> Please comment, Carl Eugen
>> >
>> > Nice!
>> >
>> > I will try to test this tomorrow. To tell you the truth, I am quite
>> > jealous that the text encodings you require are already in upstream
>> > libiconv.
>>
>> The sample you provided (for which it would have been
>> extra-nice to tell us where it came from, I found it meanwhile)
>> does not allow converting with iconv, or at least I couldn't
>> guess the encoding...
>
> The samples came from MBS (ARIB captions one) and Tokyo MX (the PID
> switch one), if that's what you mean.

For our context, the sample comes from ticket #2261.
But thank you for the information about Japan, will try another encoding!

Carl Eugen
Jan Ekström Feb. 8, 2019, 1:30 a.m. UTC | #5
On Fri, Feb 8, 2019 at 3:19 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> 2019-02-08 2:07 GMT+01:00, Jan Ekström <jeebjp@gmail.com>:
> > On Fri, Feb 8, 2019 at 2:44 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> >>
> >> 2019-02-08 1:42 GMT+01:00, Jan Ekström <jeebjp@gmail.com>:
> >> > On Fri, Feb 8, 2019 at 2:38 AM Carl Eugen Hoyos <ceffmpeg@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi!
> >> >>
> >> >> Attached patch fixes ticket #6320, tested with the sample from ticket
> >> >> #7069.
> >> >>
> >> >> Please comment, Carl Eugen
> >> >
> >> > Nice!
> >> >
> >> > I will try to test this tomorrow. To tell you the truth, I am quite
> >> > jealous that the text encodings you require are already in upstream
> >> > libiconv.
> >>
> >> The sample you provided (for which it would have been
> >> extra-nice to tell us where it came from, I found it meanwhile)
> >> does not allow converting with iconv, or at least I couldn't
> >> guess the encoding...
> >
> > The samples came from MBS (ARIB captions one) and Tokyo MX (the PID
> > switch one), if that's what you mean.
>
> For our context, the sample comes from ticket #2261.
> But thank you for the information about Japan, will try another encoding!
>
> Carl Eugen

Yes, Paul was nice enough to open that ticket back then. It is a
capture from back when Tokyo MX still did proper SD-to-HD PID
switching.

Jan
Marton Balint Feb. 8, 2019, 10:09 p.m. UTC | #6
On Fri, 8 Feb 2019, Carl Eugen Hoyos wrote:

> Hi!
>
> Attached patch fixes ticket #6320, tested with the sample from ticket #7069.
>
> Please comment, Carl Eugen
>
> From fdcd141a29f336925681193a9cdd3f4eaa5c368e Mon Sep 17 00:00:00 2001
> From: Carl Eugen Hoyos <ceffmpeg@gmail.com>
> Date: Fri, 8 Feb 2019 01:35:33 +0100
> Subject: [PATCH] lavf/mpegts: Convert service_name and service_provider to
>  utf-8.
> 
> Fixes ticket #6320.
> ---
>  libavformat/mpegts.c |   33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/libavformat/mpegts.c b/libavformat/mpegts.c
> index b04fd7b..dde610f 100644
> --- a/libavformat/mpegts.c
> +++ b/libavformat/mpegts.c
> @@ -37,6 +37,9 @@
>  #include "avio_internal.h"
>  #include "mpeg.h"
>  #include "isom.h"
> +#if CONFIG_ICONV
> +#include <iconv.h>
> +#endif
>
>  /* maximum size in which we look for synchronization if
>   * synchronization is lost */
> @@ -674,6 +677,36 @@ static char *getstr8(const uint8_t **pp, const uint8_t *p_end)
>          return NULL;
>      if (len > p_end - p)
>          return NULL;
> +#if CONFIG_ICONV
> +    if (len && *p < 0x20) {
> +        const char *encodings[] = {
> +            "ISO6937", "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8",
> +            "ISO-8859-9", "ISO-8859-10", "ISO-8859-11", "", "ISO-8859-13",
> +            "ISO-8859-14", "ISO-8859-15", "", "", "", "",
> +            "", "ISO-10646", "KSC_5601", "GB2312", "ISO-10646", "UTF-8", "",
> +            "", "", "", "", "", "", "", "", ""
> +        };
> +        iconv_t cd;
> +        char *in, *out;
> +        size_t inlen = len - 1, outlen = inlen * 6;
> +        cd = iconv_open("UTF-8", encodings[*p]);

Can you add support for the ISO-8859-any case where p[0] == 0x10, p[1] == 0x00
and p[2] == any? I will upload a sample to the trac ticket.

> +        if (cd == (iconv_t)-1)
> +            goto no_iconv;
> +        str =
> +        out = av_malloc(outlen);

I prefer a single line equation. Also outlen + 1 would be much more safe 
because outlen can be 0.

> +        if (!str)
> +            return NULL;

you are leaking iconv context here.

> +        in = (char *)p + 1;
> +        if (iconv(cd, &in, &inlen, &out, &outlen) == -1) {
> +            iconv_close(cd);
> +            goto no_iconv;
> +        }

and here

> +        *out = 0;
> +        *pp = in;

maybe safer to use *pp = p + len, I am not sure iconv always consumes all the data.

> +        return str;
> +    }
> +no_iconv:
> +#endif
>      str = av_malloc(len + 1);
>      if (!str)
>          return NULL;

Thanks,
Marton
diff mbox

Patch

From fdcd141a29f336925681193a9cdd3f4eaa5c368e Mon Sep 17 00:00:00 2001
From: Carl Eugen Hoyos <ceffmpeg@gmail.com>
Date: Fri, 8 Feb 2019 01:35:33 +0100
Subject: [PATCH] lavf/mpegts: Convert service_name and service_provider to
 utf-8.

Fixes ticket #6320.
---
 libavformat/mpegts.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/libavformat/mpegts.c b/libavformat/mpegts.c
index b04fd7b..dde610f 100644
--- a/libavformat/mpegts.c
+++ b/libavformat/mpegts.c
@@ -37,6 +37,9 @@ 
 #include "avio_internal.h"
 #include "mpeg.h"
 #include "isom.h"
+#if CONFIG_ICONV
+#include <iconv.h>
+#endif
 
 /* maximum size in which we look for synchronization if
  * synchronization is lost */
@@ -674,6 +677,36 @@  static char *getstr8(const uint8_t **pp, const uint8_t *p_end)
         return NULL;
     if (len > p_end - p)
         return NULL;
+#if CONFIG_ICONV
+    if (len && *p < 0x20) {
+        const char *encodings[] = {
+            "ISO6937", "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8",
+            "ISO-8859-9", "ISO-8859-10", "ISO-8859-11", "", "ISO-8859-13",
+            "ISO-8859-14", "ISO-8859-15", "", "", "", "",
+            "", "ISO-10646", "KSC_5601", "GB2312", "ISO-10646", "UTF-8", "",
+            "", "", "", "", "", "", "", "", ""
+        };
+        iconv_t cd;
+        char *in, *out;
+        size_t inlen = len - 1, outlen = inlen * 6;
+        cd = iconv_open("UTF-8", encodings[*p]);
+        if (cd == (iconv_t)-1)
+            goto no_iconv;
+        str =
+        out = av_malloc(outlen);
+        if (!str)
+            return NULL;
+        in = (char *)p + 1;
+        if (iconv(cd, &in, &inlen, &out, &outlen) == -1) {
+            iconv_close(cd);
+            goto no_iconv;
+        }
+        *out = 0;
+        *pp = in;
+        return str;
+    }
+no_iconv:
+#endif
     str = av_malloc(len + 1);
     if (!str)
         return NULL;
-- 
1.7.10.4