diff mbox

[FFmpeg-devel] lavf/mpegts: Do not print the character coding as part of service name

Message ID CAB0OVGoUb9evJnXOHJsa4E9M7p-D5Kz6ezxcRNGhN8RCXDJCpQ@mail.gmail.com
State Superseded
Headers show

Commit Message

Carl Eugen Hoyos Feb. 7, 2019, 3:46 p.m. UTC
2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>:
> On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote:
>> +    if (p[0] < 0x20) {
>> +        p++;
>> +        if (--len < 0)
>> +            return NULL;
>> +    }
>
> If I understand section "A.2 Selection of character table" of ETSI EN
> 300 468 correctly, you need to drop an additional byte if the first
> byte (p[0]) is 0x1F, or an additional two bytes if the first one is
> 0x10.

New patch attached.

Thank you, Carl Eugen

Comments

Marton Balint Feb. 7, 2019, 8:40 p.m. UTC | #1
On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote:

> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>:
>> On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote:
>>> +    if (p[0] < 0x20) {
>>> +        p++;
>>> +        if (--len < 0)
>>> +            return NULL;
>>> +    }
>>
>> If I understand section "A.2 Selection of character table" of ETSI EN
>> 300 468 correctly, you need to drop an additional byte if the first
>> byte (p[0]) is 0x1F, or an additional two bytes if the first one is
>> 0x10.
>
> New patch attached.

Don't simply drop charset information. You should convert the strings to 
UTF-8 based on that. Mpegts encoding needs updating too to be able to keep 
the international characters in case of stream copy...

Thanks,
Marton
Carl Eugen Hoyos Feb. 7, 2019, 9:51 p.m. UTC | #2
2019-02-07 21:40 GMT+01:00, Marton Balint <cus@passwd.hu>:
>
>
> On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote:
>
>> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>:
>>> On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote:
>>>> +    if (p[0] < 0x20) {
>>>> +        p++;
>>>> +        if (--len < 0)
>>>> +            return NULL;
>>>> +    }
>>>
>>> If I understand section "A.2 Selection of character table" of ETSI EN
>>> 300 468 correctly, you need to drop an additional byte if the first
>>> byte (p[0]) is 0x1F, or an additional two bytes if the first one is
>>> 0x10.
>>
>> New patch attached.
>
> Don't simply drop charset information.

> You should convert the strings to UTF-8 based on that.

(Yes we should)
Do you have a sample that needs this?
If not, a patch that fixes a user-reported issue would rot
because a better (unneeded) fix is possible.

Carl Eugen
Jan Ekström Feb. 7, 2019, 10:17 p.m. UTC | #3
On Thu, Feb 7, 2019 at 5:46 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>:
> > On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote:
> >> +    if (p[0] < 0x20) {
> >> +        p++;
> >> +        if (--len < 0)
> >> +            return NULL;
> >> +    }
> >
> > If I understand section "A.2 Selection of character table" of ETSI EN
> > 300 468 correctly, you need to drop an additional byte if the first
> > byte (p[0]) is 0x1F, or an additional two bytes if the first one is
> > 0x10.
>
> New patch attached.
>

This is weird.

I applied this on top of my WIP ARIB branch
(https://github.com/jeeb/ffmpeg/commits/mpegts_arib_stuff) and now I
lost service name / provider altogether...
I mean, the stuff in the branch is "quick and dirty" at this point,
but it's still peculiar.

This can be seen with both the PID switch sample I recently posted as
well as with the ARIB captions sample I posted in the libaribb24
wrapper thread.

Jan
Carl Eugen Hoyos Feb. 7, 2019, 10:26 p.m. UTC | #4
2019-02-07 23:17 GMT+01:00, Jan Ekström <jeebjp@gmail.com>:
> On Thu, Feb 7, 2019 at 5:46 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>
>> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>:
>> > On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote:
>> >> +    if (p[0] < 0x20) {
>> >> +        p++;
>> >> +        if (--len < 0)
>> >> +            return NULL;
>> >> +    }
>> >
>> > If I understand section "A.2 Selection of character table" of ETSI EN
>> > 300 468 correctly, you need to drop an additional byte if the first
>> > byte (p[0]) is 0x1F, or an additional two bytes if the first one is
>> > 0x10.
>>
>> New patch attached.
>>
>
> This is weird.
>
> I applied this on top of my WIP ARIB branch
> (https://github.com/jeeb/ffmpeg/commits/mpegts_arib_stuff) and now I
> lost service name / provider altogether...
> I mean, the stuff in the branch is "quick and dirty" at this point,
> but it's still peculiar.
>
> This can be seen with both the PID switch sample I recently posted as
> well as with the ARIB captions sample I posted in the libaribb24
> wrapper thread.

Reason is that instead of "(*p < 0x20)" it has to be "(len && *p < 0x20)".

This is also the missing sample, I don't know if I'll succeed with the
conversion.

Carl Eugen
Jan Ekström Feb. 7, 2019, 10:34 p.m. UTC | #5
On Fri, Feb 8, 2019 at 12:26 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> 2019-02-07 23:17 GMT+01:00, Jan Ekström <jeebjp@gmail.com>:
> > On Thu, Feb 7, 2019 at 5:46 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> >>
> >> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>:
> >> > On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote:
> >> >> +    if (p[0] < 0x20) {
> >> >> +        p++;
> >> >> +        if (--len < 0)
> >> >> +            return NULL;
> >> >> +    }
> >> >
> >> > If I understand section "A.2 Selection of character table" of ETSI EN
> >> > 300 468 correctly, you need to drop an additional byte if the first
> >> > byte (p[0]) is 0x1F, or an additional two bytes if the first one is
> >> > 0x10.
> >>
> >> New patch attached.
> >>
> >
> > This is weird.
> >
> > I applied this on top of my WIP ARIB branch
> > (https://github.com/jeeb/ffmpeg/commits/mpegts_arib_stuff) and now I
> > lost service name / provider altogether...
> > I mean, the stuff in the branch is "quick and dirty" at this point,
> > but it's still peculiar.
> >
> > This can be seen with both the PID switch sample I recently posted as
> > well as with the ARIB captions sample I posted in the libaribb24
> > wrapper thread.
>
> Reason is that instead of "(*p < 0x20)" it has to be "(len && *p < 0x20)".
>
> This is also the missing sample, I don't know if I'll succeed with the
> conversion.
>

It seems like the person who implemented the ARIB STD-B24 encoding and
got it merged to linuxtv also implemented ETSI EN 300 468?
https://git.linuxtv.org/v4l-utils.git/tree/contrib/gconv/en300-468-tab00.c

It is under LGPLv2.1+ so similarly usable. I did take a look at the
ARIB one yesterday and while it seems to base an awful lot on the
magic that comes from including iconv/{skeleton,loop}.c it doesn't
seem completely impenetrable...

Jan
Marton Balint Feb. 7, 2019, 11:11 p.m. UTC | #6
On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote:

> 2019-02-07 21:40 GMT+01:00, Marton Balint <cus@passwd.hu>:
>>
>>
>> On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote:
>>
>>> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>:
>>>> On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote:
>>>>> +    if (p[0] < 0x20) {
>>>>> +        p++;
>>>>> +        if (--len < 0)
>>>>> +            return NULL;
>>>>> +    }
>>>>
>>>> If I understand section "A.2 Selection of character table" of ETSI EN
>>>> 300 468 correctly, you need to drop an additional byte if the first
>>>> byte (p[0]) is 0x1F, or an additional two bytes if the first one is
>>>> 0x10.
>>>
>>> New patch attached.
>>
>> Don't simply drop charset information.
>
>> You should convert the strings to UTF-8 based on that.
>
> (Yes we should)
> Do you have a sample that needs this?

There are tons of samples that needs this, two examples:

- samples/ffmpeg-bugs/roundup/issue1871/dvbt.ts
- samples/MPEG2/res_change_ffmpeg_aspect.ts

> If not, a patch that fixes a user-reported issue would rot
> because a better (unneeded) fix is possible.

In this case I think an incomplete fix hurts more, because after it the 
API user will have no way to determine the encoding. Using Iconv to 
convert between the code pages should not be too much work.

Regards,
Marton
Carl Eugen Hoyos Feb. 8, 2019, 12:47 a.m. UTC | #7
2019-02-08 0:11 GMT+01:00, Marton Balint <cus@passwd.hu>:
>
> On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote:
>
>> 2019-02-07 21:40 GMT+01:00, Marton Balint <cus@passwd.hu>:

>>>> New patch attached.
>>>
>>> Don't simply drop charset information.
>>
>>> You should convert the strings to UTF-8 based on that.
>>
>> (Yes we should)
>> Do you have a sample that needs this?
>
> There are tons of samples that needs this, two examples:
>
> - samples/ffmpeg-bugs/roundup/issue1871/dvbt.ts
> - samples/MPEG2/res_change_ffmpeg_aspect.ts

These are cool, I found the one from ticket #2261 and
I realized that "ORF NÖ" also needs a conversion;-)

>> If not, a patch that fixes a user-reported issue would rot
>> because a better (unneeded) fix is possible.
>
> In this case I think an incomplete fix hurts more, because after it the
> API user will have no way to determine the encoding.

Removed the part of the patch that would have removed the "?"
for non-iconv and unknown encodings.

Carl Eugen
diff mbox

Patch

From a63c53bc224b6a1d53197e3dafd2ec9c6097d3c0 Mon Sep 17 00:00:00 2001
From: Carl Eugen Hoyos <ceffmpeg@gmail.com>
Date: Thu, 7 Feb 2019 16:45:19 +0100
Subject: [PATCH] lavf/mpegts: Do not print the character coding as part of
 service name.

Fixes ticket #6320.
---
 libavformat/mpegts.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/libavformat/mpegts.c b/libavformat/mpegts.c
index b04fd7b..f161c0f 100644
--- a/libavformat/mpegts.c
+++ b/libavformat/mpegts.c
@@ -674,6 +674,18 @@  static char *getstr8(const uint8_t **pp, const uint8_t *p_end)
         return NULL;
     if (len > p_end - p)
         return NULL;
+    if (*p < 0x20) {
+        if (*p == 0x10) {
+            p += 2;
+            len -= 2;
+        } else if (*p == 0x1f) {
+            p++;
+            len--;
+        }
+        p++;
+        if (--len < 0)
+            return NULL;
+    }
     str = av_malloc(len + 1);
     if (!str)
         return NULL;
-- 
1.7.10.4