Message ID | CAB0OVGoUb9evJnXOHJsa4E9M7p-D5Kz6ezxcRNGhN8RCXDJCpQ@mail.gmail.com |
---|---|
State | Superseded |
Headers | show |
On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote: > 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>: >> On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote: >>> + if (p[0] < 0x20) { >>> + p++; >>> + if (--len < 0) >>> + return NULL; >>> + } >> >> If I understand section "A.2 Selection of character table" of ETSI EN >> 300 468 correctly, you need to drop an additional byte if the first >> byte (p[0]) is 0x1F, or an additional two bytes if the first one is >> 0x10. > > New patch attached. Don't simply drop charset information. You should convert the strings to UTF-8 based on that. Mpegts encoding needs updating too to be able to keep the international characters in case of stream copy... Thanks, Marton
2019-02-07 21:40 GMT+01:00, Marton Balint <cus@passwd.hu>: > > > On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote: > >> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>: >>> On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote: >>>> + if (p[0] < 0x20) { >>>> + p++; >>>> + if (--len < 0) >>>> + return NULL; >>>> + } >>> >>> If I understand section "A.2 Selection of character table" of ETSI EN >>> 300 468 correctly, you need to drop an additional byte if the first >>> byte (p[0]) is 0x1F, or an additional two bytes if the first one is >>> 0x10. >> >> New patch attached. > > Don't simply drop charset information. > You should convert the strings to UTF-8 based on that. (Yes we should) Do you have a sample that needs this? If not, a patch that fixes a user-reported issue would rot because a better (unneeded) fix is possible. Carl Eugen
On Thu, Feb 7, 2019 at 5:46 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > > 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>: > > On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote: > >> + if (p[0] < 0x20) { > >> + p++; > >> + if (--len < 0) > >> + return NULL; > >> + } > > > > If I understand section "A.2 Selection of character table" of ETSI EN > > 300 468 correctly, you need to drop an additional byte if the first > > byte (p[0]) is 0x1F, or an additional two bytes if the first one is > > 0x10. > > New patch attached. > This is weird. I applied this on top of my WIP ARIB branch (https://github.com/jeeb/ffmpeg/commits/mpegts_arib_stuff) and now I lost service name / provider altogether... I mean, the stuff in the branch is "quick and dirty" at this point, but it's still peculiar. This can be seen with both the PID switch sample I recently posted as well as with the ARIB captions sample I posted in the libaribb24 wrapper thread. Jan
2019-02-07 23:17 GMT+01:00, Jan Ekström <jeebjp@gmail.com>: > On Thu, Feb 7, 2019 at 5:46 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: >> >> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>: >> > On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote: >> >> + if (p[0] < 0x20) { >> >> + p++; >> >> + if (--len < 0) >> >> + return NULL; >> >> + } >> > >> > If I understand section "A.2 Selection of character table" of ETSI EN >> > 300 468 correctly, you need to drop an additional byte if the first >> > byte (p[0]) is 0x1F, or an additional two bytes if the first one is >> > 0x10. >> >> New patch attached. >> > > This is weird. > > I applied this on top of my WIP ARIB branch > (https://github.com/jeeb/ffmpeg/commits/mpegts_arib_stuff) and now I > lost service name / provider altogether... > I mean, the stuff in the branch is "quick and dirty" at this point, > but it's still peculiar. > > This can be seen with both the PID switch sample I recently posted as > well as with the ARIB captions sample I posted in the libaribb24 > wrapper thread. Reason is that instead of "(*p < 0x20)" it has to be "(len && *p < 0x20)". This is also the missing sample, I don't know if I'll succeed with the conversion. Carl Eugen
On Fri, Feb 8, 2019 at 12:26 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > > 2019-02-07 23:17 GMT+01:00, Jan Ekström <jeebjp@gmail.com>: > > On Thu, Feb 7, 2019 at 5:46 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > >> > >> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>: > >> > On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote: > >> >> + if (p[0] < 0x20) { > >> >> + p++; > >> >> + if (--len < 0) > >> >> + return NULL; > >> >> + } > >> > > >> > If I understand section "A.2 Selection of character table" of ETSI EN > >> > 300 468 correctly, you need to drop an additional byte if the first > >> > byte (p[0]) is 0x1F, or an additional two bytes if the first one is > >> > 0x10. > >> > >> New patch attached. > >> > > > > This is weird. > > > > I applied this on top of my WIP ARIB branch > > (https://github.com/jeeb/ffmpeg/commits/mpegts_arib_stuff) and now I > > lost service name / provider altogether... > > I mean, the stuff in the branch is "quick and dirty" at this point, > > but it's still peculiar. > > > > This can be seen with both the PID switch sample I recently posted as > > well as with the ARIB captions sample I posted in the libaribb24 > > wrapper thread. > > Reason is that instead of "(*p < 0x20)" it has to be "(len && *p < 0x20)". > > This is also the missing sample, I don't know if I'll succeed with the > conversion. > It seems like the person who implemented the ARIB STD-B24 encoding and got it merged to linuxtv also implemented ETSI EN 300 468? https://git.linuxtv.org/v4l-utils.git/tree/contrib/gconv/en300-468-tab00.c It is under LGPLv2.1+ so similarly usable. I did take a look at the ARIB one yesterday and while it seems to base an awful lot on the magic that comes from including iconv/{skeleton,loop}.c it doesn't seem completely impenetrable... Jan
On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote: > 2019-02-07 21:40 GMT+01:00, Marton Balint <cus@passwd.hu>: >> >> >> On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote: >> >>> 2019-02-07 16:34 GMT+01:00, Moritz Barsnick <barsnick@gmx.net>: >>>> On Thu, Feb 07, 2019 at 14:40:40 +0100, Carl Eugen Hoyos wrote: >>>>> + if (p[0] < 0x20) { >>>>> + p++; >>>>> + if (--len < 0) >>>>> + return NULL; >>>>> + } >>>> >>>> If I understand section "A.2 Selection of character table" of ETSI EN >>>> 300 468 correctly, you need to drop an additional byte if the first >>>> byte (p[0]) is 0x1F, or an additional two bytes if the first one is >>>> 0x10. >>> >>> New patch attached. >> >> Don't simply drop charset information. > >> You should convert the strings to UTF-8 based on that. > > (Yes we should) > Do you have a sample that needs this? There are tons of samples that needs this, two examples: - samples/ffmpeg-bugs/roundup/issue1871/dvbt.ts - samples/MPEG2/res_change_ffmpeg_aspect.ts > If not, a patch that fixes a user-reported issue would rot > because a better (unneeded) fix is possible. In this case I think an incomplete fix hurts more, because after it the API user will have no way to determine the encoding. Using Iconv to convert between the code pages should not be too much work. Regards, Marton
2019-02-08 0:11 GMT+01:00, Marton Balint <cus@passwd.hu>: > > On Thu, 7 Feb 2019, Carl Eugen Hoyos wrote: > >> 2019-02-07 21:40 GMT+01:00, Marton Balint <cus@passwd.hu>: >>>> New patch attached. >>> >>> Don't simply drop charset information. >> >>> You should convert the strings to UTF-8 based on that. >> >> (Yes we should) >> Do you have a sample that needs this? > > There are tons of samples that needs this, two examples: > > - samples/ffmpeg-bugs/roundup/issue1871/dvbt.ts > - samples/MPEG2/res_change_ffmpeg_aspect.ts These are cool, I found the one from ticket #2261 and I realized that "ORF NÖ" also needs a conversion;-) >> If not, a patch that fixes a user-reported issue would rot >> because a better (unneeded) fix is possible. > > In this case I think an incomplete fix hurts more, because after it the > API user will have no way to determine the encoding. Removed the part of the patch that would have removed the "?" for non-iconv and unknown encodings. Carl Eugen
From a63c53bc224b6a1d53197e3dafd2ec9c6097d3c0 Mon Sep 17 00:00:00 2001 From: Carl Eugen Hoyos <ceffmpeg@gmail.com> Date: Thu, 7 Feb 2019 16:45:19 +0100 Subject: [PATCH] lavf/mpegts: Do not print the character coding as part of service name. Fixes ticket #6320. --- libavformat/mpegts.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/libavformat/mpegts.c b/libavformat/mpegts.c index b04fd7b..f161c0f 100644 --- a/libavformat/mpegts.c +++ b/libavformat/mpegts.c @@ -674,6 +674,18 @@ static char *getstr8(const uint8_t **pp, const uint8_t *p_end) return NULL; if (len > p_end - p) return NULL; + if (*p < 0x20) { + if (*p == 0x10) { + p += 2; + len -= 2; + } else if (*p == 0x1f) { + p++; + len--; + } + p++; + if (--len < 0) + return NULL; + } str = av_malloc(len + 1); if (!str) return NULL; -- 1.7.10.4