Message ID | 20210721172809.22974-1-jeebjp@gmail.com |
---|---|
State | New |
Series | [FFmpeg-devel] avformat/mov: parse 3gpp titl from media or track udta |

Context | Check | Description |
---|---|---|
andriy/x86_make | success | Make finished |
andriy/x86_make_fate | success | Make fate finished |
andriy/PPC64_make | success | Make finished |
andriy/PPC64_make_fate | success | Make fate finished |

On Wed, Jul 21, 2021 at 8:28 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> Seems to be:
> * Utilized by Handbrake for track titling
> * Actually defined as "title for the media"
>
> Definition from 3GPP TS 26.244 follows:
>
> Field              Type                Details                           Value
> BoxHeader.Size     Unsigned int(32)                                      BOX_SIZE
> BoxHeader.Type     Unsigned int(32)                                      'titl'
> BoxHeader.Version  Unsigned int(8)                                       0
> BoxHeader.Flags    Bit(24)                                               0
> Pad                Bit(1)                                                0
> Language           Unsigned int(5)[3]  Packed ISO-639-2/T language code
> Title              String              Text of title
>
> Semantics:
>
> Language: declares the language code for the following text. See
> ISO 639-2/T for the set of three character codes. Each character
> is packed as the difference between its ASCII value and 0x60.
>
> The code is confined to being three lower-case letters, so these
> values are strictly positive.
>
> Title: null-terminated string in either UTF-8 or UTF-16 characters,
> giving a title information. If UTF-16 is used, the string shall
> start with the BYTE ORDER MARK (0xFEFF).
> ---

A sample for this sort of metadata can be seen with
https://0x0.st/-zjq.m4v , which was posted at
https://github.com/mpv-player/mpv/issues/8488 .

The sample contains both "name" and "titl" boxes:

[udta: User Data Box]
  position = 3991500
  size = 71
  [name]
    position = 3991508
    size = 28
  [titl]
    position = 3991536
    size = 35

...out of which, if I read the QTFF documentation correctly, "name" should
not be utilized for user-facing naming, and "titl" is actually a
user-facing metadata box. Thus I implemented the latter.

Jan
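
The Language packing quoted above (three 5-bit values, each the ASCII value minus 0x60, below a 1-bit pad) can be illustrated with a small standalone sketch. The function name `unpack_iso639_2t` is hypothetical and exists only for this illustration; it is not the `ff_mov_lang_to_iso639` helper the patch calls, whose behavior may differ.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Unpacks a 15-bit packed ISO-639-2/T code (three 5-bit values, each the
 * character's ASCII value minus 0x60) into a NUL-terminated string.
 * Returns 0 on success, -1 if any value falls outside 'a'..'z'. */
static int unpack_iso639_2t(uint16_t packed, char out[4])
{
    packed &= 0x7fff; /* drop the 1-bit pad stored in the top bit */

    for (int i = 0; i < 3; i++) {
        /* first character sits in the highest 5 bits */
        unsigned v = (packed >> (10 - 5 * i)) & 0x1f;

        if (v < 1 || v > 26) { /* not a lower-case letter */
            out[0] = '\0';
            return -1;
        }
        out[i] = (char)(0x60 + v);
    }
    out[3] = '\0';
    return 0;
}
```

For example, "fin" packs to 0x192E (6, 9, 14) and "und" to 0x55C4 (21, 14, 4), so passing either value back through the sketch yields the original three letters.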

On Wed, Jul 21, 2021 at 8:30 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> [...]

Ping.

Includes a link to a testable file, and the specification of the box
is in the commit message :)

Jan

On Sat, Jul 24, 2021 at 9:48 PM Jan Ekström <jeebjp@gmail.com> wrote:
>
> [...]
>
> Ping.
>
> Includes a link to a testable file, and the specification of the box
> is in the commit message :)
>
> Jan

Will apply tomorrow after work unless there are any comments.

The only thing I was more or less unsure about was whether there was any
nice way of first figuring out the length of the string (since it might
not go all the way to the end of the box) and then allocating a buffer
for it for parsing, or if people wanted static buffers of arbitrary
length (I saw things nearby arbitrarily limited to ~200 chars, which
probably works for most things, but would indeed lead to randomly cut
strings).

At the point of writing the patch, since the value was an unsigned
int, I decided to allocate a buffer of left_bytes + 1 as that seemed
simple enough, and then pass that buffer into either the
AVFormatContext's metadata or the AVStream's metadata.

The way that per-language metadata is handled (adding the language as
a suffix, a la "title-fin" for example) was noticed being done in
another function in this file, and that style was followed. It did make
me wonder whether having a set of language-specific sub-keys for a base
metadata key would make sense at some point.

Also, is that linked sample good enough for a FATE test (3.9 MiB), or
should I try to generate a more limited sample?

Jan
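
The two points raised above (finding the title length before allocating, and the "title-fin" style key) can be sketched in isolation. This is a minimal illustration assuming the box payload is already in memory and the title is UTF-8; the patch itself reads from an AVIOContext instead, and both function names below are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Length in bytes of a NUL-terminated UTF-8 title stored in a payload of
 * `size` bytes; falls back to `size` when no terminator is present. */
static size_t titl_utf8_len(const unsigned char *payload, size_t size)
{
    const unsigned char *nul = memchr(payload, 0, size);
    return nul ? (size_t)(nul - payload) : size;
}

/* Hypothetical helper, for illustration only.
 * Builds "<key>-<lang>" (e.g. "title-fin") into `out`.
 * Returns 0 on success, -1 if the result would not fit. */
static int make_lang_key(char *out, size_t out_size,
                         const char *key, const char *lang)
{
    int n = snprintf(out, out_size, "%s-%s", key, lang);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With the length known up front, a buffer of exactly that length plus one byte for the terminator would be enough for the UTF-8 case; UTF-16 input would still need a conversion step that this sketch does not cover.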

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 040babed95..9edb3d6596 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -291,6 +291,111 @@ static int mov_metadata_hmmt(MOVContext *c, AVIOContext *pb, unsigned len)
     return 0;
 }
 
+// 3GPP TS 26.244, 8.2 3GPP asset meta data
+static int mov_metadata_titl(MOVContext *c, AVIOContext *pb, unsigned len)
+{
+    AVFormatContext *s = c->fc;
+    int version = -1, ret = AVERROR_BUG;
+    unsigned left_bytes = len, langcode = 0, flags = 100, bom = 0, buf_size = 0;
+    char language[4] = { 0 };
+    AVStream *st = NULL;
+    char *title_buf = NULL;
+    const char key[] = "title";
+
+    // 4 byte FullBox header, 2 byte lang. code, at least 1 byte for string
+    if (len < 4 + 2 + 1) {
+        av_log(s, AV_LOG_ERROR, "3GPP titl box too short!\n");
+        return AVERROR_INVALIDDATA;
+    }
+
+    if (s->nb_streams >= 1)
+        st = s->streams[s->nb_streams-1];
+
+    // FullBox header
+    version = avio_r8(pb);
+    flags = avio_rb24(pb);
+    left_bytes -= 4;
+
+    if (version != 0 || flags != 0) {
+        av_log(s, AV_LOG_ERROR,
+               "Invalid nonzero version (%d) or flags (%x) for 3GPP titl!\n",
+               version, flags);
+        return AVERROR_INVALIDDATA;
+    }
+
+    langcode = avio_rb16(pb) & ~(1 << 15);
+    if ((ret = ff_mov_lang_to_iso639(langcode, language)) < 0) {
+        av_log(s, AV_LOG_ERROR,
+               "Failed to parse 3GPP titl language code %x: %s!\n",
+               langcode, av_err2str(ret));
+        return ret;
+    }
+
+    left_bytes -= 2;
+
+    if (left_bytes <= 1)
+        // no contents (just null)
+        return 0;
+
+    buf_size = left_bytes + 1;
+    if (!(title_buf = av_mallocz(buf_size))) {
+        av_log(s, AV_LOG_ERROR,
+               "Could not allocate buffer of length %u for parsed 3GPP titl "
+               "title string!\n",
+               left_bytes);
+        return AVERROR(ENOMEM);
+    }
+
+    bom = avio_rb16(pb);
+    left_bytes -= 2;
+
+    if (bom == 0xfeff)
+        avio_get_str16be(pb, left_bytes, title_buf, buf_size);
+    else if (bom == 0xfffe)
+        avio_get_str16le(pb, left_bytes, title_buf, buf_size);
+    else {
+        AV_WB16(title_buf, bom);
+        if (!left_bytes)
+            title_buf[2] = 0;
+        else
+            avio_get_str(pb, left_bytes, title_buf + 2, buf_size - 2);
+    }
+
+    av_log(s, AV_LOG_TRACE, "%s TitlBox(lang: %s, title: %s)\n",
+           st ? "track" : "media",
+           language, title_buf);
+
+    s->event_flags |= AVFMT_EVENT_FLAG_METADATA_UPDATED;
+
+    if (*language && strcmp(language, "und")) {
+        char lang_key[sizeof(key) + 1 + sizeof(language)] = { 0 };
+        snprintf(lang_key, sizeof(lang_key), "%s-%s", key, language);
+
+        if ((ret = av_dict_set(st ? &st->metadata : &s->metadata,
+                               lang_key, title_buf, 0)) < 0) {
+            av_log(s, AV_LOG_ERROR,
+                   "Failed to set %s metadata key %s to value %s: %s!\n",
+                   st ? "track" : "media",
+                   lang_key, title_buf,
+                   av_err2str(ret));
+            goto cleanup;
+        }
+    }
+
+    ret = av_dict_set(st ? &st->metadata : &s->metadata, key, title_buf, 0);
+    if (ret < 0)
+        av_log(s, AV_LOG_ERROR,
+               "Failed to set %s metadata key %s to value %s: %s!\n",
+               st ? "track" : "media",
+               key, title_buf,
+               av_err2str(ret));
+
+cleanup:
+    av_freep(&title_buf);
+
+    return ret;
+}
+
 static int mov_read_udta_string(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 {
     char tmp_key[AV_FOURCC_MAX_STRING_SIZE] = {0};
@@ -349,6 +454,8 @@ static int mov_read_udta_string(MOVContext *c, AVIOContext *pb, MOVAtom atom)
     case MKTAG( 's','o','s','n'): key = "sort_show"; break;
     case MKTAG( 's','t','i','k'): key = "media_type";
         parse = mov_metadata_int8_no_padding; break;
+    case MKTAG( 't','i','t','l'):
+        return mov_metadata_titl(c, pb, atom.size);
     case MKTAG( 't','r','k','n'): key = "track";
         parse = mov_metadata_track_or_disc_number; break;
     case MKTAG( 't','v','e','n'): key = "episode_id"; break;