diff mbox series

[FFmpeg-devel] avformat/mov: parse 3gpp titl from media or track udta

Message ID 20210721172809.22974-1-jeebjp@gmail.com
State New

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

Jan Ekström July 21, 2021, 5:28 p.m. UTC
The 'titl' box seems to be used by HandBrake for track titling, and
the specification defines it as the "title for the media".

Definition from 3GPP TS 26.244 follows:

Field               Type                Details                             Value
BoxHeader.Size      Unsigned int(32)                                        BOX_SIZE
BoxHeader.Type      Unsigned int(32)                                        'titl'
BoxHeader.Version   Unsigned int(8)                                         0
BoxHeader.Flags     Bit(24)                                                 0
Pad                 Bit(1)                                                  0
Language            Unsigned int(5)[3]  Packed ISO-639-2/T language code
Title               String              Text of title

Semantics:

Language: declares the language code for the following text. See
ISO 639-2/T for the set of three character codes. Each character
is packed as the difference between its ASCII value and 0x60.

The code is confined to being three lower-case letters, so these
values are strictly positive.

Title: null-terminated string in either UTF-8 or UTF-16 characters,
giving a title information. If UTF-16 is used, the string shall
start with the BYTE ORDER MARK (0xFEFF).
---
 libavformat/mov.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)
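
The language packing described in the commit message can be sketched in
plain C as below. This is only an illustration under stated assumptions:
unpack_lang is a hypothetical helper name, not the function the patch calls
(the patch uses ff_mov_lang_to_iso639), and it assumes the 16-bit field has
already been read big-endian from the box.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: unpack the 16-bit field
 * (1 pad bit followed by three 5-bit values) into a NUL-terminated
 * three-letter code. Each 5-bit value is the letter's ASCII value minus
 * 0x60, so valid values are 1..26 ('a'..'z').
 * Returns 0 on success, -1 if any value falls outside that range. */
static int unpack_lang(uint16_t field, char out[4])
{
    field &= 0x7fff; /* drop the pad bit */

    for (int i = 0; i < 3; i++) {
        unsigned v = (field >> (10 - 5 * i)) & 0x1f;
        if (v < 1 || v > 26)
            return -1;
        out[i] = (char)(0x60 + v);
    }
    out[3] = '\0';
    return 0;
}
```

For example, "fin" packs as (6 << 10) | (9 << 5) | 14 = 0x192e, since
'f' - 0x60 = 6, 'i' - 0x60 = 9 and 'n' - 0x60 = 14.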

Comments

Jan Ekström July 21, 2021, 5:30 p.m. UTC | #1
A sample for this sort of metadata can be seen with
https://0x0.st/-zjq.m4v , which was posted at
https://github.com/mpv-player/mpv/issues/8488 .

The sample contains both "name" and "titl" boxes:
[udta: User Data Box]
    position = 3991500
    size = 71
    [name]
        position = 3991508
        size = 28
    [titl]
        position = 3991536
        size = 35

...of which, if I read the QTFF documentation correctly, "name" should
not be used for user-facing naming, while "titl" is an actual
user-facing metadata box. Thus I implemented the latter.
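
As an illustration of the layout in the dump above (8-byte udta header,
then a 28-byte "name" box and a 35-byte "titl" box, 8 + 28 + 35 = 71), a
walk over sibling boxes could look roughly like this. It is a sketch, not
code from mov.c: find_child_box and rb32 are hypothetical names, the payload
is assumed to be fully in memory, and only plain 32-bit box sizes are
handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit value. */
static uint32_t rb32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Hypothetical helper: scan the payload of a container box for the first
 * child with the given four-character type.
 * Returns the offset of the child's header within buf, or -1 if it is not
 * found or a box size is implausible. Size values 0 ("to end of file") and
 * 1 (64-bit size follows) are deliberately not supported in this sketch. */
static long find_child_box(const uint8_t *buf, size_t size, const char type[4])
{
    size_t pos = 0;

    while (size - pos >= 8) {
        uint32_t box_size = rb32(buf + pos);

        if (box_size < 8 || box_size > size - pos)
            return -1;
        if (!memcmp(buf + pos + 4, type, 4))
            return (long)pos;
        pos += box_size;
    }
    return -1;
}
```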

Jan
Jan Ekström July 24, 2021, 6:48 p.m. UTC | #2
Ping.

Includes a link to a testable file, and the specification of the box
is in the commit message :)

Jan
Jan Ekström Aug. 1, 2021, 8:49 p.m. UTC | #3
Will apply tomorrow after work unless there are any comments.

The only thing I was more or less unsure about was whether there is a
nice way of first figuring out the length of the string (since it
might not go all the way to the end of the box) and then allocating a
buffer for parsing it, or whether people would prefer static buffers
of arbitrary length. (I saw some nearby code arbitrarily limited to
~200 chars, which probably works for most things but would indeed lead
to randomly cut strings.) When writing the patch, since the length was
an unsigned int, I decided to allocate a buffer of left_bytes + 1 bytes
as that seemed simple enough, and then pass that buffer into either the
AVFormatContext's metadata or the AVStream's metadata.
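
For comparison, the "measure first" alternative mentioned above might look
roughly like the following when the remaining box payload is already in
memory. This is a sketch under that assumption, not what the patch does
(the patch reads from an AVIOContext and sizes the buffer from the
remaining box bytes); titl_measure and enum titl_enc are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum titl_enc { TITL_UTF8, TITL_UTF16BE, TITL_UTF16LE };

/* Hypothetical helper: classify the title payload by its byte order mark
 * and measure the text up to, but not including, its terminator, never
 * reading past `size`. *text_off receives the offset of the first text
 * byte (2 if a BOM was skipped), *text_len the text length in bytes. */
static enum titl_enc titl_measure(const uint8_t *buf, size_t size,
                                  size_t *text_off, size_t *text_len)
{
    enum titl_enc enc = TITL_UTF8;
    size_t off = 0, len = 0;

    if (size >= 2 && buf[0] == 0xfe && buf[1] == 0xff) {
        enc = TITL_UTF16BE;
        off = 2;
    } else if (size >= 2 && buf[0] == 0xff && buf[1] == 0xfe) {
        enc = TITL_UTF16LE;
        off = 2;
    }

    if (enc == TITL_UTF8) {
        /* stop at the first NUL byte or at the end of the payload */
        while (off + len < size && buf[off + len] != 0)
            len++;
    } else {
        /* step over 16-bit code units until a 0x0000 unit or the end */
        while (off + len + 1 < size &&
               (buf[off + len] | buf[off + len + 1]) != 0)
            len += 2;
    }

    *text_off = off;
    *text_len = len;
    return enc;
}
```

One thing such a measurement makes visible: a UTF-16 code unit occupies two
bytes but can need three bytes once converted to UTF-8, so an output buffer
sized from the UTF-16 byte count may be too small for some titles.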

The way per-language metadata is handled (adding the language as a
suffix, a la "title-fin") is something I noticed in another function
in this file, so I followed that style. It did make me wonder whether
a set of language-specific sub-keys for a base metadata key would make
sense at some point.
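
The suffixing convention described above can be illustrated in isolation as
follows. make_lang_key is a hypothetical helper written for this example
only; the patch does the equivalent inline with snprintf and a fixed-size
array.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<key>-<lang>", e.g. "title-fin".
 * Returns 1 if a suffixed key was written to dst, 0 if the language is
 * empty or "und" (so only the plain key should be used), and -1 if dst
 * is too small to hold the result. */
static int make_lang_key(char *dst, size_t dst_size,
                         const char *key, const char *lang)
{
    int n;

    if (!lang[0] || !strcmp(lang, "und"))
        return 0;

    n = snprintf(dst, dst_size, "%s-%s", key, lang);
    if (n < 0 || (size_t)n >= dst_size)
        return -1;

    return 1;
}
```

With key "title" and language "fin" this yields "title-fin", which would be
set in addition to the plain "title" key.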

Also, is the linked sample good enough for a FATE test (3.9 MiB), or
should I try to generate a smaller sample?

Jan

Patch

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 040babed95..9edb3d6596 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -291,6 +291,111 @@  static int mov_metadata_hmmt(MOVContext *c, AVIOContext *pb, unsigned len)
     return 0;
 }
 
+// 3GPP TS 26.244, 8.2 3GPP asset meta data
+static int mov_metadata_titl(MOVContext *c, AVIOContext *pb, unsigned len)
+{
+    AVFormatContext *s = c->fc;
+    int version = -1, ret = AVERROR_BUG;
+    unsigned left_bytes = len, langcode = 0, flags = 100, bom = 0, buf_size = 0;
+    char language[4] = { 0 };
+    AVStream *st = NULL;
+    char *title_buf = NULL;
+    const char key[] = "title";
+
+    // 4 byte FullBox header, 2 byte lang. code, at least 1 byte for string
+    if (len < 4 + 2 + 1) {
+        av_log(s, AV_LOG_ERROR, "3GPP titl box too short!\n");
+        return AVERROR_INVALIDDATA;
+    }
+
+    if (s->nb_streams >= 1)
+        st = s->streams[s->nb_streams-1];
+
+    // FullBox header
+    version = avio_r8(pb);
+    flags   = avio_rb24(pb);
+    left_bytes -= 4;
+
+    if (version != 0 || flags != 0) {
+        av_log(s, AV_LOG_ERROR,
+               "Invalid nonzero version (%d) or flags (%x) for 3GPP titl!\n",
+               version, flags);
+        return AVERROR_INVALIDDATA;
+    }
+
+    langcode = avio_rb16(pb) & ~(1 << 15);
+    if ((ret = ff_mov_lang_to_iso639(langcode, language)) < 0) {
+        av_log(s, AV_LOG_ERROR,
+               "Failed to parse 3GPP titl language code %x: %s!\n",
+               langcode, av_err2str(ret));
+        return ret;
+    }
+
+    left_bytes -= 2;
+
+    if (left_bytes <= 1)
+        // no contents (just null)
+        return 0;
+
+    buf_size = left_bytes + 1;
+    if (!(title_buf = av_mallocz(buf_size))) {
+        av_log(s, AV_LOG_ERROR,
+               "Could not allocate buffer of length %u for parsed 3GPP titl "
+               "title string!\n",
+               buf_size);
+        return AVERROR(ENOMEM);
+    }
+
+    bom = avio_rb16(pb);
+    left_bytes -= 2;
+
+    if (bom == 0xfeff)
+        avio_get_str16be(pb, left_bytes, title_buf, buf_size);
+    else if (bom == 0xfffe)
+        avio_get_str16le(pb, left_bytes, title_buf, buf_size);
+    else {
+        AV_WB16(title_buf, bom);
+        if (!left_bytes)
+            title_buf[2] = 0;
+        else
+            avio_get_str(pb, left_bytes, title_buf + 2, buf_size - 2);
+    }
+
+    av_log(s, AV_LOG_TRACE, "%s TitlBox(lang: %s, title: %s)\n",
+           st ? "track" : "media",
+           language, title_buf);
+
+    s->event_flags |= AVFMT_EVENT_FLAG_METADATA_UPDATED;
+
+    if (*language && strcmp(language, "und")) {
+        char lang_key[sizeof(key) + 1 + sizeof(language)] = { 0 };
+        snprintf(lang_key, sizeof(lang_key), "%s-%s", key, language);
+
+        if ((ret = av_dict_set(st ? &st->metadata : &s->metadata,
+                               lang_key, title_buf, 0)) < 0) {
+            av_log(s, AV_LOG_ERROR,
+                   "Failed to set %s metadata key %s to value %s: %s!\n",
+                   st ? "track" : "media",
+                   lang_key, title_buf,
+                   av_err2str(ret));
+            goto cleanup;
+        }
+    }
+
+    ret = av_dict_set(st ? &st->metadata : &s->metadata, key, title_buf, 0);
+    if (ret < 0)
+        av_log(s, AV_LOG_ERROR,
+               "Failed to set %s metadata key %s to value %s: %s!\n",
+               st ? "track" : "media",
+               key, title_buf,
+               av_err2str(ret));
+
+cleanup:
+    av_freep(&title_buf);
+
+    return ret;
+}
+
 static int mov_read_udta_string(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 {
     char tmp_key[AV_FOURCC_MAX_STRING_SIZE] = {0};
@@ -349,6 +454,8 @@  static int mov_read_udta_string(MOVContext *c, AVIOContext *pb, MOVAtom atom)
     case MKTAG( 's','o','s','n'): key = "sort_show";    break;
     case MKTAG( 's','t','i','k'): key = "media_type";
         parse = mov_metadata_int8_no_padding; break;
+    case MKTAG( 't','i','t','l'):
+        return mov_metadata_titl(c, pb, atom.size);
     case MKTAG( 't','r','k','n'): key = "track";
         parse = mov_metadata_track_or_disc_number; break;
     case MKTAG( 't','v','e','n'): key = "episode_id"; break;