From patchwork Thu Sep 29 22:28:14 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Jehan_Pag=C3=A8s?=
X-Patchwork-Id: 785
From: =?UTF-8?B?SmVoYW4gUGFnw6hz?=
Date: Fri, 30 Sep 2016 00:28:14 +0200
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] Improved SAMI support
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Hello!

So I posted a patch about SAMI support (https://trac.ffmpeg.org/ticket/3118), but I was told it would be ignored on the bug tracker and that I should use the dev mailing list instead. So here is the patch, attached, with a short description:

SAMI subtitles can have several languages (as explained in the spec: https://msdn.microsoft.com/en-us/library/ms971327.aspx). But currently this is not supported by ffmpeg.
What ffmpeg does is simply use the subtitles for all languages, and when 2 subtitles use the same timestamp (which obviously happens for most text in the file, since they are made for the same video), the one defined later in the file overrides the first definition.

Example:
> Some text in Korean (for instance)
> […]
> The same text but in English.

Result: the Korean text will never be displayed. You can also end up with a mix of languages (the version of a subtitle in another language may have additional timestamps which don't get overridden).

As a consequence, I have to edit nearly every SMI file which comes into my hands and delete all the text in the languages I don't want. And that's extremely annoying.

My attached patch is a first step. It only takes into account subtitles set for the default language (which means the first one defined in the SAMI file, cf. the spec), or without a language (so that a subtitle can be common to all languages).
Of course, the perfect version should extract 1 subtitle track per language in the file. This first patch does not (maybe another one later!). But that's still a huge improvement over the current code since, in the SMI files I find, the default language is usually the one I need (since they were made for this purpose).

I have used this patch for the last 2 days, and that's already a huge relief for me. :-)
Could this be integrated into ffmpeg?
Thanks!

Jehan

From 0cd4d92b98ecbf364891038b851740291cf75219 Mon Sep 17 00:00:00 2001
From: Jehan
Date: Wed, 28 Sep 2016 03:28:52 +0200
Subject: [PATCH] avformat: basic language support in SAMI subtitles.

Different languages are not yet extracted as separate subtitle tracks.
This first basic version simply uses the default language and ignores any
others. The spec says:
"If the user (or author) has not explicitly selected a language, the first
Class (language) definition will be used by default."
See: https://msdn.microsoft.com/en-us/library/ms971327.aspx
---
 libavformat/realtextdec.c |  14 ++--
 libavformat/samidec.c     | 167 ++++++++++++++++++++++++++++++++++++++++++----
 libavformat/subtitles.c   |  22 +++++-
 libavformat/subtitles.h   |   5 +-
 4 files changed, 187 insertions(+), 21 deletions(-)

diff --git a/libavformat/realtextdec.c b/libavformat/realtextdec.c
index 618d4f7..c0b0e44 100644
--- a/libavformat/realtextdec.c
+++ b/libavformat/realtextdec.c
@@ -85,10 +85,12 @@ static int realtext_read_header(AVFormatContext *s)
         if (!av_strncasecmp(buf.str, "codecpar->extradata = av_strdup(buf.str);
             if (!st->codecpar->extradata) {
                 res = AVERROR(ENOMEM);
@@ -105,12 +107,16 @@ static int realtext_read_header(AVFormatContext *s)
                 goto end;
             }
             if (!merge) {
-                const char *begin = ff_smil_get_attr_ptr(buf.str, "begin");
-                const char *end = ff_smil_get_attr_ptr(buf.str, "end");
+                char *begin = ff_smil_get_attr_ptr(buf.str, "begin");
+                char *end = ff_smil_get_attr_ptr(buf.str, "end");
 
                 sub->pos = pos;
                 sub->pts = begin ? read_ts(begin) : 0;
                 sub->duration = end ? (read_ts(end) - sub->pts) : duration;
+                if (begin)
+                    av_free(begin);
+                if (end)
+                    av_free(end);
             }
         }
         av_bprint_clear(&buf);
diff --git a/libavformat/samidec.c b/libavformat/samidec.c
index 7ea1bdf..3bd16ea 100644
--- a/libavformat/samidec.c
+++ b/libavformat/samidec.c
@@ -26,6 +26,7 @@
 
 #include "avformat.h"
 #include "internal.h"
+#include "regex.h"
 #include "subtitles.h"
 #include "libavcodec/internal.h"
 #include "libavutil/avstring.h"
@@ -34,6 +35,10 @@
 
 typedef struct {
     FFDemuxSubtitlesQueue q;
+    char **lang_names;
+    char **lang_codes;
+    char **lang_ids;
+    int n_langs;
 } SAMIContext;
 
 static int sami_probe(AVProbeData *p)
@@ -51,11 +56,19 @@ static int sami_read_header(AVFormatContext *s)
     SAMIContext *sami = s->priv_data;
     AVStream *st = avformat_new_stream(s, NULL);
     AVBPrint buf, hdr_buf;
+    int is_style = 0;
     char c = 0;
     int res = 0, got_first_sync_point = 0;
+    int ignore_sub;
     FFTextReader tr;
+    int64_t last_pos;
+    int64_t last_pts;
+    int new_sub = 0;
+
     ff_text_init_avio(s, &tr, s->pb);
 
+    sami->n_langs = 0;
+
     if (!st)
         return AVERROR(ENOMEM);
     avpriv_set_pts_info(st, 64, 1, 1000);
@@ -65,10 +78,87 @@ static int sami_read_header(AVFormatContext *s)
     av_bprint_init(&buf, 0, AV_BPRINT_SIZE_UNLIMITED);
     av_bprint_init(&hdr_buf, 0, AV_BPRINT_SIZE_UNLIMITED);
 
+    /* Read header */
+    while (!ff_text_eof(&tr)) {
+        int header_end;
+        int n = ff_smil_extract_next_text_chunk(&tr, &buf, &c);
+
+        if (n == 0)
+            break;
+
+        header_end = !av_strncasecmp(buf.str, "lang_names,
+                                                (sami->n_langs + 1) * sizeof(char*));
+                        lang_codes = av_realloc(sami->lang_codes,
+                                                (sami->n_langs + 1) * sizeof(char*));
+                        lang_ids = av_realloc(sami->lang_ids,
+                                              (sami->n_langs + 1) * sizeof(char*));
+                        if (lang_names && lang_codes && lang_ids) {
+                            char *lang_name;
+                            char *lang_code;
+                            char *lang_id;
+
+                            lang_name = av_strndup(style_text + matchptr[2].rm_so,
+                                                   matchptr[2].rm_eo - matchptr[2].rm_so);
+
+                            lang_code = av_strndup(style_text + matchptr[3].rm_so,
+                                                   matchptr[3].rm_eo - matchptr[3].rm_so);
+
+                            lang_id = av_strndup(style_text + matchptr[1].rm_so,
+                                                 matchptr[1].rm_eo - matchptr[1].rm_so);
+
+                            lang_names[sami->n_langs] = lang_name;
+                            lang_codes[sami->n_langs] = lang_code;
+                            lang_ids[sami->n_langs++] = lang_id;
+
+                            sami->lang_names = lang_names;
+                            sami->lang_codes = lang_codes;
+                            sami->lang_ids = lang_ids;
+                        }
+                        style_text = style_text + matchptr[0].rm_eo;
+                    }
+                }
+                regfree(&regex);
+            }
+        }
+        av_bprint_clear(&buf);
+    }
+
+    /* Read body. */
     while (!ff_text_eof(&tr)) {
         AVPacket *sub;
         const int64_t pos = ff_text_pos(&tr) - (c != 0);
-        int is_sync, is_body, n = ff_smil_extract_next_text_chunk(&tr, &buf, &c);
+        int is_sync, is_body, is_p;
+        int n = ff_smil_extract_next_text_chunk(&tr, &buf, &c);
 
         if (n == 0)
             break;
@@ -80,22 +170,60 @@ static int sami_read_header(AVFormatContext *s)
         }
         is_sync = !av_strncasecmp(buf.str, "n_langs) {
+            /* If no languages were set in