From patchwork Mon Feb 22 13:19:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Jan_Ekstr=C3=B6m?= X-Patchwork-Id: 25896 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id C48BA44ACFE for ; Mon, 22 Feb 2021 15:49:33 +0200 (EET) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id ABDDB68A9C8; Mon, 22 Feb 2021 15:49:33 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id A449468A326 for ; Mon, 22 Feb 2021 15:49:27 +0200 (EET) Received: by mail-ej1-f54.google.com with SMTP id r17so3094965ejy.13 for ; Mon, 22 Feb 2021 05:49:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=AhwI/dVVCzE+GVPkqDfJ0+bA6lfFLKdY502CmZ4OTvQ=; b=lLUMYysuANDBX+oHs7g55gOB7blsFmr3eRL2L1mq47sKm18LvGAjhDHVx2q8bQnlnk Zl/+EsupR17DES1UPjtiH6rbNkb3yhiAB6hsz8bqJWoSCJnxX4EOJVWQgP1DgsndmErU AftBrFNgVj6kEQVJmMyTuvf7gbmD0vACFYLLc2ti4a+VuXiV3LqppwVPTkLe8piT2/Az dHw//74ne1Ecx16H+3FbN4Jib5J2W7LEVkkE+w0KDi4a6fCNdmiqb+yf3xUqfLoOByS0 watf3Er+8C5tDv3ZbH3n99isnSVRlfdq/yygN3V5mki1apIqGgw4N9RoaukpnduwKG7t qz2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=AhwI/dVVCzE+GVPkqDfJ0+bA6lfFLKdY502CmZ4OTvQ=; b=mHRSsILkdVAQF+tuYmbby7EWhy8xOvNr1SWW1Z2NJFtpNZYfv6Y66tIa770WRi5O5d JYX6hRbQ/FciTUOEUdtnGIRdpw/SaFjzTgebYuNxctuXcVOUbbV6n84MmM6C+Buo1+a+ yBxudT1A6z05gsPwFMSx9+QhD5iEjYsw2sxVVEW1aperEbTJCANg5/wzDQ7ybWiJNlyw bljJoiXgg/O2e+VRScO2fJ0KjQIKZ7PSDSo+u7RTvZxFJnNw8/LGEklNHkqfATq7rwca S9JUJ6poXP295ssyAgmMznyhXkmZrsi8qV9hynDK2tc7HnkaeF1km9qHx6sQ4+Kc7BZy FC1A== X-Gm-Message-State: AOAM532eDW+AG7ZZwc2vvRpFzEX3DZSQpjN7+Fm6xxzOdbjpL5Ri8dGu TZ61jMj+eXx9qmqodb1cvQaOOzT+04k= X-Google-Smtp-Source: ABdhPJwhR7U7r6rvsQucatL+ylCz2cmCGKN0wFcp3jmB91fKiOAuyNi/T2ictDKTlH1BO4HFOkvAjA== X-Received: by 2002:a05:6512:398e:: with SMTP id j14mr14465597lfu.9.1613999958326; Mon, 22 Feb 2021 05:19:18 -0800 (PST) Received: from localhost.localdomain (91-159-194-103.elisa-laajakaista.fi. [91.159.194.103]) by smtp.gmail.com with ESMTPSA id v26sm750256lfd.6.2021.02.22.05.19.17 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Feb 2021 05:19:17 -0800 (PST) From: =?utf-8?q?Jan_Ekstr=C3=B6m?= To: ffmpeg-devel@ffmpeg.org Date: Mon, 22 Feb 2021 15:19:11 +0200 Message-Id: <20210222131914.21335-2-jeebjp@gmail.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210222131914.21335-1-jeebjp@gmail.com> References: <20210222131914.21335-1-jeebjp@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v5 1/4] avutil/{avstring, bprint}: add XML escaping from ffprobe to avutil X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" From: Stefano Sabatini Base escaping only escapes values required for base character data according to part 2.4 of XML, and if additional flags are added single and double quotes can additionally be escaped in order to handle single and double quoted attributes. --- libavutil/avstring.h | 14 ++++++++++++++ libavutil/bprint.c | 29 +++++++++++++++++++++++++++++ libavutil/version.h | 2 +- tools/ffescape.c | 7 +++++-- 4 files changed, 49 insertions(+), 3 deletions(-) diff --git a/libavutil/avstring.h b/libavutil/avstring.h index ee225585b3..fae446c302 100644 --- a/libavutil/avstring.h +++ b/libavutil/avstring.h @@ -324,6 +324,7 @@ enum AVEscapeMode { AV_ESCAPE_MODE_AUTO, ///< Use auto-selected escaping mode. AV_ESCAPE_MODE_BACKSLASH, ///< Use backslash escaping. AV_ESCAPE_MODE_QUOTE, ///< Use single-quote escaping. + AV_ESCAPE_MODE_XML, ///< Use XML non-markup character data escaping. }; /** @@ -343,6 +344,19 @@ enum AVEscapeMode { */ #define AV_ESCAPE_FLAG_STRICT (1 << 1) +/** + * Within AV_ESCAPE_MODE_XML, additionally escape single quotes for single + * quoted attributes. + */ +#define AV_ESCAPE_FLAG_XML_SINGLE_QUOTES (1 << 2) + +/** + * Within AV_ESCAPE_MODE_XML, additionally escape double quotes for double + * quoted attributes. + */ +#define AV_ESCAPE_FLAG_XML_DOUBLE_QUOTES (1 << 3) + + /** * Escape string in src, and put the escaped string in an allocated * string in *dst, which must be freed with av_free(). diff --git a/libavutil/bprint.c b/libavutil/bprint.c index 2f059c5ba6..e12fb263fe 100644 --- a/libavutil/bprint.c +++ b/libavutil/bprint.c @@ -283,6 +283,35 @@ void av_bprint_escape(AVBPrint *dstbuf, const char *src, const char *special_cha av_bprint_chars(dstbuf, '\'', 1); break; + case AV_ESCAPE_MODE_XML: + /* escape XML non-markup character data as per 2.4 by default: */ + /* [^<&]* - ([^<&]* ']]>' [^<&]*) */ + + /* additionally, given one of the AV_ESCAPE_FLAG_XML_* flags, */ + /* escape those specific characters as required. */ + for (; *src; src++) { + switch (*src) { + case '&' : av_bprintf(dstbuf, "%s", "&"); break; + case '<' : av_bprintf(dstbuf, "%s", "<"); break; + case '>' : av_bprintf(dstbuf, "%s", ">"); break; + case '\'': + if (!(flags & AV_ESCAPE_FLAG_XML_SINGLE_QUOTES)) + goto XML_DEFAULT_HANDLING; + + av_bprintf(dstbuf, "%s", "'"); + break; + case '"' : + if (!(flags & AV_ESCAPE_FLAG_XML_DOUBLE_QUOTES)) + goto XML_DEFAULT_HANDLING; + + av_bprintf(dstbuf, "%s", """); + break; +XML_DEFAULT_HANDLING: + default: av_bprint_chars(dstbuf, *src, 1); + } + } + break; + /* case AV_ESCAPE_MODE_BACKSLASH or unknown mode */ default: /* \-escape characters */ diff --git a/libavutil/version.h b/libavutil/version.h index b7c5892a37..356c54d633 100644 --- a/libavutil/version.h +++ b/libavutil/version.h @@ -79,7 +79,7 @@ */ #define LIBAVUTIL_VERSION_MAJOR 56 -#define LIBAVUTIL_VERSION_MINOR 66 +#define LIBAVUTIL_VERSION_MINOR 67 #define LIBAVUTIL_VERSION_MICRO 100 #define LIBAVUTIL_VERSION_INT AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \ diff --git a/tools/ffescape.c b/tools/ffescape.c index 0530d28c6d..3647ae5d60 100644 --- a/tools/ffescape.c +++ b/tools/ffescape.c @@ -78,8 +78,10 @@ int main(int argc, char **argv) infilename = optarg; break; case 'f': - if (!strcmp(optarg, "whitespace")) escape_flags |= AV_ESCAPE_FLAG_WHITESPACE; - else if (!strcmp(optarg, "strict")) escape_flags |= AV_ESCAPE_FLAG_STRICT; + if (!strcmp(optarg, "whitespace")) escape_flags |= AV_ESCAPE_FLAG_WHITESPACE; + else if (!strcmp(optarg, "strict")) escape_flags |= AV_ESCAPE_FLAG_STRICT; + else if (!strcmp(optarg, "xml_single_quotes") escape_flags |= AV_ESCAPE_FLAG_XML_SINGLE_QUOTES); + else if (!strcmp(optarg, "xml_double_quotes") escape_flags |= AV_ESCAPE_FLAG_XML_DOUBLE_QUOTES); else { av_log(NULL, AV_LOG_ERROR, "Invalid value '%s' for option -f, " @@ -104,6 +106,7 @@ int main(int argc, char **argv) if (!strcmp(optarg, "auto")) escape_mode = AV_ESCAPE_MODE_AUTO; else if (!strcmp(optarg, "backslash")) escape_mode = AV_ESCAPE_MODE_BACKSLASH; else if (!strcmp(optarg, "quote")) escape_mode = AV_ESCAPE_MODE_QUOTE; + else if (!strcmp(optarg, "xml")) escape_mode = AV_ESCAPE_MODE_XML; else { av_log(NULL, AV_LOG_ERROR, "Invalid value '%s' for option -m, " From patchwork Mon Feb 22 13:19:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Jan_Ekstr=C3=B6m?= X-Patchwork-Id: 25894 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id D94D144A67C for ; Mon, 22 Feb 2021 15:25:36 +0200 (EET) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C260668AA13; Mon, 22 Feb 2021 15:25:36 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-ed1-f43.google.com (mail-ed1-f43.google.com [209.85.208.43]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 152D768AA13 for ; Mon, 22 Feb 2021 15:25:30 +0200 (EET) Received: by mail-ed1-f43.google.com with SMTP id i14so21634931eds.8 for ; Mon, 22 Feb 2021 05:25:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=T9NanDi8fBl4JQTecYlvXME0L0Ypt7sKuPhi4tlploc=; b=R3c5uoFfgUEKeTp/xxoXSzB33JM6Nzj7czLhc8mepF7k6ksFLMyrFLxfIc85JQFbIo SpD8+Ma1z5GAH2GXeqan9sjfjaq41Ze1FDbRyyxQfSuyqvFl/Q4f8rTaLc/Kfrh2kSgf g1wWLNoOebzO9vG12mSAPndH1UJWzkKlGaky8D9+ZfmGEN+Gt5xfmqJEL8cHqSyLxbIN uLO7qbDIAZ6zIaSQs3TD3w4Fj4WZ8aUQGe6bd+iZAMRbydjEUWs5WP2au/6vRGB3ZCja OkjRTLhEzU71u0pW3pmJtNkK9pZN5bYKMip6i89fT+9oU2telObssTeVr2R9oHim6i+3 Xx3Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=T9NanDi8fBl4JQTecYlvXME0L0Ypt7sKuPhi4tlploc=; b=RqRUb662vNY+lpClLjwyXRM34oG0bTwcd0Zgp27IXKozgmtzcZXhg0qj5+DQWiniWR YlbtNuPpsTzEAs0enYVDsovOxLIfxIy8cBwHT760Y0Kqgy+tnCV43ZKo2f0khHJ6Axt6 dlKtf083gI+T/bOF5VTAqNXf8vkQZ/Grdd24S5wxLca6JhnR9YZZfhzrpB/rw6JpvkRu aHQD/VtAElKY+mWvUE/gb/bG2aZ7iafrU9L71WypHlzWVCr4Vw90iWN1bJRlZ6kPrPZf HeZaUj1LwMmc9+l4FxLg9dIWbql8SJoSQBAdiv93c4yqePg2ExJQrhUwy/5AdCl6aPiJ LkbA== X-Gm-Message-State: AOAM5338w7MGOm4ZvOrSaNjcWo8NNqvMqQeWHJDc7f3sJRNqCx4JYapv XLzGEf42icLTuNhCWShF3C5REc8szlQ= X-Google-Smtp-Source: ABdhPJyCAVpn80IQbZ+Hh5m0EqhwPk9BukXQDKQ9K7kSM7kz4v9SjVG8ndMhtnXEwlbLWyPyAvhTKA== X-Received: by 2002:a2e:8619:: with SMTP id a25mr591563lji.234.1613999959017; Mon, 22 Feb 2021 05:19:19 -0800 (PST) Received: from localhost.localdomain (91-159-194-103.elisa-laajakaista.fi. [91.159.194.103]) by smtp.gmail.com with ESMTPSA id v26sm750256lfd.6.2021.02.22.05.19.18 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Feb 2021 05:19:18 -0800 (PST) From: =?utf-8?q?Jan_Ekstr=C3=B6m?= To: ffmpeg-devel@ffmpeg.org Date: Mon, 22 Feb 2021 15:19:12 +0200 Message-Id: <20210222131914.21335-3-jeebjp@gmail.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210222131914.21335-1-jeebjp@gmail.com> References: <20210222131914.21335-1-jeebjp@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v5 2/4] ffprobe: switch to av_bprint_escape for XML escaping X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" From: Jan Ekström Additionally update the result of the ffprobe XML writing test. Signed-off-by: Jan Ekström --- fftools/ffprobe.c | 32 +++++++++++--------------------- tests/ref/fate/ffprobe_xml | 2 +- 2 files changed, 12 insertions(+), 22 deletions(-) diff --git a/fftools/ffprobe.c b/fftools/ffprobe.c index 3453aa09ff..7635f781a6 100644 --- a/fftools/ffprobe.c +++ b/fftools/ffprobe.c @@ -1672,24 +1672,6 @@ static av_cold int xml_init(WriterContext *wctx) return 0; } -static const char *xml_escape_str(AVBPrint *dst, const char *src, void *log_ctx) -{ - const char *p; - - for (p = src; *p; p++) { - switch (*p) { - case '&' : av_bprintf(dst, "%s", "&"); break; - case '<' : av_bprintf(dst, "%s", "<"); break; - case '>' : av_bprintf(dst, "%s", ">"); break; - case '"' : av_bprintf(dst, "%s", """); break; - case '\'': av_bprintf(dst, "%s", "'"); break; - default: av_bprint_chars(dst, *p, 1); - } - } - - return dst->str; -} - #define XML_INDENT() printf("%*c", xml->indent_level * 4, ' ') static void xml_print_section_header(WriterContext *wctx) @@ -1761,14 +1743,22 @@ static void xml_print_str(WriterContext *wctx, const char *key, const char *valu if (section->flags & SECTION_FLAG_HAS_VARIABLE_FIELDS) { XML_INDENT(); + av_bprint_escape(&buf, key, NULL, + AV_ESCAPE_MODE_XML, AV_ESCAPE_FLAG_XML_DOUBLE_QUOTES); printf("<%s key=\"%s\"", - section->element_name, xml_escape_str(&buf, key, wctx)); + section->element_name, buf.str); av_bprint_clear(&buf); - printf(" value=\"%s\"/>\n", xml_escape_str(&buf, value, wctx)); + + av_bprint_escape(&buf, value, NULL, + AV_ESCAPE_MODE_XML, AV_ESCAPE_FLAG_XML_DOUBLE_QUOTES); + printf(" value=\"%s\"/>\n", buf.str); } else { if (wctx->nb_item[wctx->level]) printf(" "); - printf("%s=\"%s\"", key, xml_escape_str(&buf, value, wctx)); + + av_bprint_escape(&buf, value, NULL, + AV_ESCAPE_MODE_XML, AV_ESCAPE_FLAG_XML_DOUBLE_QUOTES); + printf("%s=\"%s\"", key, buf.str); } av_bprint_finalize(&buf, NULL); diff --git a/tests/ref/fate/ffprobe_xml b/tests/ref/fate/ffprobe_xml index ccea2874b2..bfff7bc332 100644 --- a/tests/ref/fate/ffprobe_xml +++ b/tests/ref/fate/ffprobe_xml @@ -51,7 +51,7 @@ - + From patchwork Mon Feb 22 13:19:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Jan_Ekstr=C3=B6m?= X-Patchwork-Id: 25892 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id 9405F44BB67 for ; Mon, 22 Feb 2021 15:25:15 +0200 (EET) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 51F1568A9BB; Mon, 22 Feb 2021 15:25:15 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-lj1-f177.google.com (mail-lj1-f177.google.com [209.85.208.177]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id CCAE4680A4F for ; Mon, 22 Feb 2021 15:25:08 +0200 (EET) Received: by mail-lj1-f177.google.com with SMTP id c8so57804339ljd.12 for ; Mon, 22 Feb 2021 05:25:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=lryvRkquRPEAmzrDAmThpdKSKfQ/4Y8sWmbevdzLD5Q=; b=U+DRwgGje74/D92P0jkr+OgeEG+bQC2crif16GSuFLeKshStBPxwNDeoJhZIuz9mMf axCwQB6q6xkhNfSJxZeFIlnmCb5K03rv7UB7LkFp/pXAcI/nHLV5/8DhdoOnP+2fXhRG vnpsmkP7m1mEPrpUilD1RYMy71aIkoUfSd5Tj1tkSXtovNU10O2g8yloyMiCM+sQkEoK 6XaQkcj+qhR5iK/QBm9GZOKHUhe6DJXvbEJ7gUhmxC3VR2uQ2HLelOdLc2+vL1emkOkn 73hEX7dAXZZLF4j28UyjVFCbGPy4hFLz6iw45XtHeXeTgTYjfX7mhACFEdPHRvgXIrEs rH8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=lryvRkquRPEAmzrDAmThpdKSKfQ/4Y8sWmbevdzLD5Q=; b=K6Pn2h1HHO0Icu6yJKKqpEwFvg+lCR1s8Q9CMuGUUPvHStzNSZfryIQs3bwUrBvW0L 2J5z/9QgTk6FyetsHsZ3GtnyaKHq54F9K8Gi/vwTaJif1BODDZNdFbQ9p7jqByc6tzAa fJJ4u8rtseTXhqFIdWDbSuVEM9RX1AT49PCU69HX3WxwYrtEiHmWXEorDd0sFZfyRV+W plcnGz8M2vkZNcZoij8WzhkeeTIUHnqJTCNIRhDvioaQ4ZDFLfnJAJ4eR/KpNVFDo667 CDvbnnmhiXexDzO7PUpnGVlzcWsUNdu2gkCxIFFsmnVlBwvy4n+T8Fs+MluOd+/ullsy oIWg== X-Gm-Message-State: AOAM532brljkyjszaiQPnrhaU6halUKicHolw8tCjU/hTSGze3LCXEy6 5RK+1QAxECsOJHVkIZp+FNKwoAoks3Q= X-Google-Smtp-Source: ABdhPJy5ZMyvckuwLGKwFLeClgT3k9bNPnWzuw9X+L4MjgJu9+AaJcKMe009ILCow2xkoWm97h6aig== X-Received: by 2002:a05:6512:991:: with SMTP id w17mr14076790lft.148.1613999959711; Mon, 22 Feb 2021 05:19:19 -0800 (PST) Received: from localhost.localdomain (91-159-194-103.elisa-laajakaista.fi. [91.159.194.103]) by smtp.gmail.com with ESMTPSA id v26sm750256lfd.6.2021.02.22.05.19.19 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Feb 2021 05:19:19 -0800 (PST) From: =?utf-8?q?Jan_Ekstr=C3=B6m?= To: ffmpeg-devel@ffmpeg.org Date: Mon, 22 Feb 2021 15:19:13 +0200 Message-Id: <20210222131914.21335-4-jeebjp@gmail.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210222131914.21335-1-jeebjp@gmail.com> References: <20210222131914.21335-1-jeebjp@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v5 3/4] avcodec: enable usage of AV_EF_EXPLODE for subtitle encoders X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" From: Jan Ekström As currently it is the responsibility of the following subtitle encoder to validate the correctness of the incoming ASS dialog line. Signed-off-by: Jan Ekström --- libavcodec/avcodec.h | 5 ++++- libavcodec/options_table.h | 4 ++-- libavcodec/version.h | 2 +- 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h index 7dbf083a24..9fffe8bcec 100644 --- a/libavcodec/avcodec.h +++ b/libavcodec/avcodec.h @@ -1634,7 +1634,10 @@ typedef struct AVCodecContext { /** * Error recognition; may misdetect some more or less valid parts as errors. - * - encoding: unused + * - encoding: Set by user. Currently only AV_EF_EXPLODE is valid, and only + * for subtitle encoders, since checking the validity of an ASS + * dialog line is currently the responsibility of the following + * encoder. * - decoding: Set by user. */ int err_recognition; diff --git a/libavcodec/options_table.h b/libavcodec/options_table.h index ded9de4d67..f159a4b410 100644 --- a/libavcodec/options_table.h +++ b/libavcodec/options_table.h @@ -140,11 +140,11 @@ static const AVOption avcodec_options[] = { {"unofficial", "allow unofficial extensions", 0, AV_OPT_TYPE_CONST, {.i64 = FF_COMPLIANCE_UNOFFICIAL }, INT_MIN, INT_MAX, A|V|D|E, "strict"}, {"experimental", "allow non-standardized experimental things", 0, AV_OPT_TYPE_CONST, {.i64 = FF_COMPLIANCE_EXPERIMENTAL }, INT_MIN, INT_MAX, A|V|D|E, "strict"}, {"b_qoffset", "QP offset between P- and B-frames", OFFSET(b_quant_offset), AV_OPT_TYPE_FLOAT, {.dbl = 1.25 }, -FLT_MAX, FLT_MAX, V|E}, -{"err_detect", "set error detection flags", OFFSET(err_recognition), AV_OPT_TYPE_FLAGS, {.i64 = 0 }, INT_MIN, INT_MAX, A|V|D, "err_detect"}, +{"err_detect", "set error detection flags", OFFSET(err_recognition), AV_OPT_TYPE_FLAGS, {.i64 = 0 }, INT_MIN, INT_MAX, A|V|D|E, "err_detect"}, {"crccheck", "verify embedded CRCs", 0, AV_OPT_TYPE_CONST, {.i64 = AV_EF_CRCCHECK }, INT_MIN, INT_MAX, A|V|D, "err_detect"}, {"bitstream", "detect bitstream specification deviations", 0, AV_OPT_TYPE_CONST, {.i64 = AV_EF_BITSTREAM }, INT_MIN, INT_MAX, A|V|D, "err_detect"}, {"buffer", "detect improper bitstream length", 0, AV_OPT_TYPE_CONST, {.i64 = AV_EF_BUFFER }, INT_MIN, INT_MAX, A|V|D, "err_detect"}, -{"explode", "abort decoding on minor error detection", 0, AV_OPT_TYPE_CONST, {.i64 = AV_EF_EXPLODE }, INT_MIN, INT_MAX, A|V|D, "err_detect"}, +{"explode", "abort decoding on minor error detection", 0, AV_OPT_TYPE_CONST, {.i64 = AV_EF_EXPLODE }, INT_MIN, INT_MAX, A|V|D|E, "err_detect"}, {"ignore_err", "ignore errors", 0, AV_OPT_TYPE_CONST, {.i64 = AV_EF_IGNORE_ERR }, INT_MIN, INT_MAX, A|V|D, "err_detect"}, {"careful", "consider things that violate the spec, are fast to check and have not been seen in the wild as errors", 0, AV_OPT_TYPE_CONST, {.i64 = AV_EF_CAREFUL }, INT_MIN, INT_MAX, A|V|D, "err_detect"}, {"compliant", "consider all spec non compliancies as errors", 0, AV_OPT_TYPE_CONST, {.i64 = AV_EF_COMPLIANT | AV_EF_CAREFUL }, INT_MIN, INT_MAX, A|V|D, "err_detect"}, diff --git a/libavcodec/version.h b/libavcodec/version.h index 9325da1028..85fbbfe766 100644 --- a/libavcodec/version.h +++ b/libavcodec/version.h @@ -29,7 +29,7 @@ #define LIBAVCODEC_VERSION_MAJOR 58 #define LIBAVCODEC_VERSION_MINOR 125 -#define LIBAVCODEC_VERSION_MICRO 100 +#define LIBAVCODEC_VERSION_MICRO 101 #define LIBAVCODEC_VERSION_INT AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \ LIBAVCODEC_VERSION_MINOR, \ From patchwork Mon Feb 22 13:19:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Jan_Ekstr=C3=B6m?= X-Patchwork-Id: 25895 Return-Path: X-Original-To: patchwork@ffaux-bg.ffmpeg.org Delivered-To: patchwork@ffaux-bg.ffmpeg.org Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by ffaux.localdomain (Postfix) with ESMTP id C3610449C23 for ; Mon, 22 Feb 2021 15:48:11 +0200 (EET) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 9789068A9CE; Mon, 22 Feb 2021 15:48:11 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-ej1-f43.google.com (mail-ej1-f43.google.com [209.85.218.43]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 424FB689E14 for ; Mon, 22 Feb 2021 15:48:04 +0200 (EET) Received: by mail-ej1-f43.google.com with SMTP id j6so2360292eja.12 for ; Mon, 22 Feb 2021 05:48:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=YEbHG5LHEMrbTvDlRUIsYaEq/I6kaPoSKZ0vT8EnTSY=; b=mumpXHNNxnPaLpxC/19k0XEEW6gd9HxeqdNQFfIXm93fJQm+aAAm16PFo9Lo9J5BUt zwECvuXK4p+MrealF07fO9julbI0c0a+sb5kM3yKqZCRPBBLFYfFU9Mw4q35Ic3F+JCL OzulFHtNaC5NxSCZhI43QECGBUWl52Rm3jjDZZLvrsn09ZmdFUYnZVMlGjyi8AprDHlF hTbpB6JsoUNf6KeNUvN7u+dDtnhARYiwQJC9jliddMKmmDvaT9InxeHIYGTTm6P/pynm DF9/70ejOILoqlGm78EZWtHvUlcYRpi8o1f2/k6wS3YnvzJSGI8nbbirY3NNXpgqO5rV pfWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=YEbHG5LHEMrbTvDlRUIsYaEq/I6kaPoSKZ0vT8EnTSY=; b=Lqz0zAvYHJtZSUBqIi098DUq6XFXIifERx1o7g40DdJLaMBiAHY1aiAFye2es8yKze IDFCkwLXgP8wYwJmPDvtuorcjOxc7HQXka2orsQl5cbpesDU4BM6GjJvzliNqNRgcHhq ydPCwzxu052ZXVI2G3mwMEd8f+vHgODZ8pZzjF88KHOm4epzORe52KZn5JDxfVIGTmi+ SLhBqRuKcB239oISkUggbzFXHc1wyd2UkDQhEggdsV6Q7XBDlVPSjyGXeTAfL38IdF4g 0ED63OTsPHcbD6ailReNndCP7UD0AblR3ItXKKG1e5oNuQBuR5Ul9f8I91E3Oc3X8xAY hWXQ== X-Gm-Message-State: AOAM532kRuNjflCStuOZWPscqzD8CORVg4MPAijn3vBSEniPkSZbKHnl sBNPTbgDFhJ/CAwdB2MomHUcrnd/l3w= X-Google-Smtp-Source: ABdhPJwBxhCkd7j/sy4nQFyw1Zsf7WmB4x6e1LpT2anvf0JEwo3ZG/5khJUAKWAARnQqaamWy/AcjQ== X-Received: by 2002:a2e:9685:: with SMTP id q5mr12548558lji.344.1613999960671; Mon, 22 Feb 2021 05:19:20 -0800 (PST) Received: from localhost.localdomain (91-159-194-103.elisa-laajakaista.fi. [91.159.194.103]) by smtp.gmail.com with ESMTPSA id v26sm750256lfd.6.2021.02.22.05.19.19 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Feb 2021 05:19:20 -0800 (PST) From: =?utf-8?q?Jan_Ekstr=C3=B6m?= To: ffmpeg-devel@ffmpeg.org Date: Mon, 22 Feb 2021 15:19:14 +0200 Message-Id: <20210222131914.21335-5-jeebjp@gmail.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210222131914.21335-1-jeebjp@gmail.com> References: <20210222131914.21335-1-jeebjp@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v5 4/4] {avcodec, avformat}: add TTML encoder and muxer X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" From: Jan Ekström Enables encoding of other subtitle formats into TTML and writing them out as such documents. Signed-off-by: Jan Ekström --- Changelog | 1 + doc/general_contents.texi | 1 + libavcodec/Makefile | 1 + libavcodec/allcodecs.c | 1 + libavcodec/ttmlenc.c | 210 +++++++++++++++++++++++++++++++++++++ libavcodec/ttmlenc.h | 28 +++++ libavcodec/version.h | 4 +- libavformat/Makefile | 1 + libavformat/allformats.c | 1 + libavformat/ttmlenc.c | 174 ++++++++++++++++++++++++++++++ tests/fate/subtitles.mak | 3 + tests/ref/fate/sub-ttmlenc | 122 +++++++++++++++++++++ 12 files changed, 545 insertions(+), 2 deletions(-) create mode 100644 libavcodec/ttmlenc.c create mode 100644 libavcodec/ttmlenc.h create mode 100644 libavformat/ttmlenc.c create mode 100644 tests/ref/fate/sub-ttmlenc diff --git a/Changelog b/Changelog index ba5aebf79d..4c8d8a8b29 100644 --- a/Changelog +++ b/Changelog @@ -77,6 +77,7 @@ version : - OpenEXR image encoder - Simbiosis IMX decoder - Simbiosis IMX demuxer +- TTML subtitle encoder and muxer version 4.3: diff --git a/doc/general_contents.texi b/doc/general_contents.texi index 6acdf441d6..58c9bcf747 100644 --- a/doc/general_contents.texi +++ b/doc/general_contents.texi @@ -1352,6 +1352,7 @@ performance on systems without hardware floating point support). @item SubViewer v1 @tab @tab X @tab @tab X @item SubViewer @tab @tab X @tab @tab X @item TED Talks captions @tab @tab X @tab @tab X +@item TTML @tab X @tab @tab X @tab @item VobSub (IDX+SUB) @tab @tab X @tab @tab X @item VPlayer @tab @tab X @tab @tab X @item WebVTT @tab X @tab X @tab X @tab X diff --git a/libavcodec/Makefile b/libavcodec/Makefile index 6ec1d360ac..4b1be2e37c 100644 --- a/libavcodec/Makefile +++ b/libavcodec/Makefile @@ -672,6 +672,7 @@ OBJS-$(CONFIG_TSCC_DECODER) += tscc.o msrledec.o OBJS-$(CONFIG_TSCC2_DECODER) += tscc2.o OBJS-$(CONFIG_TTA_DECODER) += tta.o ttadata.o ttadsp.o OBJS-$(CONFIG_TTA_ENCODER) += ttaenc.o ttaencdsp.o ttadata.o +OBJS-$(CONFIG_TTML_ENCODER) += ttmlenc.o ass_split.o OBJS-$(CONFIG_TWINVQ_DECODER) += twinvqdec.o twinvq.o metasound_data.o OBJS-$(CONFIG_TXD_DECODER) += txd.o OBJS-$(CONFIG_ULTI_DECODER) += ulti.o diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c index 354d146379..a91f19ae31 100644 --- a/libavcodec/allcodecs.c +++ b/libavcodec/allcodecs.c @@ -689,6 +689,7 @@ extern AVCodec ff_subviewer_decoder; extern AVCodec ff_subviewer1_decoder; extern AVCodec ff_text_encoder; extern AVCodec ff_text_decoder; +extern AVCodec ff_ttml_encoder; extern AVCodec ff_vplayer_decoder; extern AVCodec ff_webvtt_encoder; extern AVCodec ff_webvtt_decoder; diff --git a/libavcodec/ttmlenc.c b/libavcodec/ttmlenc.c new file mode 100644 index 0000000000..3972b4368c --- /dev/null +++ b/libavcodec/ttmlenc.c @@ -0,0 +1,210 @@ +/* + * TTML subtitle encoder + * Copyright (c) 2020 24i + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/** + * @file + * TTML subtitle encoder + * @see https://www.w3.org/TR/ttml1/ + * @see https://www.w3.org/TR/ttml2/ + * @see https://www.w3.org/TR/ttml-imsc/rec + */ + +#include "avcodec.h" +#include "internal.h" +#include "libavutil/avstring.h" +#include "libavutil/bprint.h" +#include "libavutil/internal.h" +#include "ass_split.h" +#include "ass.h" +#include "ttmlenc.h" + +typedef struct { + AVCodecContext *avctx; + ASSSplitContext *ass_ctx; + AVBPrint buffer; +} TTMLContext; + +static void ttml_text_cb(void *priv, const char *text, int len) +{ + TTMLContext *s = priv; + AVBPrint cur_line = { 0 }; + AVBPrint *buffer = &s->buffer; + + av_bprint_init(&cur_line, len, AV_BPRINT_SIZE_UNLIMITED); + + av_bprint_append_data(&cur_line, text, len); + if (!av_bprint_is_complete(&cur_line)) { + av_log(s->avctx, AV_LOG_ERROR, + "Failed to move the current subtitle dialog to AVBPrint!\n"); + av_bprint_finalize(&cur_line, NULL); + return; + } + + + av_bprint_escape(buffer, cur_line.str, NULL, AV_ESCAPE_MODE_XML, + 0); + + av_bprint_finalize(&cur_line, NULL); +} + +static void ttml_new_line_cb(void *priv, int forced) +{ + TTMLContext *s = priv; + + av_bprintf(&s->buffer, "
"); +} + +static const ASSCodesCallbacks ttml_callbacks = { + .text = ttml_text_cb, + .new_line = ttml_new_line_cb, +}; + +static int ttml_encode_frame(AVCodecContext *avctx, uint8_t *buf, + int bufsize, const AVSubtitle *sub) +{ + TTMLContext *s = avctx->priv_data; + ASSDialog *dialog; + int i; + + av_bprint_clear(&s->buffer); + + for (i=0; inum_rects; i++) { + const char *ass = sub->rects[i]->ass; + + if (sub->rects[i]->type != SUBTITLE_ASS) { + av_log(avctx, AV_LOG_ERROR, "Only SUBTITLE_ASS type supported.\n"); + return AVERROR(EINVAL); + } + +#if FF_API_ASS_TIMING + if (!strncmp(ass, "Dialogue: ", 10)) { + int num; + dialog = ff_ass_split_dialog(s->ass_ctx, ass, 0, &num); + + for (; dialog && num--; dialog++) { + int ret = ff_ass_split_override_codes(&ttml_callbacks, s, + dialog->text); + int log_level = (ret != AVERROR_INVALIDDATA || + avctx->err_recognition & AV_EF_EXPLODE) ? + AV_LOG_ERROR : AV_LOG_WARNING; + + if (ret < 0) { + av_log(avctx, log_level, + "Splitting received ASS dialog failed: %s\n", + av_err2str(ret)); + + if (log_level == AV_LOG_ERROR) + return ret; + } + } + } else { +#endif + dialog = ff_ass_split_dialog2(s->ass_ctx, ass); + if (!dialog) + return AVERROR(ENOMEM); + + { + int ret = ff_ass_split_override_codes(&ttml_callbacks, s, + dialog->text); + int log_level = (ret != AVERROR_INVALIDDATA || + avctx->err_recognition & AV_EF_EXPLODE) ? + AV_LOG_ERROR : AV_LOG_WARNING; + + if (ret < 0) { + av_log(avctx, log_level, + "Splitting received ASS dialog text %s failed: %s\n", + dialog->text, + av_err2str(ret)); + + if (log_level == AV_LOG_ERROR) { + ff_ass_free_dialog(&dialog); + return ret; + } + } + + ff_ass_free_dialog(&dialog); + } +#if FF_API_ASS_TIMING + } +#endif + } + + if (!av_bprint_is_complete(&s->buffer)) + return AVERROR(ENOMEM); + if (!s->buffer.len) + return 0; + + // force null-termination, so in case our destination buffer is + // too small, the return value is larger than bufsize minus null. + if (av_strlcpy(buf, s->buffer.str, bufsize) > bufsize - 1) { + av_log(avctx, AV_LOG_ERROR, "Buffer too small for TTML event.\n"); + return AVERROR_BUFFER_TOO_SMALL; + } + + return s->buffer.len; +} + +static av_cold int ttml_encode_close(AVCodecContext *avctx) +{ + TTMLContext *s = avctx->priv_data; + + ff_ass_split_free(s->ass_ctx); + + av_bprint_finalize(&s->buffer, NULL); + + return 0; +} + +static av_cold int ttml_encode_init(AVCodecContext *avctx) +{ + TTMLContext *s = avctx->priv_data; + + s->avctx = avctx; + + if (!(s->ass_ctx = ff_ass_split(avctx->subtitle_header))) { + return AVERROR_INVALIDDATA; + } + + if (!(avctx->extradata = av_mallocz(TTMLENC_EXTRADATA_SIGNATURE_SIZE + + 1 + AV_INPUT_BUFFER_PADDING_SIZE))) { + return AVERROR(ENOMEM); + } + + avctx->extradata_size = TTMLENC_EXTRADATA_SIGNATURE_SIZE; + memcpy(avctx->extradata, TTMLENC_EXTRADATA_SIGNATURE, + TTMLENC_EXTRADATA_SIGNATURE_SIZE); + + av_bprint_init(&s->buffer, 0, AV_BPRINT_SIZE_UNLIMITED); + + return 0; +} + +AVCodec ff_ttml_encoder = { + .name = "ttml", + .long_name = NULL_IF_CONFIG_SMALL("TTML subtitle"), + .type = AVMEDIA_TYPE_SUBTITLE, + .id = AV_CODEC_ID_TTML, + .priv_data_size = sizeof(TTMLContext), + .init = ttml_encode_init, + .encode_sub = ttml_encode_frame, + .close = ttml_encode_close, + .capabilities = FF_CODEC_CAP_INIT_CLEANUP, +}; diff --git a/libavcodec/ttmlenc.h b/libavcodec/ttmlenc.h new file mode 100644 index 0000000000..c1dd5ec990 --- /dev/null +++ b/libavcodec/ttmlenc.h @@ -0,0 +1,28 @@ +/* + * TTML subtitle encoder shared functionality + * Copyright (c) 2020 24i + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef AVCODEC_TTMLENC_H +#define AVCODEC_TTMLENC_H + +#define TTMLENC_EXTRADATA_SIGNATURE "lavc-ttmlenc" +#define TTMLENC_EXTRADATA_SIGNATURE_SIZE (sizeof(TTMLENC_EXTRADATA_SIGNATURE) - 1) + +#endif /* AVCODEC_TTMLENC_H */ diff --git a/libavcodec/version.h b/libavcodec/version.h index 85fbbfe766..8396d68cde 100644 --- a/libavcodec/version.h +++ b/libavcodec/version.h @@ -28,8 +28,8 @@ #include "libavutil/version.h" #define LIBAVCODEC_VERSION_MAJOR 58 -#define LIBAVCODEC_VERSION_MINOR 125 -#define LIBAVCODEC_VERSION_MICRO 101 +#define LIBAVCODEC_VERSION_MINOR 126 +#define LIBAVCODEC_VERSION_MICRO 100 #define LIBAVCODEC_VERSION_INT AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \ LIBAVCODEC_VERSION_MINOR, \ diff --git a/libavformat/Makefile b/libavformat/Makefile index fcb39ce133..330eb83796 100644 --- a/libavformat/Makefile +++ b/libavformat/Makefile @@ -545,6 +545,7 @@ OBJS-$(CONFIG_TRUEHD_DEMUXER) += rawdec.o mlpdec.o OBJS-$(CONFIG_TRUEHD_MUXER) += rawenc.o OBJS-$(CONFIG_TTA_DEMUXER) += tta.o apetag.o img2.o OBJS-$(CONFIG_TTA_MUXER) += ttaenc.o apetag.o img2.o +OBJS-$(CONFIG_TTML_MUXER) += ttmlenc.o OBJS-$(CONFIG_TTY_DEMUXER) += tty.o sauce.o OBJS-$(CONFIG_TY_DEMUXER) += ty.o OBJS-$(CONFIG_TXD_DEMUXER) += txd.o diff --git a/libavformat/allformats.c b/libavformat/allformats.c index 3b69423508..8b98c68e3b 100644 --- a/libavformat/allformats.c +++ b/libavformat/allformats.c @@ -444,6 +444,7 @@ extern AVInputFormat ff_truehd_demuxer; extern AVOutputFormat ff_truehd_muxer; extern AVInputFormat ff_tta_demuxer; extern AVOutputFormat ff_tta_muxer; +extern AVOutputFormat ff_ttml_muxer; extern AVInputFormat ff_txd_demuxer; extern AVInputFormat ff_tty_demuxer; extern AVInputFormat ff_ty_demuxer; diff --git a/libavformat/ttmlenc.c b/libavformat/ttmlenc.c new file mode 100644 index 0000000000..940f8bbd4e --- /dev/null +++ b/libavformat/ttmlenc.c @@ -0,0 +1,174 @@ +/* + * TTML subtitle muxer + * Copyright (c) 2020 24i + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/** + * @file + * TTML subtitle muxer + * @see https://www.w3.org/TR/ttml1/ + * @see https://www.w3.org/TR/ttml2/ + * @see https://www.w3.org/TR/ttml-imsc/rec + */ + +#include "avformat.h" +#include "internal.h" +#include "libavcodec/ttmlenc.h" +#include "libavutil/internal.h" + +enum TTMLPacketType { + PACKET_TYPE_PARAGRAPH, + PACKET_TYPE_DOCUMENT, +}; + +typedef struct TTMLMuxContext { + enum TTMLPacketType input_type; + unsigned int document_written; +} TTMLMuxContext; + +static const char ttml_header_text[] = +"\n" +"\n" +" \n" +"
\n"; + +static const char ttml_footer_text[] = +"
\n" +" \n" +"\n"; + +static void ttml_write_time(AVIOContext *pb, const char tag[], + int64_t millisec) +{ + int64_t sec, min, hour; + sec = millisec / 1000; + millisec -= 1000 * sec; + min = sec / 60; + sec -= 60 * min; + hour = min / 60; + min -= 60 * hour; + + avio_printf(pb, "%s=\"%02"PRId64":%02"PRId64":%02"PRId64".%03"PRId64"\"", + tag, hour, min, sec, millisec); +} + +static int ttml_write_header(AVFormatContext *ctx) +{ + TTMLMuxContext *ttml_ctx = ctx->priv_data; + ttml_ctx->document_written = 0; + + if (ctx->nb_streams != 1 || + ctx->streams[0]->codecpar->codec_id != AV_CODEC_ID_TTML) { + av_log(ctx, AV_LOG_ERROR, "Exactly one TTML stream is required!\n"); + return AVERROR(EINVAL); + } + + { + AVStream *st = ctx->streams[0]; + AVIOContext *pb = ctx->pb; + + AVDictionaryEntry *lang = av_dict_get(st->metadata, "language", NULL, + 0); + const char *printed_lang = (lang && lang->value) ? lang->value : ""; + + // Not perfect, but decide whether the packet is a document or not + // by the existence of the lavc ttmlenc extradata. + ttml_ctx->input_type = (st->codecpar->extradata && + st->codecpar->extradata_size >= TTMLENC_EXTRADATA_SIGNATURE_SIZE && + !memcmp(st->codecpar->extradata, + TTMLENC_EXTRADATA_SIGNATURE, + TTMLENC_EXTRADATA_SIGNATURE_SIZE)) ? + PACKET_TYPE_PARAGRAPH : + PACKET_TYPE_DOCUMENT; + + avpriv_set_pts_info(st, 64, 1, 1000); + + if (ttml_ctx->input_type == PACKET_TYPE_PARAGRAPH) + avio_printf(pb, ttml_header_text, printed_lang); + } + + return 0; +} + +static int ttml_write_packet(AVFormatContext *ctx, AVPacket *pkt) +{ + TTMLMuxContext *ttml_ctx = ctx->priv_data; + AVIOContext *pb = ctx->pb; + + switch (ttml_ctx->input_type) { + case PACKET_TYPE_PARAGRAPH: + // write out a paragraph element with the given contents. + avio_printf(pb, " pts); + avio_w8(pb, '\n'); + ttml_write_time(pb, " end", pkt->pts + pkt->duration); + avio_printf(pb, ">"); + avio_write(pb, pkt->data, pkt->size); + avio_printf(pb, "

\n"); + break; + case PACKET_TYPE_DOCUMENT: + // dump the given document out as-is. + if (ttml_ctx->document_written) { + av_log(ctx, AV_LOG_ERROR, + "Attempting to write multiple TTML documents into a " + "single document! The XML specification forbids this " + "as there has to be a single root tag.\n"); + return AVERROR(EINVAL); + } + avio_write(pb, pkt->data, pkt->size); + ttml_ctx->document_written = 1; + break; + default: + av_log(ctx, AV_LOG_ERROR, + "Internal error: invalid TTML input packet type: %d!\n", + ttml_ctx->input_type); + return AVERROR_BUG; + } + + return 0; +} + +static int ttml_write_trailer(AVFormatContext *ctx) +{ + TTMLMuxContext *ttml_ctx = ctx->priv_data; + AVIOContext *pb = ctx->pb; + + if (ttml_ctx->input_type == PACKET_TYPE_PARAGRAPH) + avio_printf(pb, ttml_footer_text); + + return 0; +} + +AVOutputFormat ff_ttml_muxer = { + .name = "ttml", + .long_name = NULL_IF_CONFIG_SMALL("TTML subtitle"), + .extensions = "ttml", + .mime_type = "text/ttml", + .priv_data_size = sizeof(TTMLMuxContext), + .flags = AVFMT_GLOBALHEADER | AVFMT_VARIABLE_FPS | + AVFMT_TS_NONSTRICT, + .subtitle_codec = AV_CODEC_ID_TTML, + .write_header = ttml_write_header, + .write_packet = ttml_write_packet, + .write_trailer = ttml_write_trailer, +}; diff --git a/tests/fate/subtitles.mak b/tests/fate/subtitles.mak index 6323d0f93d..ee65afe35b 100644 --- a/tests/fate/subtitles.mak +++ b/tests/fate/subtitles.mak @@ -106,6 +106,9 @@ fate-sub-scc: CMD = fmtstdout ass -ss 57 -i $(TARGET_SAMPLES)/sub/witch.scc FATE_SUBTITLES-$(call ALLYES, MPEGTS_DEMUXER DVBSUB_DECODER DVBSUB_ENCODER) += fate-sub-dvb fate-sub-dvb: CMD = framecrc -i $(TARGET_SAMPLES)/sub/dvbsubtest_filter.ts -map s:0 -c dvbsub +FATE_SUBTITLES-$(call ALLYES, FILE_PROTOCOL PIPE_PROTOCOL SRT_DEMUXER SUBRIP_DECODER TTML_ENCODER TTML_MUXER) += fate-sub-ttmlenc +fate-sub-ttmlenc: CMD = fmtstdout ttml -i $(TARGET_SAMPLES)/sub/SubRip_capability_tester.srt + FATE_SUBTITLES-$(call ENCMUX, ASS, ASS) += $(FATE_SUBTITLES_ASS-yes) FATE_SUBTITLES += $(FATE_SUBTITLES-yes) diff --git a/tests/ref/fate/sub-ttmlenc b/tests/ref/fate/sub-ttmlenc new file mode 100644 index 0000000000..51eab97817 --- /dev/null +++ b/tests/ref/fate/sub-ttmlenc @@ -0,0 +1,122 @@ + + + +
+

Don't show this text it may be used to insert hidden data

+

SubRip subtitles capability tester 1.3o by ale5000
Use VLC 1.1 or higher as reference for most things and MPC Home Cinema for others
This text should be blue
This text should be red
This text should be black
If you see this with the normal font, the player don't (fully) support font face

+

Hidden

+

This text should be small
This text should be normal
This text should be big

+

This should be an E with an accent: È
日本語
This text should be bold, italics and underline
This text should be small and green
This text should be small and red
This text should be big and brown

+

This line should be bold
This line should be italics
This line should be underline
This line should be strikethrough
Both lines
should be underline

+

>
It would be a good thing to
hide invalid html tags that are closed and show the text in them
but show un-closed invalid html tags
Show not opened tags
<

+

and also
hide invalid html tags with parameters that are closed and show the text in them
but show un-closed invalid html tags
This text should be showed underlined without problems also: 2<3,5>1,4<6
This shouldn't be underlined

+

This text should be in the normal position...

+

This text should NOT be in the normal position

+

Implementation is the same of the ASS tag
This text should be at the
top and horizontally centered

+

This text should be at the
middle and horizontally centered

+

This text should be at the
bottom and horizontally centered

+

This text should be at the
top and horizontally at the left

+

This text should be at the
middle and horizontally at the left
(The second position must be ignored)

+

This text should be at the
bottom and horizontally at the left

+

This text should be at the
top and horizontally at the right

+

This text should be at the
middle and horizontally at the right

+

This text should be at the
bottom and horizontally at the right

+

This could be the most difficult thing to implement

+

First text

+

Second, it shouldn't overlap first

+

Third, it should replace second

+

Fourth, it shouldn't overlap first and third

+

Fifth, it should replace third

+

Sixth, it shouldn't be
showed overlapped

+

TEXT 1 (bottom)

+

text 2

+

Hide these tags:
also hide these tags:
but show this: {normal text}

+


\ N is a forced line break
\ h is a hard space
Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D

+


\h\h\h\h\hA (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)

+

\h\h\h\h\hA (05 hard spaces followed by a letter)
A (Normal spaces followed by a letter)
A (No hard spaces followed by a letter)
Show this: \TEST and this: \-)

+


A letter followed by 05 hard spaces: A\h\h\h\h\h
A letter followed by normal spaces: A
A letter followed by no hard spaces: A
05 hard spaces between letters: A\h\h\h\h\hA
5 normal spaces between letters: A A

^--Forced line break

+

Both line should be strikethrough,
yes.
Correctly closed tags
should be hidden.

+

It shouldn't be strikethrough,
not opened tag showed as text.
Not opened tag showed as text.

+

Three lines should be strikethrough,
yes.
Not closed tags showed as text

+

Both line should be strikethrough but
the wrong closing tag should be showed

+
+ +