From patchwork Wed Apr 12 07:11:24 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rodger Combs
X-Patchwork-Id: 3374
From: Rodger Combs
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 12 Apr 2017 02:11:24 -0500
Message-Id: <20170412071127.60511-2-rodger.combs@gmail.com>
X-Mailer: git-send-email 2.11.1
In-Reply-To: <20170412071127.60511-1-rodger.combs@gmail.com>
References: <20170412071127.60511-1-rodger.combs@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/5] lavu/bprint: add XML escaping

---
 libavutil/avstring.h | 28 ++++++++++++++++++++++++++++
 libavutil/bprint.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/libavutil/avstring.h b/libavutil/avstring.h
index 04d2695640..68b753a569 100644
--- a/libavutil/avstring.h
+++ b/libavutil/avstring.h
@@ -314,6 +314,7 @@ enum AVEscapeMode {
     AV_ESCAPE_MODE_AUTO,      ///< Use auto-selected escaping mode.
     AV_ESCAPE_MODE_BACKSLASH, ///< Use backslash escaping.
     AV_ESCAPE_MODE_QUOTE,     ///< Use single-quote escaping.
+    AV_ESCAPE_MODE_XML,       ///< Use XML ampersand-escaping; requires UTF-8 input.
 };
 
 /**
@@ -334,6 +335,33 @@ enum AVEscapeMode {
 #define AV_ESCAPE_FLAG_STRICT (1 << 1)
 
 /**
+ * In addition to the provided list, escape all characters outside the range of
+ * U+0020 to U+007E.
+ * This only applies to XML-escaping.
+ */
+#define AV_ESCAPE_FLAG_NON_ASCII (1 << 2)
+
+/**
+ * In addition to the provided list, escape single or double quotes.
+ * This only applies to XML-escaping.
+ */
+#define AV_ESCAPE_FLAG_ESCAPE_SINGLE_QUOTE (1 << 3)
+#define AV_ESCAPE_FLAG_ESCAPE_DOUBLE_QUOTE (1 << 4)
+
+/**
+ * Replace invalid UTF-8 characters with a U+FFFD REPLACEMENT CHARACTER, escaped
+ * if AV_ESCAPE_FLAG_NON_ASCII is set.
+ * This only applies to XML-escaping.
+ */
+#define AV_ESCAPE_FLAG_REPLACE_INVALID_SEQUENCES (1 << 5)
+
+/**
+ * Replace invalid UTF-8 characters with a '?', overriding the previous flag.
+ * This only applies to XML-escaping.
+ */
+#define AV_ESCAPE_FLAG_REPLACE_INVALID_ASCII (1 << 6)
+
+/**
  * Escape string in src, and put the escaped string in an allocated
  * string in *dst, which must be freed with av_free().
  *
diff --git a/libavutil/bprint.c b/libavutil/bprint.c
index 652775bef9..8e44c57346 100644
--- a/libavutil/bprint.c
+++ b/libavutil/bprint.c
@@ -302,5 +302,48 @@ void av_bprint_escape(AVBPrint *dstbuf, const char *src, const char *special_cha
         }
         av_bprint_chars(dstbuf, '\'', 1);
         break;
+
+    case AV_ESCAPE_MODE_XML:
+        /* &;-escape characters */
+        while (*src) {
+            uint8_t tmp;
+            uint32_t cp;
+            const char *src1 = src;
+            GET_UTF8(cp, (uint8_t)*src++, goto err;);
+
+            if ((cp < 0xFF &&
+                 ((special_chars && strchr(special_chars, cp)) ||
+                  (flags & AV_ESCAPE_FLAG_WHITESPACE) && strchr(WHITESPACES, cp))) ||
+                (!(flags & AV_ESCAPE_FLAG_STRICT) &&
+                 (cp == '&' || cp == '<' || cp == '>')) ||
+                ((flags & AV_ESCAPE_FLAG_ESCAPE_SINGLE_QUOTE) && cp == '\'') ||
+                ((flags & AV_ESCAPE_FLAG_ESCAPE_DOUBLE_QUOTE) && cp == '"') ||
+                ((flags & AV_ESCAPE_FLAG_NON_ASCII) && (cp < 0x20 || cp > 0x7e))) {
+                switch (cp) {
+                case '&' : av_bprintf(dstbuf, "&amp;");  break;
+                case '<' : av_bprintf(dstbuf, "&lt;");   break;
+                case '>' : av_bprintf(dstbuf, "&gt;");   break;
+                case '"' : av_bprintf(dstbuf, "&quot;"); break;
+                case '\'': av_bprintf(dstbuf, "&apos;"); break;
+                default: av_bprintf(dstbuf, "&#x%"PRIx32";", cp); break;
+                }
+            } else {
+                PUT_UTF8(cp, tmp, av_bprint_chars(dstbuf, tmp, 1);)
+            }
+            continue;
+        err:
+            if (flags & AV_ESCAPE_FLAG_REPLACE_INVALID_ASCII) {
+                av_bprint_chars(dstbuf, '?', 1);
+            } else if (flags & AV_ESCAPE_FLAG_REPLACE_INVALID_SEQUENCES) {
+                if (flags & AV_ESCAPE_FLAG_NON_ASCII)
+                    av_bprintf(dstbuf, "&#xfffd;");
+                else
+                    av_bprintf(dstbuf, "\xEF\xBF\xBD");
+            } else {
+                while (src1 < src)
+                    av_bprint_chars(dstbuf, *src1++, 1);
+            }
+        }
+        break;
     }
 }
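A minimal standalone sketch of the same XML-escaping pass, for readers who want to compile and test the behaviour without libavutil. It is not FFmpeg code: xml_escape(), decode_utf8(), the Out sink and the XML_ESC_* flags are hypothetical stand-ins for av_bprint_escape() with AV_ESCAPE_MODE_XML and the AV_ESCAPE_FLAG_* values, and it leaves out the special_chars list, AV_ESCAPE_FLAG_WHITESPACE and AV_ESCAPE_FLAG_STRICT, so &, < and > are always escaped. It follows the flag descriptions in avstring.h, including emitting U+FFFD as a character reference when non-ASCII escaping is on. One deliberate difference: GET_UTF8 fetches each continuation byte before validating it, so on input truncated by the terminator (for example "x\xC3") the XML loop can step over the NUL. The sketch's decoder stops at the first byte that is not a continuation byte, and it also rejects overlong forms and UTF-16 surrogates, which the GET_UTF8 macro does not appear to check.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flags mirroring the AV_ESCAPE_FLAG_* values added by the patch. */
enum {
    XML_ESC_NON_ASCII       = 1 << 0, /* escape everything outside U+0020..U+007E */
    XML_ESC_SINGLE_QUOTE    = 1 << 1, /* ' -> &apos; */
    XML_ESC_DOUBLE_QUOTE    = 1 << 2, /* " -> &quot; */
    XML_ESC_REPLACE_INVALID = 1 << 3, /* invalid UTF-8 -> U+FFFD */
    XML_ESC_REPLACE_ASCII   = 1 << 4, /* invalid UTF-8 -> '?', overrides the above */
};

/* snprintf-style sink: stores at most size-1 bytes, NUL-terminates when
 * size > 0, and counts the full (untruncated) length in len. */
typedef struct { char *buf; size_t size, len; } Out;

static void out_put(Out *o, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++, o->len++)
        if (o->len + 1 < o->size)
            o->buf[o->len] = s[i];
    if (o->size)
        o->buf[o->len < o->size ? o->len : o->size - 1] = '\0';
}

/* Decode one UTF-8 sequence at *s; return 1 and set *cp on success. On
 * failure return 0 with *s advanced past the bad bytes, but never past a
 * byte that is not a continuation byte, so the NUL terminator is kept. */
static int decode_utf8(const unsigned char **s, uint32_t *cp)
{
    const unsigned char *p = *s;
    uint32_t c = *p++;
    uint32_t min = 0;
    int extra;

    if (c < 0x80)                { extra = 0; }
    else if ((c & 0xE0) == 0xC0) { extra = 1; min = 0x80;    c &= 0x1F; }
    else if ((c & 0xF0) == 0xE0) { extra = 2; min = 0x800;   c &= 0x0F; }
    else if ((c & 0xF8) == 0xF0) { extra = 3; min = 0x10000; c &= 0x07; }
    else { *s = p; return 0; }   /* stray continuation or invalid lead byte */

    while (extra--) {
        if ((*p & 0xC0) != 0x80) { *s = p; return 0; } /* truncated sequence */
        c = (c << 6) | (*p++ & 0x3F);
    }
    *s = p;
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                /* overlong, out of range, or surrogate */
    *cp = c;
    return 1;
}

/* Escape the NUL-terminated UTF-8 string src into dst (capacity size) and
 * return the length of the complete escaped string, as snprintf does. */
size_t xml_escape(char *dst, size_t size, const char *src, unsigned flags)
{
    Out o = { dst, size, 0 };
    const unsigned char *s = (const unsigned char *)src;

    assert(dst != NULL || size == 0);
    if (size)
        dst[0] = '\0';
    while (*s) {
        const unsigned char *start = s;
        const char *ent = NULL;
        uint32_t cp = 0;
        char ref[16];

        if (!decode_utf8(&s, &cp)) {
            if (flags & XML_ESC_REPLACE_ASCII)
                out_put(&o, "?", 1);
            else if ((flags & XML_ESC_REPLACE_INVALID) && (flags & XML_ESC_NON_ASCII))
                out_put(&o, "&#xfffd;", 8);      /* U+FFFD as a character reference */
            else if (flags & XML_ESC_REPLACE_INVALID)
                out_put(&o, "\xEF\xBF\xBD", 3); /* U+FFFD as raw UTF-8 */
            else
                out_put(&o, (const char *)start, (size_t)(s - start)); /* pass through */
            continue;
        }
        if (cp == '&')      ent = "&amp;";
        else if (cp == '<') ent = "&lt;";
        else if (cp == '>') ent = "&gt;";
        else if (cp == '"'  && (flags & XML_ESC_DOUBLE_QUOTE)) ent = "&quot;";
        else if (cp == '\'' && (flags & XML_ESC_SINGLE_QUOTE)) ent = "&apos;";

        if (ent) {
            out_put(&o, ent, strlen(ent));
        } else if ((flags & XML_ESC_NON_ASCII) && (cp < 0x20 || cp > 0x7E)) {
            int n = snprintf(ref, sizeof ref, "&#x%" PRIx32 ";", cp);
            out_put(&o, ref, (size_t)n);
        } else {
            out_put(&o, (const char *)start, (size_t)(s - start)); /* copy unchanged */
        }
    }
    return o.len;
}
```

Like snprintf, xml_escape(NULL, 0, src, flags) returns the number of bytes needed (excluding the terminator), so a caller can size a buffer first; xml_escape(buf, sizeof buf, "a<b", 0) leaves "a&lt;b" in buf.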