From patchwork Sat Sep 10 08:45:13 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rodger Combs
X-Patchwork-Id: 525
From: Rodger Combs
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 10 Sep 2016 03:45:13 -0500
Message-Id: <20160910084515.11048-1-rodger.combs@gmail.com>
X-Mailer: git-send-email 2.10.0
Subject: [FFmpeg-devel] [PATCH 1/3] lavu/bprint: add XML escaping

---
 libavutil/avstring.h | 28 ++++++++++++++++++++++++++++
 libavutil/bprint.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/libavutil/avstring.h b/libavutil/avstring.h
index dd28769..8e97314 100644
--- a/libavutil/avstring.h
+++ b/libavutil/avstring.h
@@ -309,6 +309,7 @@ enum AVEscapeMode {
     AV_ESCAPE_MODE_AUTO,      ///< Use auto-selected escaping mode.
     AV_ESCAPE_MODE_BACKSLASH, ///< Use backslash escaping.
     AV_ESCAPE_MODE_QUOTE,     ///< Use single-quote escaping.
+    AV_ESCAPE_MODE_XML,       ///< Use XML ampersand-escaping; requires UTF-8 input.
 };
 
 /**
@@ -329,6 +330,33 @@ enum AVEscapeMode {
 #define AV_ESCAPE_FLAG_STRICT (1 << 1)
 
 /**
+ * In addition to the provided list, escape all characters outside the range of
+ * U+0020 to U+007E.
+ * This only applies to XML-escaping.
+ */
+#define AV_ESCAPE_FLAG_NON_ASCII (1 << 2)
+
+/**
+ * In addition to the provided list, escape single or double quotes.
+ * This only applies to XML-escaping.
+ */
+#define AV_ESCAPE_FLAG_ESCAPE_SINGLE_QUOTE (1 << 3)
+#define AV_ESCAPE_FLAG_ESCAPE_DOUBLE_QUOTE (1 << 4)
+
+/**
+ * Replace invalid UTF-8 sequences with a U+FFFD REPLACEMENT CHARACTER, escaped
+ * if AV_ESCAPE_FLAG_NON_ASCII is set.
+ * This only applies to XML-escaping.
+ */
+#define AV_ESCAPE_FLAG_REPLACE_INVALID_SEQUENCES (1 << 5)
+
+/**
+ * Replace invalid UTF-8 sequences with a '?', overriding
+ * AV_ESCAPE_FLAG_REPLACE_INVALID_SEQUENCES.
+ * This only applies to XML-escaping.
+ */
+#define AV_ESCAPE_FLAG_REPLACE_INVALID_ASCII (1 << 6)
+
+/**
  * Escape string in src, and put the escaped string in an allocated
  * string in *dst, which must be freed with av_free().
  *
diff --git a/libavutil/bprint.c b/libavutil/bprint.c
index 2f059c5..c6b9919 100644
--- a/libavutil/bprint.c
+++ b/libavutil/bprint.c
@@ -271,6 +271,49 @@ void av_bprint_escape(AVBPrint *dstbuf, const char *src, const char *special_cha
         mode = AV_ESCAPE_MODE_BACKSLASH; /* TODO: implement a heuristic */
 
     switch (mode) {
+    case AV_ESCAPE_MODE_XML:
+        /* escape characters as XML entity or character references */
+        while (*src) {
+            uint8_t tmp;
+            uint32_t cp;
+            const char *src1 = src;
+            GET_UTF8(cp, (uint8_t)*src++, if (!src[-1]) src--; goto err;); /* never step past the NUL */
+
+            if ((cp < 0xFF &&
+                 ((special_chars && strchr(special_chars, cp)) ||
+                  ((flags & AV_ESCAPE_FLAG_WHITESPACE) && strchr(WHITESPACES, cp)))) ||
+                (!(flags & AV_ESCAPE_FLAG_STRICT) &&
+                 (cp == '&' || cp == '<' || cp == '>')) ||
+                ((flags & AV_ESCAPE_FLAG_ESCAPE_SINGLE_QUOTE) && cp == '\'') ||
+                ((flags & AV_ESCAPE_FLAG_ESCAPE_DOUBLE_QUOTE) && cp == '"') ||
+                ((flags & AV_ESCAPE_FLAG_NON_ASCII) && (cp < 0x20 || cp > 0x7e))) {
+                switch (cp) {
+                case '&' : av_bprintf(dstbuf, "&amp;");  break;
+                case '<' : av_bprintf(dstbuf, "&lt;");   break;
+                case '>' : av_bprintf(dstbuf, "&gt;");   break;
+                case '"' : av_bprintf(dstbuf, "&quot;"); break;
+                case '\'': av_bprintf(dstbuf, "&apos;"); break;
+                default: av_bprintf(dstbuf, "&#x%"PRIx32";", cp); break;
+                }
+            } else {
+                PUT_UTF8(cp, tmp, av_bprint_chars(dstbuf, tmp, 1);)
+            }
+            continue;
+err:
+            if (flags & AV_ESCAPE_FLAG_REPLACE_INVALID_ASCII) {
+                av_bprint_chars(dstbuf, '?', 1);
+            } else if (flags & AV_ESCAPE_FLAG_REPLACE_INVALID_SEQUENCES) {
+                if (flags & AV_ESCAPE_FLAG_NON_ASCII)
+                    av_bprintf(dstbuf, "&#xfffd;");
+                else
+                    av_bprintf(dstbuf, "\xEF\xBF\xBD");
+            } else {
+                while (src1 < src)
+                    av_bprint_chars(dstbuf, *src1++, 1);
+            }
+        }
+        break;
+
     case AV_ESCAPE_MODE_QUOTE:
         /* enclose the string between '' */
         av_bprint_chars(dstbuf, '\'', 1);
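To experiment with the escaping rules without building FFmpeg, here is a minimal standalone C sketch of the same XML-escaping policy. It is not libavutil code: `xml_escape_sketch`, `decode_utf8`, `put_str` and the `SK_*` flags are hypothetical names for illustration, the flag bits only mirror the `AV_ESCAPE_FLAG_*` values above, and output goes into a fixed buffer (silently truncated) instead of a growing `AVBPrint`. It always escapes `&`, `<` and `>` (the behaviour when `AV_ESCAPE_FLAG_STRICT` is unset), ignores `special_chars` and whitespace handling, and writes U+FFFD as described in the header documentation: escaped when the non-ASCII flag is set, raw UTF-8 otherwise.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits mirroring the patch's AV_ESCAPE_FLAG_* values.
 * The SK_ prefix marks them as part of this sketch, not libavutil. */
enum {
    SK_NON_ASCII             = 1 << 2,
    SK_ESCAPE_SINGLE_QUOTE   = 1 << 3,
    SK_ESCAPE_DOUBLE_QUOTE   = 1 << 4,
    SK_REPLACE_INVALID_SEQ   = 1 << 5,
    SK_REPLACE_INVALID_ASCII = 1 << 6,
};

/* Append s to dst (capacity size, current length *len), truncating if full. */
static void put_str(char *dst, size_t size, size_t *len, const char *s)
{
    while (*s && *len + 1 < size)
        dst[(*len)++] = *s++;
    dst[*len] = '\0';
}

/* Decode one UTF-8 character at *p and advance *p past it. On an invalid
 * sequence, return -1 and advance by one byte only, so decoding resumes at
 * the next byte and never steps past the NUL terminator. Overlong forms and
 * surrogates are not rejected, to keep the sketch short. */
static int32_t decode_utf8(const unsigned char **p)
{
    const unsigned char *s = *p;
    int32_t cp;
    int n;

    if (s[0] < 0x80)      { cp = s[0];        n = 0; }
    else if (s[0] < 0xC0) { *p = s + 1; return -1; } /* stray continuation */
    else if (s[0] < 0xE0) { cp = s[0] & 0x1F; n = 1; }
    else if (s[0] < 0xF0) { cp = s[0] & 0x0F; n = 2; }
    else if (s[0] < 0xF8) { cp = s[0] & 0x07; n = 3; }
    else                  { *p = s + 1; return -1; }

    for (int i = 1; i <= n; i++) {
        if ((s[i] & 0xC0) != 0x80) { /* a NUL byte also fails this test */
            *p = s + 1;
            return -1;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *p = s + n + 1;
    return cp;
}

/* Escape UTF-8 text src as XML character data into dst. &, < and > are
 * always escaped; the flags add quotes, non-ASCII and invalid-input handling. */
static void xml_escape_sketch(char *dst, size_t dst_size, const char *src, int flags)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t len = 0;
    char tmp[16];

    assert(dst_size > 0);
    dst[0] = '\0';
    while (*p) {
        const unsigned char *start = p;
        int32_t cp = decode_utf8(&p);
        const char *ent = NULL;

        if (cp < 0) {
            if (flags & SK_REPLACE_INVALID_ASCII)
                ent = "?";
            else if (flags & SK_REPLACE_INVALID_SEQ)
                ent = (flags & SK_NON_ASCII) ? "&#xfffd;" : "\xEF\xBF\xBD";
        } else {
            switch (cp) {
            case '&':  ent = "&amp;"; break;
            case '<':  ent = "&lt;";  break;
            case '>':  ent = "&gt;";  break;
            case '"':  if (flags & SK_ESCAPE_DOUBLE_QUOTE) ent = "&quot;"; break;
            case '\'': if (flags & SK_ESCAPE_SINGLE_QUOTE) ent = "&apos;"; break;
            }
            if (!ent && (flags & SK_NON_ASCII) && (cp < 0x20 || cp > 0x7E)) {
                snprintf(tmp, sizeof(tmp), "&#x%" PRIx32 ";", (uint32_t)cp);
                ent = tmp;
            }
        }
        if (!ent) { /* copy the original bytes through unchanged */
            size_t n = (size_t)(p - start); /* 1..4 bytes */
            memcpy(tmp, start, n);
            tmp[n] = '\0';
            ent = tmp;
        }
        put_str(dst, dst_size, &len, ent);
    }
}
```

For example, with only `SK_ESCAPE_SINGLE_QUOTE` set, `a<b & 'c'` becomes `a&lt;b &amp; &apos;c&apos;`. One design choice differs from the patch: on an invalid sequence the sketch's decoder skips only the offending lead byte, so a valid character that immediately follows a truncated sequence is still emitted rather than being swallowed into the replacement.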