diff mbox series

[FFmpeg-devel,v2,1/3] avutil/{avstring, bprint}: add XML escaping from ffprobe to avutil

Message ID 20210111142020.24303-2-jeebjp@gmail.com
State Superseded
Headers show
Series Initial implementation of TTML encoding/muxing
Related show

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

Jan Ekström Jan. 11, 2021, 2:20 p.m. UTC
From: Stefano Sabatini <stefasab@gmail.com>

---
 libavutil/avstring.h |  1 +
 libavutil/bprint.c   | 14 ++++++++++++++
 tools/ffescape.c     |  1 +
 3 files changed, 16 insertions(+)

Comments

Nicolas George Jan. 13, 2021, 1:02 p.m. UTC | #1
Jan Ekström (12021-01-11):
> From: Stefano Sabatini <stefasab@gmail.com>
> 
> ---
>  libavutil/avstring.h |  1 +
>  libavutil/bprint.c   | 14 ++++++++++++++
>  tools/ffescape.c     |  1 +
>  3 files changed, 16 insertions(+)

Please see
https://ffmpeg.org/pipermail/ffmpeg-devel/2020-December/273500.html

Regards,
Jan Ekström Jan. 13, 2021, 5:30 p.m. UTC | #2
On Wed, Jan 13, 2021 at 3:02 PM Nicolas George <george@nsup.org> wrote:
>
> Jan Ekström (12021-01-11):
> > From: Stefano Sabatini <stefasab@gmail.com>
> >
> > ---
> >  libavutil/avstring.h |  1 +
> >  libavutil/bprint.c   | 14 ++++++++++++++
> >  tools/ffescape.c     |  1 +
> >  3 files changed, 16 insertions(+)
>
> Please see
> https://ffmpeg.org/pipermail/ffmpeg-devel/2020-December/273500.html

Hi,

Yea, I actually noticed that one, but never got to responding to it.
Sorry about that.

While I do agree that such modes could be useful, currently I have no
need for the additional modes and thus the original code by Stefano
was just moved in avutil (and then utilized from ttmlenc). I did check
the XML specification that this code originally seemed to follow the
requirements (and thus mentioned the exact spot in the spec where this
is mentioned).

Thus, is the lack of those additional XML modes a blocker? Or should
they be added as the need arises?

Jan
Nicolas George Jan. 14, 2021, 11:40 a.m. UTC | #3
Jan Ekström (12021-01-13):
> Yea, I actually noticed that one, but never got to responding to it.
> Sorry about that.

No problem.

> While I do agree that such modes could be useful, currently I have no
> need for the additional modes and thus the original code by Stefano
> was just moved in avutil (and then utilized from ttmlenc). I did check
> the XML specification that this code originally seemed to follow the
> requirements (and thus mentioned the exact spot in the spec where this
> is mentioned).

Yes, it is technically correct. So would be escaping every character as
a numeric entity.

&#x41;&#x64;&#x74;&#x61;&#x27;&#x20;&#x6f;&#x20;&#x65;&#x6c;&#x79;&#x72;&#x61;&#x61;&#x6c;&#x2c;&#x69;&#x20;&#x74;

XML is meant to be both machine-readable and human-readable. Escaping
unnecessarily pulls us away from human-readable.

> Thus, is the lack of those additional XML modes a blocker? Or should
> they be added as the need arises?

Considering that this is so simple that it would have taken you less
time to just do it than to wait for my answer, I would say that I insist
to do things properly immediately, before code starts piling on it and
requires to be revised.

Regards,
Jan Ekström Jan. 14, 2021, 11:46 a.m. UTC | #4
On Thu, Jan 14, 2021 at 1:40 PM Nicolas George <george@nsup.org> wrote:
>
> Jan Ekström (12021-01-13):
> > Yea, I actually noticed that one, but never got to responding to it.
> > Sorry about that.
>
> No problem.
>
> > While I do agree that such modes could be useful, currently I have no
> > need for the additional modes and thus the original code by Stefano
> > was just moved in avutil (and then utilized from ttmlenc). I did check
> > the XML specification that this code originally seemed to follow the
> > requirements (and thus mentioned the exact spot in the spec where this
> > is mentioned).
>
> Yes, it is technically correct. So would be escaping every character as
> a numeric entity.
>
> &#x41;&#x64;&#x74;&#x61;&#x27;&#x20;&#x6f;&#x20;&#x65;&#x6c;&#x79;&#x72;&#x61;&#x61;&#x6c;&#x2c;&#x69;&#x20;&#x74;
>
> XML is meant to be both machine-readable and human-readable. Escaping
> unnecessarily pulls us away from human-readable.

I did not do the selection of what this code (which I just moved)
does, I just moved it and referred to the originator document that
those five symbols match what is defined in the XML spec as per 2.4.

As far as I can tell, it does not do nor was I implying that I would
do something like that.

>
> > Thus, is the lack of those additional XML modes a blocker? Or should
> > they be added as the need arises?
>
> Considering that this is so simple that it would have taken you less
> time to just do it than to wait for my answer, I would say that I insist
> to do things properly immediately, before code starts piling on it and
> requires to be revised.
>

Alright. I only took this position since it required to check out the
XML specification again to see if that stuff was properly defined (it
probably is?), so there wouldn't be guesswork required and rather just
implementing the rules. I can check the spec again, and see if it
clearly defines those cases that you have mentioned.

Jan
diff mbox series

Patch

diff --git a/libavutil/avstring.h b/libavutil/avstring.h
index ee225585b3..79bb920a70 100644
--- a/libavutil/avstring.h
+++ b/libavutil/avstring.h
@@ -324,6 +324,7 @@  enum AVEscapeMode {
     AV_ESCAPE_MODE_AUTO,      ///< Use auto-selected escaping mode.
     AV_ESCAPE_MODE_BACKSLASH, ///< Use backslash escaping.
     AV_ESCAPE_MODE_QUOTE,     ///< Use single-quote escaping.
+    AV_ESCAPE_MODE_XML,       ///< Use XML non-markup character data escaping.
 };
 
 /**
diff --git a/libavutil/bprint.c b/libavutil/bprint.c
index 2f059c5ba6..d825b61b14 100644
--- a/libavutil/bprint.c
+++ b/libavutil/bprint.c
@@ -283,6 +283,20 @@  void av_bprint_escape(AVBPrint *dstbuf, const char *src, const char *special_cha
         av_bprint_chars(dstbuf, '\'', 1);
         break;
 
+    case AV_ESCAPE_MODE_XML:
+        /* escape XML non-markup character data as per 2.4 */
+        for (; *src; src++) {
+            switch (*src) {
+            case '&' : av_bprintf(dstbuf, "%s", "&amp;");  break;
+            case '<' : av_bprintf(dstbuf, "%s", "&lt;");   break;
+            case '>' : av_bprintf(dstbuf, "%s", "&gt;");   break;
+            case '"' : av_bprintf(dstbuf, "%s", "&quot;"); break;
+            case '\'': av_bprintf(dstbuf, "%s", "&apos;"); break;
+            default: av_bprint_chars(dstbuf, *src, 1);
+            }
+        }
+        break;
+
     /* case AV_ESCAPE_MODE_BACKSLASH or unknown mode */
     default:
         /* \-escape characters */
diff --git a/tools/ffescape.c b/tools/ffescape.c
index 0530d28c6d..8537235d5e 100644
--- a/tools/ffescape.c
+++ b/tools/ffescape.c
@@ -104,6 +104,7 @@  int main(int argc, char **argv)
             if      (!strcmp(optarg, "auto"))      escape_mode = AV_ESCAPE_MODE_AUTO;
             else if (!strcmp(optarg, "backslash")) escape_mode = AV_ESCAPE_MODE_BACKSLASH;
             else if (!strcmp(optarg, "quote"))     escape_mode = AV_ESCAPE_MODE_QUOTE;
+            else if (!strcmp(optarg, "xml"))       escape_mode = AV_ESCAPE_MODE_XML;
             else {
                 av_log(NULL, AV_LOG_ERROR,
                        "Invalid value '%s' for option -m, "