diff mbox series

[FFmpeg-devel,5/7] WIP: add an intro to AVWriter

Message ID 20210421122706.9002-6-george@nsup.org
State New
Headers show
Series [FFmpeg-devel,1/7] lavu: add macros to help making future-proof structures.
Related show

Checks

Context Check Description
andriy/configure warning Failed to apply patch
andriy/configure warning Failed to apply patch

Commit Message

Nicolas George April 21, 2021, 12:27 p.m. UTC
Signed-off-by: Nicolas George <george@nsup.org>
---
 avwriter_intro.txt | 261 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 261 insertions(+)
 create mode 100644 avwriter_intro.txt
diff mbox series

Patch

diff --git a/avwriter_intro.txt b/avwriter_intro.txt
new file mode 100644
index 0000000000..b1b54711c9
--- /dev/null
+++ b/avwriter_intro.txt
@@ -0,0 +1,261 @@ 
+XXX make error return more strict
+XXX? make buf size return more foolproof
+
+
+# An introduction to the AVWriter API.
+
+Do not look at the length headers. Do not look at the convoluted API. Just
+imagine these few use cases.
+
+
+## Scenario 1: You need to call a function that takes an AVWriter.
+
+What are AVWriter anyway? They are for when a function returns a string.
+For example, take the very common:
+
+  int av_foobar_to_string(char *buf, size_t buf_size, const AVFoobar *foobar);
+
+It uses buf[] to return the string. In the simplest cases, you call it with
+a buffer on the stack:
+
+  char buf[1024];
+  ret = av_foobar_to_string(buf, sizeof(buf));
+
+Now, with AVWriter. The function you need to call is:
+
+  void av_foobar_write(AVWriter wr, const AVFoobar *foobar);
+
+And you use the av_buf_writer_array() macro to encode buf as an AVWriter:
+
+  char buf[1024];
+  ret = av_foobar_write(av_buf_writer_array(buf), foobar);
+
+For all you know, av_buf_writer_array() could just be a fancy name for a
+macro that just expands to "buf, sizeof(buf)". And if you want to stop here
+and just use AVWriter the way you do in good old C, you can. But if you read
+on, you will see that it is much fancier than that, while being almost as
+simple to use.
+
+
+## Scenario 2: Concatenating the string into a message.
+
+Let us say you have two foobars to present to the user. Good old C:
+
+  char msg[2500], foo1_buf[1000], foo2_buf[1000];
+  av_foobar_to_string(foo1_buf, sizeof(foo1_buf), foo1);
+  av_foobar_to_string(foo2_buf, sizeof(foo2_buf), foo2);
+  snprintf(msg, sizeof(msg), "F%d = [ %s, %s ]", num, foo1_buf, foo2_buf);
+
+But it's ugly. Less ugly, but more complicated:
+
+  char msg[2500];
+  size_t pos = 0;
+  pos += snprintf    (msg + pos, sizeof(msg) - pos, "F%d = [ ", num);
+  av_foobar_to_string(msg + pos, sizeof(msg) - pos, foo1);
+  pos += strlen(msg + pos);
+  pos += snprintf    (msg + pos, sizeof(msg) - pos, ", ");
+  av_foobar_to_string(msg + pos, sizeof(msg) - pos, foo2);
+  pos += strlen(msg + pos);
+  pos += snprintf    (msg + pos, sizeof(msg) - pos, "]");
+
+(Note: that's buggy, the snprintf() in the middle can overflow.)
+
+Well, that was the first thing AVWriter was meant to do: allow to build
+strings by concatenating them together. So, the AVWraper version of this
+code:
+
+  char msg[2500];
+  AVWriter wr = av_buf_writer_array(msg);
+  av_writer_printf(wr, "F%d = [ ", num);
+  av_foobar_write(wr, foo1);
+  av_writer_print(wr, ", ");
+  av_foobar_write(wr, foo2);
+  av_writer_print(wr, " ]");
+
+(I am confident I can teach av_writer_printf() to call av_foobar_write().)
+
+If I have my way, we will have "av_writer_fmt(AVWriter wr, const char *fmt,
+...);" where the arguments can be any type for which somebody have bothered
+to implement a _write() function. But that is for a bit later.
+
+And that's it. av_buf_writer_array() is just fancy macro work to keep track
+with a "size_t pos".
+
+But don't use it, it's rubbish. Use av_dynbuf_writer(), it's fancier.
+
+
+## Scenario 3: The strings are long and precious.
+
+Let us say that the string you are building can be arbitrarily long, and it
+is important to have it in its entirety. With good old C, you have to rely
+on some kind of realloc(). You know the kind of code, it is very boring, I
+will not write it here.
+
+That's exactly what av_buf_writer_array() is for. As a special optimization,
+it starts with a buffer on the stack, which means that for small strings, it
+will be as fast as the char[] version.
+
+XXX+ the commit says av_dynbuf_get_buffer in the doxy and other inconsistencies
+
+  AVWriter wr = av_dynbuf_writer();
+  /* same code as above */
+  char *msg = av_dynbuf_writer_get_data(wr, NULL);
+  use_the_message(msg);
+  av_dynbuf_writer_finalize(wr, NULL, NULL);
+
+WARNING: I have not covered error checking. Without error checking, this
+code will not cause an invalid memory access, and in particular will not be
+a security issue by itself, but it will cause the string to be truncated.
+That means silent data corruption, which is worse than a non-exploitable
+crash.
+
+
+## Scenario 4: So let's do error checks.
+
+The example code above will not crash. At all point,
+av_dynbuf_writer_get_data() will point to a valid 0-terminated buffer. If
+you know your data is can be truncated later, then maybe you can tolerate to
+have it truncated here, especially since that kind of memory allocation
+failure is very very unlikely.
+
+But you are not lazy, and it's better to diaplay "Out Of Cheese Error" than
+to store corrupted data without noticing.
+
+  ret = av_dynbuf_writer_get_error(wr, &size);
+  if  (ret < 0) {
+    fprintf("Could not display foobar, would have been %zu long", size);
+    return ret;
+  }
+
+XXX ça c'est bien, mais av_dynbuf_writer_get_data() devrait retourner une valeur sûre
+
+
+## Scenario 5: We want to keep the string.
+
+Maybe what we will do with the message is something like:
+
+  av_dict_set(metadata, "foo_pair", msg);
+
+But if av_dynbuf_writer() did use use dynamic allocation, we could use
+AV_DICT_DONT_STRDUP_VAL.
+
+That's what the pointers arguments to av_dynbuf_writer_finalize() are for.
+
+  ret = av_dynbuf_writer_finalize(wr, &msg, NULL);
+  if (ret < 0)
+    return ret;
+  av_dict_set(metadata, "foo_pair", msg, AV_DICT_DONT_STRDUP_VAL);
+
+
+XXX+ av_dynbuf_writer_finish error if truncated, av_dynbuf_writer_finish_anyway.
+XXX+ error check is all wrong
+
+
+## Scenario 6: You work with a library that already has string functions.
+## Scenario 7: You work with an API that does its own buffering.
+
+Let us say you are writing a Gtk+ application and you already use GString
+all over the place. (Note: GString is more or less like av_dynbuf_writer(),
+but with the Gtk+ taste instead of the FFmpeg taste; the API example should
+be obvious.) You could always:
+
+  av_foobar_write(wr, foo);
+  char *msg = av_dynbuf_writer_get_data(wr, NULL);
+  g_string_append(str, msg);
+
+But that's clumsy, you have to copy the string. If this comes frequently in
+the code, it would be better to have AVWriter collaborate with GString:
+
+  av_foobar_write(gstring_writer(str), foo);
+
+To be able to do that you need to implement a few callbacks and adapt some
+boilerplate code. It is not difficult, but too long for this introduction.
+
+There is no central repository, neither static nor dynamic. As soon as a
+part of a program has implemented the proper callbacks and set the proper
+pointers, it makes a valid AVWriter that can be used anywhere.
+
+Note that it applies to any kind of library or API: as the name says,
+AVWriter is made to write strings. Any API that looks like writing can be
+desguised as AVWriter.
+
+For example, there is another built-in AVWriter: av_log_writer(obj) it goes
+directly to av_log(obj, AV_LOG_INFO);
+
+
+## Scenario 8: This time, you are the one who returns a string.
+
+This time, you are the one who was about to write:
+
+  int av_bazqux_to_string(char *buf, size_t size, const Bazqux *baz);
+
+and you decide to write it:
+
+  void av_bazqux_write(AVWriter wr, const Bazqux *baz);
+
+There is nothing new to say, you already know how to add things to an
+AVWriter, it is in the second example:
+
+  av_writer_printf(wr, ...);
+
+Just use this, or any of the similar functions, to add whatever you want to
+the AVWriter you got as an argument. There is no error checking to do: it is
+not your responsibility.
+
+At this point, you could implement av_bazqux_to_string() as a trivial
+wrapper:
+
+  int av_bazqux_to_string(char *buf, size_t size, const Bazqux *baz) {
+      av_bazqux_write(av_buf_writer(buf, size), baz);
+      if (strlen(baz) == size - 1)
+	  return AVERROR_BUFFER_TOO_SMALL;
+  }
+
+(Since av_buf_writer() does not keep track of error, we consider that
+reaching the end is already overflowing. We are wasting one char in a very
+rare situation.) But is it really useful?
+
+
+## Scenario 9: What with the types?
+
+If you are observant, you will wonder what happens if we write:
+
+  AVWriter wr = av_log_writer(buf);
+  char *msg = av_dynbuf_writer_get_data(wr, NULL);
+
+The short answer is: it will crash, badly. And that's ok: you knew you were
+doing something stupid. You got what you deserved. Same as if you had written:
+
+  char *msg = NULL;
+  snprintf(msg, strlen(msg), "Good Bye");
+
+The C language is based on the assumption that you know things about the
+code that cannot be expressed in the language itself, in particular with the
+type system. There is no type for non-NULL-pointer, yet there are places in
+the code you know for certain that a pointer is not NULL.
+
+AVWriter is designed with the same philosophy. When you write
+
+  AVWriter wr = av_dynbuf_writer();
+
+you know that wr is actually an AVDynbufWriter, and therefore you can call
+av_dynbuf_writer_get_data() on it.
+
+On the other hand, if you have an AVWriter of unknown origin, you cannot
+call a specific function. But you can call all the av_writer_whatever()
+functions.
+
+If you want to make a special case, to check for a specific kind of
+AVWriter, you can use:
+
+  if (av_dynbuf_writer_check(wr))
+
+In fact, all functions that accept only a specific kind of AVWriter in
+FFmpeg start with some kind of:
+
+  av_assert1(av_dynbuf_writer_check(wr));
+
+That means that if you are working with a libavutil built for development,
+if you make a mistake, you will get a clear error message. But really, there
+is no reason to make a mistake if you are not trying to tie your brain in a
+knot.