From patchwork Wed Apr 21 12:27:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Nicolas George X-Patchwork-Id: 27198 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:5014:0:0:0:0:0 with SMTP id e20csp349996iob; Wed, 21 Apr 2021 05:28:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz3Z33W1ZcVk36M5/6kgWTRIvQtRrexi+BlFwS9+tGMMaWLQDqr9csHq1TpDCopLK4coRrx X-Received: by 2002:a17:906:6896:: with SMTP id n22mr33088926ejr.316.1619008092877; Wed, 21 Apr 2021 05:28:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1619008092; cv=none; d=google.com; s=arc-20160816; b=ILwnJpk6jVvGeRZXwOCjUDkYB1tUgeCbEkMHQM9bOAH9TSsNpoX3Lz3WQ2j3Kw1XkJ /wP5Gx0wq4J/qoaXVZ/y16BeIUnfWmVgB+20mlRHFkFUY+I+Mnw4QuKqIDGIzP9JQ5wP NzMCMvU5ixjcFWhwYXAGaSyvIoJKpGEJYBP4yfWp2kna8lIV0w4HiTivh49/qb3tRDYP jvSC58vPS6JR8O7pcuAxMvp0zyHgJgA5BWjs5vZQg3WqNt9fdAuhUsIXG8F4L1Mtkaom 4tLYcL+TRfjFtKs9+YcLzZp8KJdxgV303PTMigSsYg0l4FstrsywQ9RdqShX8EHHUeOr r6YQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:delivered-to; bh=by/puLi1KazIx4JK7XX9BQNAonW+3oEn9zP7Kg/QSX8=; b=JHXMazacLC9G4ozUpbOvuS0yzHRdVmA/p4tKzEA6+JZ5ao6pJf+9/3UWlEu9eoSEzW +OtjoTV68o6wvCi/NzSCZ3UURKkU/M5xv6nScxk5oNniiJcbC0BBs4KinLAdm9cvtP/k IKE2iESyuLIf84JzJJRKhlzmq/+6H/9gSV8zAe9SfH+S/EmfUFWDMPlAG2MxUUy5FFox h4syVzeGIe4yjydG/IKFu/KtCR0k9jFflA2OHRabSWqx11oco9JZ0abRGIppQwodC7oz SODIOwJMcIXdmKVGkbgwPTbDRdRt8+wEw+biiJUQQFSHkjAvTgCMDY4HRdvFf15Xf29J oX7A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id k14si1686906ejc.229.2021.04.21.05.28.12; Wed, 21 Apr 2021 05:28:12 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 026DF689B83; Wed, 21 Apr 2021 15:27:22 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from nef.ens.fr (nef2.ens.fr [129.199.96.40]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E78406899CF for ; Wed, 21 Apr 2021 15:27:13 +0300 (EEST) X-ENS-nef-client: 129.199.129.80 ( name = phare.normalesup.org ) Received: from phare.normalesup.org (phare.normalesup.org [129.199.129.80]) by nef.ens.fr (8.14.4/1.01.28121999) with ESMTP id 13LCRClv006300 for ; Wed, 21 Apr 2021 14:27:12 +0200 Received: by phare.normalesup.org (Postfix, from userid 1001) id 23E95EB5BC; Wed, 21 Apr 2021 14:27:12 +0200 (CEST) From: Nicolas George To: ffmpeg-devel@ffmpeg.org Date: Wed, 21 Apr 2021 14:27:04 +0200 Message-Id: <20210421122706.9002-6-george@nsup.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210421122706.9002-1-george@nsup.org> References: <20210421122706.9002-1-george@nsup.org> MIME-Version: 1.0 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (nef.ens.fr [129.199.96.32]); Wed, 21 Apr 2021 14:27:13 +0200 (CEST) Subject: [FFmpeg-devel] [PATCH 5/7] WIP: add an intro to AVWriter X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: FzZvV32l9ZUU Signed-off-by: Nicolas George --- avwriter_intro.txt | 261 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 261 insertions(+) create mode 100644 avwriter_intro.txt diff --git a/avwriter_intro.txt b/avwriter_intro.txt new file mode 100644 index 0000000000..b1b54711c9 --- /dev/null +++ b/avwriter_intro.txt @@ -0,0 +1,261 @@ +XXX make error return more strict +XXX? make buf size return more foolproof + + +# An introduction to the AVWriter API. + +Do not look at the length headers. Do not look at the convoluted API. Just +imagine these few use cases. + + +## Scenario 1: You need to call a function that takes an AVWriter. + +What are AVWriter anyway? They are for when a function returns a string. +For example, take the very common: + + int av_foobar_to_string(char *buf, size_t buf_size, const AVFoobar *foobar); + +It uses buf[] to return the string. In the simplest cases, you call it with +a buffer on the stack: + + char buf[1024]; + ret = av_foobar_to_string(buf, sizeof(buf)); + +Now, with AVWriter. The function you need to call is: + + void av_foobar_write(AVWriter wr, const AVFoobar *foobar); + +And you use the av_buf_writer_array() macro to encode buf as an AVWriter: + + char buf[1024]; + ret = av_foobar_write(av_buf_writer_array(buf), foobar); + +For all you know, av_buf_writer_array() could just be a fancy name for a +macro that just expands to "buf, sizeof(buf)". And if you want to stop here +and just use AVWriter the way you do in good old C, you can. But if you read +on, you will see that it is much fancier than that, while being almost as +simple to use. + + +## Scenario 2: Concatenating the string into a message. + +Let us say you have two foobars to present to the user. Good old C: + + char msg[2500], foo1_buf[1000], foo2_buf[1000]; + av_foobar_to_string(foo1_buf, sizeof(foo1_buf), foo1); + av_foobar_to_string(foo2_buf, sizeof(foo2_buf), foo2); + snprintf(msg, sizeof(msg), "F%d = [ %s, %s ]", num, foo1_buf, foo2_buf); + +But it's ugly. Less ugly, but more complicated: + + char msg[2500]; + size_t pos = 0; + pos += snprintf (msg + pos, sizeof(msg) - pos, "F%d = [ ", num); + av_foobar_to_string(msg + pos, sizeof(msg) - pos, foo1); + pos += strlen(msg + pos); + pos += snprintf (msg + pos, sizeof(msg) - pos, ", "); + av_foobar_to_string(msg + pos, sizeof(msg) - pos, foo2); + pos += strlen(msg + pos); + pos += snprintf (msg + pos, sizeof(msg) - pos, "]"); + +(Note: that's buggy, the snprintf() in the middle can overflow.) + +Well, that was the first thing AVWriter was meant to do: allow to build +strings by concatenating them together. So, the AVWraper version of this +code: + + char msg[2500]; + AVWriter wr = av_buf_writer_array(msg); + av_writer_printf(wr, "F%d = [ ", num); + av_foobar_write(wr, foo1); + av_writer_print(wr, ", "); + av_foobar_write(wr, foo2); + av_writer_print(wr, " ]"); + +(I am confident I can teach av_writer_printf() to call av_foobar_write().) + +If I have my way, we will have "av_writer_fmt(AVWriter wr, const char *fmt, +...);" where the arguments can be any type for which somebody have bothered +to implement a _write() function. But that is for a bit later. + +And that's it. av_buf_writer_array() is just fancy macro work to keep track +with a "size_t pos". + +But don't use it, it's rubbish. Use av_dynbuf_writer(), it's fancier. + + +## Scenario 3: The strings are long and precious. + +Let us say that the string you are building can be arbitrarily long, and it +is important to have it in its entirety. With good old C, you have to rely +on some kind of realloc(). You know the kind of code, it is very boring, I +will not write it here. + +That's exactly what av_buf_writer_array() is for. As a special optimization, +it starts with a buffer on the stack, which means that for small strings, it +will be as fast as the char[] version. + +XXX+ the commit says av_dynbuf_get_buffer in the doxy and other inconsistencies + + AVWriter wr = av_dynbuf_writer(); + /* same code as above */ + char *msg = av_dynbuf_writer_get_data(wr, NULL); + use_the_message(msg); + av_dynbuf_writer_finalize(wr, NULL, NULL); + +WARNING: I have not covered error checking. Without error checking, this +code will not cause an invalid memory access, and in particular will not be +a security issue by itself, but it will cause the string to be truncated. +That means silent data corruption, which is worse than a non-exploitable +crash. + + +## Scenario 4: So let's do error checks. + +The example code above will not crash. At all point, +av_dynbuf_writer_get_data() will point to a valid 0-terminated buffer. If +you know your data is can be truncated later, then maybe you can tolerate to +have it truncated here, especially since that kind of memory allocation +failure is very very unlikely. + +But you are not lazy, and it's better to diaplay "Out Of Cheese Error" than +to store corrupted data without noticing. + + ret = av_dynbuf_writer_get_error(wr, &size); + if (ret < 0) { + fprintf("Could not display foobar, would have been %zu long", size); + return ret; + } + +XXX ça c'est bien, mais av_dynbuf_writer_get_data() devrait retourner une valeur sûre + + +## Scenario 5: We want to keep the string. + +Maybe what we will do with the message is something like: + + av_dict_set(metadata, "foo_pair", msg); + +But if av_dynbuf_writer() did use use dynamic allocation, we could use +AV_DICT_DONT_STRDUP_VAL. + +That's what the pointers arguments to av_dynbuf_writer_finalize() are for. + + ret = av_dynbuf_writer_finalize(wr, &msg, NULL); + if (ret < 0) + return ret; + av_dict_set(metadata, "foo_pair", msg, AV_DICT_DONT_STRDUP_VAL); + + +XXX+ av_dynbuf_writer_finish error if truncated, av_dynbuf_writer_finish_anyway. +XXX+ error check is all wrong + + +## Scenario 6: You work with a library that already has string functions. +## Scenario 7: You work with an API that does its own buffering. + +Let us say you are writing a Gtk+ application and you already use GString +all over the place. (Note: GString is more or less like av_dynbuf_writer(), +but with the Gtk+ taste instead of the FFmpeg taste; the API example should +be obvious.) You could always: + + av_foobar_write(wr, foo); + char *msg = av_dynbuf_writer_get_data(wr, NULL); + g_string_append(str, msg); + +But that's clumsy, you have to copy the string. If this comes frequently in +the code, it would be better to have AVWriter collaborate with GString: + + av_foobar_write(gstring_writer(str), foo); + +To be able to do that you need to implement a few callbacks and adapt some +boilerplate code. It is not difficult, but too long for this introduction. + +There is no central repository, neither static nor dynamic. As soon as a +part of a program has implemented the proper callbacks and set the proper +pointers, it makes a valid AVWriter that can be used anywhere. + +Note that it applies to any kind of library or API: as the name says, +AVWriter is made to write strings. Any API that looks like writing can be +desguised as AVWriter. + +For example, there is another built-in AVWriter: av_log_writer(obj) it goes +directly to av_log(obj, AV_LOG_INFO); + + +## Scenario 8: This time, you are the one who returns a string. + +This time, you are the one who was about to write: + + int av_bazqux_to_string(char *buf, size_t size, const Bazqux *baz); + +and you decide to write it: + + void av_bazqux_write(AVWriter wr, const Bazqux *baz); + +There is nothing new to say, you already know how to add things to an +AVWriter, it is in the second example: + + av_writer_printf(wr, ...); + +Just use this, or any of the similar functions, to add whatever you want to +the AVWriter you got as an argument. There is no error checking to do: it is +not your responsibility. + +At this point, you could implement av_bazqux_to_string() as a trivial +wrapper: + + int av_bazqux_to_string(char *buf, size_t size, const Bazqux *baz) { + av_bazqux_write(av_buf_writer(buf, size), baz); + if (strlen(baz) == size - 1) + return AVERROR_BUFFER_TOO_SMALL; + } + +(Since av_buf_writer() does not keep track of error, we consider that +reaching the end is already overflowing. We are wasting one char in a very +rare situation.) But is it really useful? + + +## Scenario 9: What with the types? + +If you are observant, you will wonder what happens if we write: + + AVWriter wr = av_log_writer(buf); + char *msg = av_dynbuf_writer_get_data(wr, NULL); + +The short answer is: it will crash, badly. And that's ok: you knew you were +doing something stupid. You got what you deserved. Same as if you had written: + + char *msg = NULL; + snprintf(msg, strlen(msg), "Good Bye"); + +The C language is based on the assumption that you know things about the +code that cannot be expressed in the language itself, in particular with the +type system. There is no type for non-NULL-pointer, yet there are places in +the code you know for certain that a pointer is not NULL. + +AVWriter is designed with the same philosophy. When you write + + AVWriter wr = av_dynbuf_writer(); + +you know that wr is actually an AVDynbufWriter, and therefore you can call +av_dynbuf_writer_get_data() on it. + +On the other hand, if you have an AVWriter of unknown origin, you cannot +call a specific function. But you can call all the av_writer_whatever() +functions. + +If you want to make a special case, to check for a specific kind of +AVWriter, you can use: + + if (av_dynbuf_writer_check(wr)) + +In fact, all functions that accept only a specific kind of AVWriter in +FFmpeg start with some kind of: + + av_assert1(av_dynbuf_writer_check(wr)); + +That means that if you are working with a libavutil built for development, +if you make a mistake, you will get a clear error message. But really, there +is no reason to make a mistake if you are not trying to tie your brain in a +knot.