[FFmpeg-devel,v6,1/4] doc: Explain what "context" means

Message ID	20240604144919.213799-2-ffmpeg-devel@pileofstuff.org
State	New
Headers	show Delivered-To: ffmpegpatchwork2@gmail.com Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; From: Andrew Sayers <ffmpeg-devel@pileofstuff.org> To: ffmpeg-devel@ffmpeg.org Date: Tue, 4 Jun 2024 15:47:21 +0100 Message-ID: <20240604144919.213799-2-ffmpeg-devel@pileofstuff.org> In-Reply-To: <20240604144919.213799-1-ffmpeg-devel@pileofstuff.org> References: <20240418150614.3952107-1-ffmpeg-devel@pileofstuff.org> <20240604144919.213799-1-ffmpeg-devel@pileofstuff.org> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v6 1/4] doc: Explain what "context" means Precedence: list Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> Cc: Andrew Sayers <ffmpeg-devel@pileofstuff.org> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Series	doc: Explain what "context" means \| expand [FFmpeg-devel,v6,0/4] doc: Explain what "context" means [FFmpeg-devel,v6,1/4] doc: Explain what "context" means [FFmpeg-devel,v6,2/4] lavu: Clarify relationship between AVClass, AVOption and context [FFmpeg-devel,v6,3/4] all: Link to "context" from all public contexts with documentation [FFmpeg-devel,v6,4/4] all: Rewrite documentation for contexts

Context	Check	Description
yinshiyou/make_loongarch64	success	Make finished
yinshiyou/make_fate_loongarch64	success	Make fate finished
andriy/make_x86	success	Make finished
andriy/make_fate_x86	success	Make fate finished

Andrew Sayers June 4, 2024, 2:47 p.m. UTC

Derived from explanations kindly provided by Stefano Sabatini and others:
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-April/325903.html
---
 doc/context.md | 430 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 430 insertions(+)
 create mode 100644 doc/context.md

Anton Khirnov June 5, 2024, 8:15 a.m. UTC | #1

Quoting Andrew Sayers (2024-06-04 16:47:21)
> Derived from explanations kindly provided by Stefano Sabatini and others:
> https://ffmpeg.org/pipermail/ffmpeg-devel/2024-April/325903.html
> ---
>  doc/context.md | 430 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 430 insertions(+)
>  create mode 100644 doc/context.md

430 lines to say "context is a struct storing an object's state"?

Stefano Sabatini June 12, 2024, 8:52 p.m. UTC | #2

On date Tuesday 2024-06-04 15:47:21 +0100, Andrew Sayers wrote:
> Derived from explanations kindly provided by Stefano Sabatini and others:
> https://ffmpeg.org/pipermail/ffmpeg-devel/2024-April/325903.html
> ---
>  doc/context.md | 430 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 430 insertions(+)
>  create mode 100644 doc/context.md
> diff --git a/doc/context.md b/doc/context.md
> new file mode 100644
> index 0000000000..bd8cb58696
> --- /dev/null
> +++ b/doc/context.md
> @@ -0,0 +1,430 @@
> +@page Context Introduction to contexts
> +
> +@tableofcontents
> +
> +FFmpeg uses the term “context” to refer to an idiom
> +you have probably used before:
> +
> +```c
> +// C structs often share context between functions:
> +
> +FILE *my_file; // my_file stores information about a filehandle
> +
> +printf(my_file, "hello "); // my_file provides context to this function,
> +printf(my_file, "world!"); // and also to this function
> +```
> +
> +```python
> +# Python classes provide context for the methods they contain:
> +
> +class MyClass:
> +    def print(self,message):
> +        if self.prev_message != message:
> +            self.prev_message = message
> +            print(message)
> +```
> +
> +<!-- marked "c" because Doxygen doesn't support JS highlighting: -->
> +```c
> +// Many JavaScript callbacks accept an optional context argument:
> +
> +const my_object = {};
> +
> +my_array.forEach(function_1, my_object);
> +my_array.forEach(function_2, my_object);
> +```
> +
> +Be careful comparing FFmpeg contexts to things you're already familiar with -
> +FFmpeg may sometimes happen to reuse words you recognise, but mean something
> +completely different.  For example, the AVClass struct has nothing to do with
> +[object-oriented classes](https://en.wikipedia.org/wiki/Class_(computer_programming)).
> +
> +If you've used contexts in other C projects, you may want to read
> +@ref Context_comparison before the rest of the document.

My impression is that this is growing out of scope for a
reference. The doxy is a reference, therefore it should be clean and
terse, and we should avoid adding too much information, enough
information should be right enough. In fact, a reference is different
from a tutorial, and much different from a C tutorial. Also this is
not a treatise comparing different languages and frameworks, as this
would confuse beginners and would annoy experienced developers.

I propose to cut this patch to provide the minimal information you can
expect in a reference, but not more than that. Addition can be added
later, but I think we should try to avoid any unnecessary content, in
the spirit of keeping this a reference. More extensive discussions
might be done in a separate place (the wiki, a blog post etc.), but in
the spirit of a keeping this a reference they should not be put here.

> +
> +@section Context_general “Context” as a general concept
> +
> +@par
> +A context is any data structure used by several functions
> +(or several instances of the same function) that all operate on the same entity.
> +
> +In the broadest sense, “context” is just a way to think about code.

> +You can even use it to think about code written by people who have never
> +heard the term, or who would disagree with you about what it means.
> +Consider the following snippet:
> +
> +```c
> +struct DualWriter {
> +    int fd1, fd2;
> +};
> +
> +ssize_t write_to_two_files(
> +    struct DualWriter *my_writer,
> +    uint8_t *buf,
> +    int buf_size
> +) {
> +
> +    ssize_t bytes_written_1 = write(my_writer->fd1, buf, buf_size);
> +    ssize_t bytes_written_2 = write(my_writer->fd2, buf, buf_size);
> +
> +    if ( bytes_written_1 != bytes_written_2 ) {
> +        // ... handle this edge case ...
> +    }
> +
> +    return bytes_written_1;
> +
> +}
> +
> +int main() {
> +
> +    struct DualWriter my_writer;
> +    my_writer.fd1 = open("file1", 0644, "wb");
> +    my_writer.fd2 = open("file2", 0644, "wb");
> +
> +    write_to_two_files(&my_writer, "hello ", sizeof("hello "));
> +    write_to_two_files(&my_writer, "world!", sizeof("world!"));
> +
> +    close( my_writer.fd1 );
> +    close( my_writer.fd2 );
> +
> +}
> +```
> +
> +The term “context” doesn't appear anywhere in the snippet.  But `DualWriter`
> +is passed to several instances of `write_to_two_files()` that operate on
> +the same entity, so it fits the definition of a context.
> +
> +When reading code that isn't explicitly described in terms of contexts,
> +remember that your interpretation may differ from other people's.
> +For example, FFmpeg's avio_alloc_context() accepts a set of callback functions
> +and an `opaque` argument - even though this function guarantees to *return*
> +a context, it does not require `opaque` to *provide* context for the callback
> +functions.  So you could choose to pass a struct like `DualWriter` as the
> +`opaque` argument, or you could pass callbacks that use `stdin` and `stdout`
> +and just pass a `NULL` argument for `opaque`.

I'd skip all this part, as we assume the reader is already familiar
with C language and with data encapsulation through struct, if he is
not this is not the right place where to teach about C language
fundamentals.

> +
> +When reading code that *is* explicitly described in terms of contexts,
> +remember that the term's meaning is guaranteed by *the project's community*,
> +not *the language it's written in*.  That means guarantees may be more flexible
> +and change more over time.  For example, programming languages that use
> +[encapsulation](https://en.wikipedia.org/wiki/Encapsulation_(computer_programming))
> +will simply refuse to compile code that violates its rules about access,
> +while communities can put up with special cases if they improve code quality.
> +

This looks a bit vague so I'd rather drop this.

> +The next section will discuss what specific conventions FFmpeg developers mean
> +when they describe parts of their code as using “contexts”.
> +
> +@section Context_ffmpeg FFmpeg contexts
> +
> +This section discusses specific context-related conventions used in FFmpeg.
> +Some of these are used in other projects, others are unique to this project.
> +
> +@subsection Context_indicating Indicating context: “Context”, “ctx” etc.
> +
> +```c
> +// Context struct names usually end with `Context`:
> +struct AVSomeContext {
> +  ...
> +};
> +
> +// Functions are usually named after their context,
> +// context parameters usually come first and are often called `ctx`:
> +void av_some_function(AVSomeContext *ctx, ...);
> +```
> +
> +FFmpeg struct names usually signal whether they are contexts (e.g. AVBSFContext
> +or AVCodecContext).  Exceptions to this rule include AVMD5, which is only
> +identified as a context by @ref libavutil/md5.c "the functions that call it".
> +
> +Function names usually signal the context they're associated with (e.g.
> +av_md5_alloc() or avcodec_alloc_context3()).  Exceptions to this rule include
> +@ref avformat.h "AVFormatContext's functions", many of which begin with
> +just `av_`.
> +
> +Functions usually signal their context parameter by putting it first and
> +naming it some variant of `ctx`.  Exceptions include av_bsf_alloc(), which puts
> +its context argument second to emphasise it's an out variable.
> +

> +Some functions fit awkwardly within FFmpeg's context idiom, so they send mixed
> +signals.  For example, av_ambient_viewing_environment_create_side_data() creates
> +an AVAmbientViewingEnvironment context, then adds it to the side-data of an
> +AVFrame context.  So its name hints at one context, its parameter hints at
> +another, and its documentation is silent on the issue.  You might prefer to
> +think of such functions as not having a context, or as “receiving” one context
> +and “producing” another.

I'd skip this paragraph. In fact, I think that API makes perfect
sense, OOP languages adopt such constructs all the time, for example
this could be a static module/class constructor. In other words, we
are not telling anywhere that all functions should take a "context" as
its first argument, and the documentation specify exactly how this
works, if you feel this is not clear or silent probably this is a sign
that that function documentation should be extended.

> +
> +@subsection Context_data_hiding Data hiding: private contexts
> +
> +```c
> +// Context structs often hide private context:
> +struct AVSomeContext {
> +  void *priv_data; // sometimes just called "internal"
> +};
> +```
> +
> +Contexts present a public interface, so changing a context's members forces
> +everyone that uses the library to at least recompile their program,
> +if not rewrite it to remain compatible.  Many contexts reduce this problem
> +by including a private context with a type that is not exposed in the public
> +interface.  Hiding information this way ensures it can be modified without
> +affecting downstream software.
> +
> +Private contexts often store variables users aren't supposed to see
> +(similar to an [OOP](https://en.wikipedia.org/wiki/Object-oriented_programming)
> +private block), but can be used for more than just access control.  They can
> +also store information shared between some but not all instances of a context
> +(e.g. codec-specific functionality), and @ref Context_avoptions
> +"AVOptions-enabled structs" can provide user configuration options through
> +the @ref avoptions "AVOptions API".

I'll skip this section as well: data hiding is a common C technique,
and AVOptions are already covered later in the document or in another
dedicated section.

> +@subsection Context_lifetime Manage lifetime: creation, use, destruction
> +
> +```c
> +void my_function(...) {
> +
> +    // Context structs are allocated then initialized with associated functions:
> +
> +    AVSomeContext *ctx = av_some_context_alloc(...);
> +
> +    // ... configure ctx ...
> +
> +    av_some_context_init(ctx, ...);
> +
> +    // ... use ctx ...
> +
> +    // Context structs are closed then freed with associated functions:
> +
> +    av_some_context_close(ctx);
> +    av_some_context_free(ctx);
> +
> +}
> +```
> +FFmpeg contexts go through the following stages of life:
> +
> +1. allocation (often a function that ends with `_alloc`)
> +   * a range of memory is allocated for use by the structure
> +   * memory is allocated on boundaries that improve caching
> +   * memory is reset to zeroes, some internal structures may be initialized
> +2. configuration (implemented by setting values directly on the context)
> +   * no function for this - calling code populates the structure directly
> +   * memory is populated with useful values
> +   * simple contexts can skip this stage
> +3. initialization (often a function that ends with `_init`)
> +   * setup actions are performed based on the configuration (e.g. opening files)
> +5. normal usage
> +   * most functions are called in this stage
> +   * documentation implies some members are now read-only (or not used at all)
> +   * some contexts allow re-initialization
> +6. closing (often a function that ends with `_close()`)
> +   * teardown actions are performed (e.g. closing files)
> +7. deallocation (often a function that ends with `_free()`)
> +   * memory is returned to the pool of available memory
> +
> +This can mislead object-oriented programmers, who expect something more like:
> +
> +1. allocation (usually a `new` keyword)
> +   * a range of memory is allocated for use by the structure
> +   * memory *may* be reset (e.g. for security reasons)
> +2. initialization (usually a constructor)
> +   * memory is populated with useful values
> +   * related setup actions are performed based on arguments (e.g. opening files)
> +3. normal usage
> +   * most functions are called in this stage
> +   * compiler enforces that some members are read-only (or private)
> +   * no going back to the previous stage
> +4. finalization (usually a destructor)
> +   * teardown actions are performed (e.g. closing files)
> +5. deallocation (usually a `delete` keyword)
> +   * memory is returned to the pool of available memory
> +
> +The remainder of this section discusses FFmpeg's differences from OOP, to help
> +object-oriented programmers avoid misconceptions.  You can safely skip this
> +section if you aren't familiar with the OOP lifetime described above.
> +
> +FFmpeg's allocation stage is broadly similar to the OOP stage of the same name.
> +Both set aside some memory for use by a new entity, but FFmpeg's stage can also
> +do some higher-level operations.  For example, @ref Context_avoptions
> +"AVOptions-enabled structs" set their AVClass member during allocation.
> +
> +FFmpeg's configuration stage involves setting any variables you want before
> +you start using the context.  Complicated FFmpeg structures like AVCodecContext
> +tend to have many members you *could* set, but in practice most programs set
> +few if any of them.  The freeform configuration stage works better than bundling
> +these into the initialization stage, which would lead to functions with
> +impractically many parameters, and would mean each new option was an
> +incompatible change to the API.  One way to understand the problem is to read
> +@ref Context_avoptions "the AVOptions section below" and think how a constructor
> +would handle those options.
> +
> +FFmpeg's initialization stage involves calling a function that sets the context
> +up based on your configuration.
> +
> +FFmpeg's first three stages do the same job as OOP's first two stages.
> +This can mislead object-oriented developers, who expect to do less work in the
> +allocation stage, and more work in the initialization stage.  To simplify this,
> +most FFmpeg contexts provide a combined allocator and initializer function.
> +For historical reasons, suffixes like `_alloc`, `_init`, `_alloc_context` and
> +even `_open` can indicate the function does any combination of allocation and
> +initialization.
> +
> +FFmpeg's "closing" stage is broadly similar to OOP's "finalization" stage,
> +but some contexts allow re-initialization after finalization.  For example,
> +SwrContext lets you call swr_close() then swr_init() to reuse a context.
> +Be aware that some FFmpeg functions happen to use the word "finalize" in a way
> +that has nothing to do with the OOP stage (e.g. av_bsf_list_finalize()).
> +
> +FFmpeg's "deallocation" stage is broadly similar to OOP, but can perform some
> +higher-level functions (similar to the allocation stage).
> +
> +Closing functions usually end with "_close", while deallocation
> +functions usually end with "_free".  Very few contexts need the flexibility of
> +separate "closing" and "deallocation" stages, so many "_free" functions
> +implicitly close the context first.

About this I have mixed feelings, but to me it sounds like a-posteriori
rationalization.

I don't think there is a general rule with the allocation/closing/free
rule for the various FFmpeg APIs, and giving the impression that this
is the case might be misleading. In practice the user needs to master
only a single API at a time (encodering/decoding, muxing/demuxing,
etc.)  each one with possibly slight differences in how the term
close/allocation/free are used. This is probably not optimal, but in
practice it works as the user do not really need to know all the
possible uses of the API (she will work through what she is interested
for the job at hand).

> +
> +@subsection Context_avoptions Configuration options: AVOptions-enabled structs
> +

> +The @ref avoptions "AVOptions API" is a framework to configure user-facing
> +options, e.g. on the command-line or in GUI configuration forms.

This looks wrong. AVOptions is not at all about CLI or GUI options, is
just some way to set/get fields (that is "options") defined in a
struct (a context) using a high level API including: setting multiple
options at once (through a textual encoding or a dictionary),
input/range validation, setting more fields based on a single option
(e.g. the size) etc.

Then you can query the options in a given struct and create
corresponding options in a UI, but this is not the point of AVOptions.

> +To understand FFmpeg's configuration requirements, run `ffmpeg -h full` on the
> +command-line, then ask yourself how you would implement all those options
> +with the C standard [`getopt` function](https://en.wikipedia.org/wiki/Getopt).
> +You can also ask the same question for other approaches - for example, how would
> +you maintain a GUI with 15,000+ configuration options?
> +
> +Most solutions assume you can just put all options in a single code block,
> +which is unworkable at FFmpeg's scale.  Instead, we split configuration
> +across many *AVOptions-enabled structs*, which use the @ref avoptions
> +"AVOptions API" to inspect and configure options, including in private contexts.
> +
> +AVOptions-accessible members of a context should be accessed through the
> +@ref avoptions "AVOptions API" whenever possible, even if they're not hidden
> +in a private context.  That ensures values are validated as they're set, and
> +means you won't have to do as much work if a future version of FFmpeg changes
> +the allowed values.
> +

> +Although not strictly required, it is best to only modify options during
> +the configuration stage.  Initialized structs may be accessed by internal
> +FFmpeg threads, and modifying them can cause weird intermittent bugs.
> +
> +@subsection Context_logging Logging: AVClass context structures
> +
> +FFmpeg's @ref lavu_log "logging facility" needs to be simple to use,
> +but flexible enough to let people debug problems.  And much like options,
> +it needs to work the same across a wide variety of unrelated structs.
> +
> +FFmpeg structs that support the logging framework are called *@ref AVClass
> +context structures*.  The name @ref AVClass was chosen early in FFmpeg's
> +development, but in practice it only came to store information about
> +logging, and about options.

OTOH hand AVOptions and logging should be discussed in the relevant
files, to avoid duplication.

> +
> +@section Context_further Further information about contexts
> +
> +So far, this document has provided a theoretical guide to FFmpeg contexts.
> +This final section provides some alternative approaches to the topic,
> +which may help round out your understanding.
> +
> +@subsection Context_example Learning by example: context for a codec
> +
> +It can help to learn contexts by doing a deep dive into a specific struct.
> +This section will discuss AVCodecContext - an AVOptions-enabled struct
> +that contains information about encoding or decoding one stream of data
> +(e.g. the video in a movie).
> +
> +The name "AVCodecContext" tells us this is a context.  Many of
> +@ref libavcodec/avcodec.h "its functions" start with an `avctx` parameter,
> +indicating this parameter provides context for that function.
> +
> +AVCodecContext::internal contains the private context.  For example,
> +codec-specific information might be stored here.
> +
> +AVCodecContext is allocated with avcodec_alloc_context3(), initialized with
> +avcodec_open2(), and freed with avcodec_free_context().  Most of its members
> +are configured with the @ref avoptions "AVOptions API", but for example you
> +can set AVCodecContext::draw_horiz_band() if your program happens to need it.
> +
> +AVCodecContext provides an abstract interface to many different *codecs*.
> +Options supported by many codecs (e.g. "bitrate") are kept in AVCodecContext
> +and exposed with AVOptions.  Options that are specific to one codec are
> +stored in the private context, and also exposed with AVOptions.
> +
> +AVCodecContext::av_class contains logging metadata to ensure all codec-related
> +error messages look the same, plus implementation details about options.
> +
> +To support a specific codec, AVCodecContext's private context is set to
> +an encoder-specific data type.  For example, the video codec
> +[H.264](https://en.wikipedia.org/wiki/Advanced_Video_Coding) is supported via
> +[the x264 library](https://www.videolan.org/developers/x264.html), and
> +implemented in X264Context.  Although included in the documentation, X264Context
> +is not part of the public API.  That means FFmpeg's @ref ffmpeg_versioning
> +"strict rules about changing public structs" aren't as important here, so a
> +version of FFmpeg could modify X264Context or replace it with another type
> +altogether.  An adverse legal ruling or security problem could even force us to
> +switch to a completely different library without a major version bump.
> +
> +The design of AVCodecContext provides several important guarantees:
> +
> +- lets you use the same interface for any codec
> +- supports common encoder options like "bitrate" without duplicating code
> +- supports encoder-specific options like "profile" without bulking out the public interface
> +- exposes both types of options to users, with help text and detection of missing options
> +- provides uniform logging output
> +- hides implementation details (e.g. its encoding buffer)
> +

> +@subsection Context_comparison Learning by comparison: FFmpeg vs. Curl contexts

About this, I'm still not really convinced that this should be part of
a reference, in the sense that it is adding more information than
really needed and it treats concepts related to the C language rather
than to the FFmpeg API itself.

[...]

Andrew Sayers June 13, 2024, 2:20 p.m. UTC | #3

On Wed, Jun 12, 2024 at 10:52:00PM +0200, Stefano Sabatini wrote:
> On date Tuesday 2024-06-04 15:47:21 +0100, Andrew Sayers wrote:
[...]
> My impression is that this is growing out of scope for a
> reference. The doxy is a reference, therefore it should be clean and
> terse, and we should avoid adding too much information, enough
> information should be right enough. In fact, a reference is different
> from a tutorial, and much different from a C tutorial. Also this is
> not a treatise comparing different languages and frameworks, as this
> would confuse beginners and would annoy experienced developers.
> 
> I propose to cut this patch to provide the minimal information you can
> expect in a reference, but not more than that. Addition can be added
> later, but I think we should try to avoid any unnecessary content, in
> the spirit of keeping this a reference. More extensive discussions
> might be done in a separate place (the wiki, a blog post etc.), but in
> the spirit of a keeping this a reference they should not be put here.

I would agree if we had a tradition of linking to the wiki or regular blog
posts, but even proposing internal links has generated pushback in this thread,
so that feels like making the perfect the enemy of the good.  Let's get this
committed, see how people react, then look for improvements.

In fact, once this is available in the trunk version of the website,
we should ask for feedback from the libav-user ML and #ffmpeg IRC channel.
Then we can expand/move/remove stuff based on feedback.

> 
> > +
> > +@section Context_general “Context” as a general concept
[...]
> I'd skip all this part, as we assume the reader is already familiar
> with C language and with data encapsulation through struct, if he is
> not this is not the right place where to teach about C language
> fundamentals.

I disagree, for a reason I've been looking for an excuse to mention :)

Let's assume 90% of people who use FFmpeg already know something in the doc.
You could say that part of the doc is useless to 90% of the audience.
Or you could say that 90% of FFmpeg users are not our audience.

Looking at it the second way means you need to spend more time on "routing" -
linking to the document in ways that (only) attract your target audience,
making a table of contents with headings that aid skip-readers, etc.
But once you've routed people around the bits they don't care about,
it's fine to have documentation that's only needed by a minority.

Also, less interesting but equally important - context is not a C language
fundamental, it's more like an emergent property of large C projects.  A
developer that came here without knowing e.g. what a struct is could read
any of the online tutorials that explain the concept better than we could.
I'd be happy to link to a good tutorial about contexts if we found one,
but we have to meet people where they are, and this is the best solution
I've been able to find.

> 
> > +
> > +When reading code that *is* explicitly described in terms of contexts,
> > +remember that the term's meaning is guaranteed by *the project's community*,
> > +not *the language it's written in*.  That means guarantees may be more flexible
> > +and change more over time.  For example, programming languages that use
> > +[encapsulation](https://en.wikipedia.org/wiki/Encapsulation_(computer_programming))
> > +will simply refuse to compile code that violates its rules about access,
> > +while communities can put up with special cases if they improve code quality.
> > +
> 
> This looks a bit vague so I'd rather drop this.

This probably looks vague to you because you're part of the 90% of people this
paragraph isn't for.  All programming languages provide some guarantees, and
leave others up to the community to enforce (or not).  Over time, people stop
seeing the language guarantees at all, and assume the only alternative is
anarchy.  For example, if you got involved in a large JavaScript project,
you might be horrified to see almost all structs are the same type ("Object"),
and are implemented as dictionaries that are expected to have certain keys.
But in practice, this stuff gets enforced at the community level well enough.
Similarly, a JS programmer might be horrified to learn FFmpeg needs a whole
major version bump just to add a key to a struct.  This paragraph is there to
nudge people who have stopped seeing things we need them to look out for.

If you'd like to maintain an official FFmpeg blog, I'd be happy to expand the
paragraph above into a medium-sized post, then just link it from the doc.
But that post would be too subjective to be a wiki page - JavaScript is
evolving in a more strongly-typed direction, so it would only make sense to
future readers if they could say "oh yeah this was written in 2024, JS was
still like that back then".  This paragraph is an achievable compromise -
covers enough ground to give people a way to think about the code, short enough
for people who don't care to skip over, and objective enough to belong in
documentation.  We can always change it if we find a better solution.

[...]
> > +Some functions fit awkwardly within FFmpeg's context idiom, so they send mixed
> > +signals.  For example, av_ambient_viewing_environment_create_side_data() creates
> > +an AVAmbientViewingEnvironment context, then adds it to the side-data of an
> > +AVFrame context.  So its name hints at one context, its parameter hints at
> > +another, and its documentation is silent on the issue.  You might prefer to
> > +think of such functions as not having a context, or as “receiving” one context
> > +and “producing” another.
> 
> I'd skip this paragraph. In fact, I think that API makes perfect
> sense, OOP languages adopt such constructs all the time, for example
> this could be a static module/class constructor. In other words, we
> are not telling anywhere that all functions should take a "context" as
> its first argument, and the documentation specify exactly how this
> works, if you feel this is not clear or silent probably this is a sign
> that that function documentation should be extended.

That would be fine if it were just this function, but FFmpeg is littered
with special cases that don't quite fit.  Another example might be
swr_alloc_set_opts2(), which can take an SwrContext in a way that resembles
a context, or can take NULL and allocate a new SwrContext.  And yes,
we could document that edge case, and the next one, and the one after that.
But even if we documented every little special case that existed today,
there's no rule, so new bits of API will just reintroduce the problem again.

There's a deeper issue here - as an expert, when you don't know something,
your default assumption is that it's undefined, and could evolve in future.
When a newbie doesn't know something, their default assumption is that
everybody else knows and they're just stupid.  That assumption drives
newbies away from projects, so it's important to fill in as many blanks as
possible, even if it has to be with simple answers that they eventually
evolve beyond (and feel smart for doing so).

> > +@subsection Context_lifetime Manage lifetime: creation, use, destruction
[...]
> About this I have mixed feelings, but to me it sounds like a-posteriori
> rationalization.
> 
> I don't think there is a general rule with the allocation/closing/free
> rule for the various FFmpeg APIs, and giving the impression that this
> is the case might be misleading. In practice the user needs to master
> only a single API at a time (encodering/decoding, muxing/demuxing,
> etc.)  each one with possibly slight differences in how the term
> close/allocation/free are used. This is probably not optimal, but in
> practice it works as the user do not really need to know all the
> possible uses of the API (she will work through what she is interested
> for the job at hand).

Note: I'm assuming "this" means "this section", not "this paragraph".
Apologies if it was intended as a specific nitpick about closing functions.

TBH, a lot of this document is about inventing memorable rules of thumb.
The alternative is to say "FFmpeg devs can't agree on an answer, so they just
left you to memorise 3,000+ special cases".

Let's assume learning the whole of FFmpeg means understanding 3,000 tokens
(I'm not sure the exact count in 7.0, but it's about that number if you don't
include individual members of structs, arguments to functions etc.).  Let's
also assume it takes an average of ten minutes to learn each token (obviously
that varies - AV_LOG_PANIC will take less, AVCodecContext will take more).
That means you'd have to spend 8 hours a day every day for over two months
to learn FFmpeg.  Obviously there are usable subsets, but they mostly cut out
the simple things, so don't save nearly as much time as you'd think.  If you
want people to pick up FFmpeg, they need to learn a useful subset in about 8
hours, which requires a drastically simplified explanation.

(the above is closely related to an argument from a recent post[1],
but the numbers might help explain the scale of the challenge)

There may not be an explicit rule for context lifetimes, but I've looked at the
code carefully enough to have a nuanced opinion about the number of tokens,
and the edgiest case I've found so far is swr_alloc_set_opts2() (see above).
I'm open to counterexamples, but the model discussed in this section feels
pretty reliable.

> 
> > +
> > +@subsection Context_avoptions Configuration options: AVOptions-enabled structs
> > +
> 
> > +The @ref avoptions "AVOptions API" is a framework to configure user-facing
> > +options, e.g. on the command-line or in GUI configuration forms.
> 
> This looks wrong. AVOptions is not at all about CLI or GUI options, is
> just some way to set/get fields (that is "options") defined in a
> struct (a context) using a high level API including: setting multiple
> options at once (through a textual encoding or a dictionary),
> input/range validation, setting more fields based on a single option
> (e.g. the size) etc.
> 
> Then you can query the options in a given struct and create
> corresponding options in a UI, but this is not the point of AVOptions.

There's a problem here I haven't been communicating clearly enough.
I think that's because I've understated the problem in the past,
so I'll try overstating instead:

"Option" is a meaningless noise word.  A new developer might ask "is this like a
command-line option, or a CMake option?  Is it like those Python functions with
a million keyword arguments, or a config file with sensible defaults?".
Answering "it can be any of those if you need it to be" might help an advanced
user (not our audience), but is bewildering to a newbie who needs a rough guide
for how they're likely to use it.  The only solution that's useful to a newbie
is to provide a frame of reference, preferably in the form of something they
already know how to use.

Having said all that, yes this particular answer is wrong.  Could you apply [2]
so I can start thinking about what to replace it with?

[...]

[1] https://ffmpeg.org/pipermail/ffmpeg-devel/2024-June/328970.html
[2] https://ffmpeg.org/pipermail/ffmpeg-devel/2024-June/329068.html

Stefano Sabatini June 15, 2024, 9:17 a.m. UTC | #4

On date Thursday 2024-06-13 15:20:38 +0100, Andrew Sayers wrote:
> On Wed, Jun 12, 2024 at 10:52:00PM +0200, Stefano Sabatini wrote:
[...]
> > > +@section Context_general “Context” as a general concept
> [...]
> > I'd skip all this part, as we assume the reader is already familiar
> > with C language and with data encapsulation through struct, if he is
> > not this is not the right place where to teach about C language
> > fundamentals.
> 
> I disagree, for a reason I've been looking for an excuse to mention :)
> 

> Let's assume 90% of people who use FFmpeg already know something in the doc.
> You could say that part of the doc is useless to 90% of the audience.
> Or you could say that 90% of FFmpeg users are not our audience.
> 
> Looking at it the second way means you need to spend more time on "routing" -
> linking to the document in ways that (only) attract your target audience,
> making a table of contents with headings that aid skip-readers, etc.
> But once you've routed people around the bits they don't care about,
> it's fine to have documentation that's only needed by a minority.
> 

> Also, less interesting but equally important - context is not a C language
> fundamental, it's more like an emergent property of large C projects.  A
> developer that came here without knowing e.g. what a struct is could read
> any of the online tutorials that explain the concept better than we could.
> I'd be happy to link to a good tutorial about contexts if we found one,
> but we have to meet people where they are, and this is the best solution
> I've been able to find.

The context is just another way to call a struct used to keep an
entity state operated by several functions (that is in other words an
object and its methods), it's mostly about the specific jargon used by
FFmpeg (and used by other C projects as well). In addition to this we
provide some generic utilities (logging+avoptions) which can be used
through AVClass employment.

Giving a longer explanation is making this appear something more
complicated than actually is. My point is that providing more
information than actually needed provides the long-wall-of-text effect
(I need to read through all this to understand it - nah I'd rather
give-up), thus discouraging readers.

> 
> > 
> > > +
> > > +When reading code that *is* explicitly described in terms of contexts,
> > > +remember that the term's meaning is guaranteed by *the project's community*,
> > > +not *the language it's written in*.  That means guarantees may be more flexible
> > > +and change more over time.  For example, programming languages that use
> > > +[encapsulation](https://en.wikipedia.org/wiki/Encapsulation_(computer_programming))
> > > +will simply refuse to compile code that violates its rules about access,
> > > +while communities can put up with special cases if they improve code quality.
> > > +
> > 
> > This looks a bit vague so I'd rather drop this.

I mean, if you read for the first time:
| [the context] term's meaning is guaranteed by *the project's
| community*, not the languaguage it's written for.
| That means guarantees may be more flexible and change more over time.

it's very hard to figure out what these guarantees are about, and this
might apply to every specific language and to every specific term,
that's why I consider this "vague".

[...]
> > > +Some functions fit awkwardly within FFmpeg's context idiom, so they send mixed
> > > +signals.  For example, av_ambient_viewing_environment_create_side_data() creates
> > > +an AVAmbientViewingEnvironment context, then adds it to the side-data of an
> > > +AVFrame context.  So its name hints at one context, its parameter hints at
> > > +another, and its documentation is silent on the issue.  You might prefer to
> > > +think of such functions as not having a context, or as “receiving” one context
> > > +and “producing” another.
> > 
> > I'd skip this paragraph. In fact, I think that API makes perfect
> > sense, OOP languages adopt such constructs all the time, for example
> > this could be a static module/class constructor. In other words, we
> > are not telling anywhere that all functions should take a "context" as
> > its first argument, and the documentation specify exactly how this
> > works, if you feel this is not clear or silent probably this is a sign
> > that that function documentation should be extended.
> 

> That would be fine if it were just this function, but FFmpeg is littered
> with special cases that don't quite fit.

I still fail to see the general rule for which this is creating a
special case. If this is a special case, what is this special case
for?

> Another example might be swr_alloc_set_opts2(), which can take an
> SwrContext in a way that resembles a context, or can take NULL and
> allocate a new SwrContext.  And yes, we could document that edge
> case, and the next one, and the one after that. But even if we
> documented every little special case that existed today, there's no
> rule, so new bits of API will just reintroduce the problem again.

In this specific case I'd say the API is not much ergonomic, and
probably it would have been better to keep operations (allow+set_opts)
separated, but then it is much a choice of the user (you can decide to
keep alloc and set_opts as different operations). On the other hand,
*it is already documented* so there is not much to add.

> There's a deeper issue here - as an expert, when you don't know something,
> your default assumption is that it's undefined, and could evolve in future.
> When a newbie doesn't know something, their default assumption is that
> everybody else knows and they're just stupid.  That assumption drives
> newbies away from projects, so it's important to fill in as many blanks as
> possible, even if it has to be with simple answers that they eventually
> evolve beyond (and feel smart for doing so).
> 
> > > +@subsection Context_lifetime Manage lifetime: creation, use, destruction
> [...]
> > About this I have mixed feelings, but to me it sounds like a-posteriori
> > rationalization.
> > 
> > I don't think there is a general rule with the allocation/closing/free
> > rule for the various FFmpeg APIs, and giving the impression that this
> > is the case might be misleading. In practice the user needs to master
> > only a single API at a time (encodering/decoding, muxing/demuxing,
> > etc.)  each one with possibly slight differences in how the term
> > close/allocation/free are used. This is probably not optimal, but in
> > practice it works as the user do not really need to know all the
> > possible uses of the API (she will work through what she is interested
> > for the job at hand).
> 
> Note: I'm assuming "this" means "this section", not "this paragraph".
> Apologies if it was intended as a specific nitpick about closing functions.
> 
> TBH, a lot of this document is about inventing memorable rules of thumb.

> The alternative is to say "FFmpeg devs can't agree on an answer, so they just
> left you to memorise 3,000+ special cases".

This is not what most people do. You don't have to read all the
manual, especially you don't have to memorize the complete API, but
only the relevant part for the task at hand. If you need to learn
about decoding, you would probably start with the decoding API
(libavcodec/avcodec.h - which is unfortunatly pretty much complex
because the problem is complex), and you can ignore pretty much
everything else. If you only need to work with hash methods, you only
need to learn about that API.

So in general, you only learn the smaller bits needed for the task at
hand. For example I never used the av_ambient_viewing_environment API,
and I will probably never will.

More realistically, people learn from examples, so that's why we
should improve doc/examples. doxy is mostly for providing a complete
reference in case you want to fine tune an already working piece of
code or you have problems understanding specific bits (how are
timestamps handled? When should I ref/unref a frame? How can I get the
name of a channel layout?).

Usually it is a mixed process (you read the API doc, or you read the
examples to get the general idea, and you interpolate between the
two).

> Let's assume learning the whole of FFmpeg means understanding 3,000 tokens
> (I'm not sure the exact count in 7.0, but it's about that number if you don't
> include individual members of structs, arguments to functions etc.).  Let's
> also assume it takes an average of ten minutes to learn each token (obviously
> that varies - AV_LOG_PANIC will take less, AVCodecContext will take more).
> That means you'd have to spend 8 hours a day every day for over two months
> to learn FFmpeg.

> Obviously there are usable subsets, but they mostly cut out the
> simple things, so don't save nearly as much time as you'd think.  If
> you want people to pick up FFmpeg, they need to learn a useful
> subset in about 8 hours, which requires a drastically simplified
> explanation.

I agree, I'm fine with giving a high level description of what we mean
by "context" - but probably one paragraph or two should be enough, but
adding too much information might get the exact opposite effect
(exposing more information than needed - assuming the reader is
familiar with the C language, which we should assume).

> (the above is closely related to an argument from a recent post[1],
> but the numbers might help explain the scale of the challenge)
> 
> There may not be an explicit rule for context lifetimes, but I've looked at the
> code carefully enough to have a nuanced opinion about the number of tokens,
> and the edgiest case I've found so far is swr_alloc_set_opts2() (see above).
> I'm open to counterexamples, but the model discussed in this section feels
> pretty reliable.

swr_alloc_set_opts() is an unfortunate case, probably it would be
better to fix the API to have a dedicated set_opts in place of
aggregating two operations - one of the issues being that the function
cannot distinguish the case of allocation failure (ENOMEM) from
invalid parameters (EINVAL).

That said, I would not mind keeping the general discussion, but
probably I'd cut the part about "finalize" to avoid the reference to
the C++ language, in general I'd avoid comparisons with C++ all around
the place.

> > > +
> > > +@subsection Context_avoptions Configuration options: AVOptions-enabled structs
> > > +
> > 
> > > +The @ref avoptions "AVOptions API" is a framework to configure user-facing
> > > +options, e.g. on the command-line or in GUI configuration forms.
> > 
> > This looks wrong. AVOptions is not at all about CLI or GUI options, is
> > just some way to set/get fields (that is "options") defined in a
> > struct (a context) using a high level API including: setting multiple
> > options at once (through a textual encoding or a dictionary),
> > input/range validation, setting more fields based on a single option
> > (e.g. the size) etc.
> > 
> > Then you can query the options in a given struct and create
> > corresponding options in a UI, but this is not the point of AVOptions.
> 
> There's a problem here I haven't been communicating clearly enough.
> I think that's because I've understated the problem in the past,
> so I'll try overstating instead:
> 
> "Option" is a meaningless noise word.  A new developer might ask "is this like a
> command-line option, or a CMake option?  Is it like those Python functions with
> a million keyword arguments, or a config file with sensible defaults?".
> Answering "it can be any of those if you need it to be" might help an advanced
> user (not our audience), but is bewildering to a newbie who needs a rough guide
> for how they're likely to use it.  The only solution that's useful to a newbie
> is to provide a frame of reference, preferably in the form of something they
> already know how to use.

This is the definition I gave:

[AVOptions system is] just some way to set/get fields (that is
"options") defined in a struct (a context) using a high level API
including: setting multiple options at once (through a textual
encoding or a dictionary), input/range validation, setting more fields
based on a single option (e.g. the size) etc.

"Setting options" on a struct/context with AVOptions means to set
_fields_ on the struct, following the very specific rules defined by
AVOptions.

So it's not like "it can be any of those if you need it to be", but it
has a very specific meaning.

> Having said all that, yes this particular answer is wrong.  Could
> you apply [2] so I can start thinking about what to replace it with?

I'll have a look at it.

Andrew Sayers June 16, 2024, 6:02 p.m. UTC | #5

Meta note #1: I've replied in this thread but changed the subject line.
That's because it needs to stay focussed on solving this thread's problem,
but may be of more general interest.

Meta note #2: Stefano, I appreciate your feedback, but would rather wait
for [1] to get sorted out, then formulate my thoughts while writing a new
version.  That way I'll be more focussed on ways to improve things for readers.

This thread started with what I thought was a trivia question[1] -
what is a context?  It's short for "AVClass context structure", which is
synonymous with "AVOptions-enabled struct".  It turned out to be more complex
than that, so I wrote a little patch[3] explaining this piece of jargon.
But it turned out to be more complex again, and so on until we got a 430-line
document explaining things in voluminous detail.

Everyone agrees this isn't ideal, so here are some alternatives.
This may also inspire thoughts about FFmpeg development in general.

# Alternative: Just give up

The argument: We tried something, learnt a lot, but couldn't find a solution
we agreed on, so let's come back another day.

Obviously this is the easy way out, but essentially means leaving a critical
bug in the documentation (misleads the reader about a fundamental concept).
Even the most negative take on this document is that it's better than nothing,
so I think we can rule this one out.

# Err on the side of under-communicating

The argument: this document is on the right tracks, but explains too many things
the reader can already be assumed to know.

This argument is more complex than it appears.  To take some silly examples,
I'm not going to learn Mandarin just because FFmpeg users can't be assumed to
speak English.  But I am willing to use American spelling because it's what
more readers are used to.  This e-mail is plenty long enough already, so
I'll stick to some high-level points about this argument.

The main risk of cutting documentation is that if someone can't follow a single
step, they're lost and don't even know how to express their problem.  Imagine
teaching maths to children - you need to teach them what numbers are, then how
to add them together, then multiplication, then finally exponents.  But if you
say "we don't need to teach numbers because kids all watch Numberblocks now",
you'll cover the majority of kids who could have worked it out anyway, and
leave a minority who just give up and say "I guess I must be bad at maths".
I'd argue it's better to write more, then get feedback from actual newbies and
cut based on the evidence - we'll get it wrong either way, but at least this way
the newbies will know what they want us to cut.

Incidentally, there's a much stronger argument for *drafting* a long document,
even if it gets cut down before it's committed.  FFmpeg has lots of undocumented
nuances that experts just know and newbies don't know to ask, and this thread is
full of instances where writing more detail helped tease out a misunderstanding.
[1] is a great example - I had finally written enough detail to uncover my
assumption that all AVOptions could be set at any time, then that thread
taught me to look for a flag that tells you the options for which that's true.

If you assume I'm not the only person who has been subtly misled that way,
you could argue it's better to commit the long version.  That would give readers
more opportunities to confront their own wrong assumptions, instead of reading
something that assumed they knew one thing, but let them keep believing another.
The obvious counterargument is that we should...

# Spread the information across multiple documents

The argument: this document puts too much information in one place.  We should
instead focus on making small patches that put information people need to know
where they need to know it.

This is where things get more interesting to a general audience.

If you have repo commit access, you're probably imagining a workflow like:
write a bunch of little commits, send them out for review, then commit them
when people stop replying.  Your access is evidence that you basically know how
things work, and also lets you make plans confident in the knowledge that
anything you need committed will make it there in the end.

My workflow is nothing like that.  This thread has constantly reinforced that I
don't understand FFmpeg, so it's better for me not to have commit access.  But
that means I can only work on one patch at once, because I will probably learn
something that invalidates any other work I would have done.  It also means
a single patch not getting interest is enough to sink the project altogether.
I can put up with that when it's one big multi-faceted patch, because I can work
on one part while waiting for feedback on another part.  But my small patches
generally involve a few hours of work, a week of waiting, a ping begging for
attention, then often being rejected or ignored.  In the real world, the only
thing this approach will achieve is to burn me out.

It might be possible to revisit this idea *after* committing the document,
when we're fairly confident the answers are right and just need to tweak
the presentation based on feedback.  Or other people could write documentation
based on issues brought up in this thread, and I'll cut as appropriate.
But for now this is a non-starter.

# Write a blog post

The argument: doxygen and texinfo are good for documenting "timeless truths".
But we don't have anywhere to put transitory information like upgrade guides,
or subjective information like guidance about best practices.  This document
shoehorns information in here that really belongs somewhere like that.

This is basically true, but that doesn't solve anything.

I have neither a personal blog nor the desire to write one, and even if I wrote
an excellent guide to contexts as a general concept, nobody would actually find
the blog to read it.  So this idea would only work if we e.g. overhauled the
news area of the site[4] to look more like GIMP's news section[5].

If someone else would like to start a project like that, I can promise a good
series of posts to help get the ball rolling, and will be happy to trim down
the document as those posts go public.

# Write tutorials

The argument: this explains ideas from first principles that are better
explained by example.  This document shoehorns information in here that
really belongs somewhere like that.

This seems reasonable in principle, but as well as the "lots of small commits"
problem discussed above, FFmpeg is currently structured so nobody is actually
going to do that.

I've tried to learn FFmpeg many times over the years, and last year's attempt
involved trying to write a subtitles tutorial.  I didn't submit it to the ML
because I didn't understand the theory well enough, and was fairly sure I had
made some underlying error that made it all wrong.  Everyone who does understand
FFmpeg well enough to write a good subtitles tutorial seems to understand it
well enough to want a complete rewrite, but not care enough to get that done.
So subtitles end up falling between two stools - too ugly for newbies to learn,
not ugly enough for experts to fix.

If you want relatively new developers to have the confidence to write tutorials,
they need the theory to be well-documented first.

# Rewrite the API

The argument: instead of writing a bunch of words apologising for an interface,
just write a better interface.

In general, I'm a big believer in using documentation to drive API decisions.
But in FFmpeg's case, it wouldn't help to have a single developer trying to fix
a community-wide problem.

We've previously discussed swr_alloc_set_opts() (essentially two functions
sharing one symbol) and av_ambient_viewing_environment_create_side_data()
(receives a different context argument than its function name).
A better example of the community-wide issue might be av_new_packet()
(initializes but does not allocate, despite slightly confusing documentation)
vs. av_new_program() (allocates and initializes, but has no documentation).
Both of these could be documented better, and a developer who only needs to
learn one won't be bothered by the other.  But the real problem is that they
use the word "new" to mean fundamentally incompatible things, creating a trap
for anyone reviewing code that uses the "new" they haven't seen before.

Solving this problem wouldn't just involve rewriting a bunch of functions.
It would involve motivating the community to avoid writing that sort of API
in future, which would take all of us many years to achieve.

# Apply this, then iterate

The argument: OK fine, but dragging down other solutions doesn't help this one.
We should at least try to solve these problems.

This is the position I've currently landed up at.  I'm increasingly able to
justify the big picture decisions behind the document, and we're getting to
the point where detailed discussions are just putting opinions where evidence
should go.

I've talked in a previous e-mail about getting feedback from #ffmpeg and the
libav-user mailing list.  We've also talked about the value of making the
document useful to people familiar with other programming languages, so we
could try reaching out to people who write bindings in other languages.

This sort of iteration can be pretty quick, because we don't need to wait for
a major release.  We just need to have something on ffmpeg.org so people know
this is a "real" project.  As such, I think a "release early, release often"
approach is the best way forward.

[1] https://ffmpeg.org/pipermail/ffmpeg-devel/2024-June/329068.html
[2] https://ffmpeg.org/pipermail/ffmpeg-devel/2024-April/325811.html
[3] https://ffmpeg.org/pipermail/ffmpeg-devel/2024-April/325903.html
[4] https://ffmpeg.org/index.html#news
[5] https://www.gimp.org/news/

Paul B Mahol June 16, 2024, 9:20 p.m. UTC | #6

Avoid filling some of bellow points:

https://erikbern.com/2023/12/13/simple-sabotage-for-software.html

Especially part of rewriting public or internal API just for rewrite.

[FFmpeg-devel,v6,1/4] doc: Explain what "context" means

Checks

Commit Message

Comments

Patch