diff mbox series

[FFmpeg-devel,4/6] doc/developer: add a code behaviour section to development policy

Message ID 20230826180748.15977-4-anton@khirnov.net
State New
Headers show
Series [FFmpeg-devel,1/6] doc/developer: move a sentence to a more appropriate place | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Anton Khirnov Aug. 26, 2023, 6:07 p.m. UTC
Document our longstanding de facto policies on things like correctness,
thread-safety, UB, etc.
---
 doc/developer.texi | 50 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 36 insertions(+), 14 deletions(-)

Comments

Andreas Rheinhardt Aug. 26, 2023, 6:36 p.m. UTC | #1
Anton Khirnov:
> Document our longstanding de facto policies on things like correctness,
> thread-safety, UB, etc.
> ---
>  doc/developer.texi | 50 +++++++++++++++++++++++++++++++++-------------
>  1 file changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/doc/developer.texi b/doc/developer.texi
> index df43119f98..afa0148137 100644
> --- a/doc/developer.texi
> +++ b/doc/developer.texi
> @@ -274,10 +274,6 @@ symbols. If in doubt, just avoid names starting with @code{_} altogether.
>  @section Miscellaneous conventions
>  
>  @itemize @bullet
> -@item
> -fprintf and printf are forbidden in libavformat and libavcodec,
> -please use av_log() instead.
> -
>  @item
>  Casts should be used only when necessary. Unneeded parentheses
>  should also be avoided if they don't make the code easier to understand.
> @@ -286,6 +282,42 @@ should also be avoided if they don't make the code easier to understand.
>  @anchor{Development Policy}
>  @chapter Development Policy
>  
> +@section Code behaviour
> +
> +@subheading Correctness
> +The code must be valid. It must not crash, abort, access invalid pointers, leak
> +memory, cause data races or signed integer overflow, or otherwise invoke
> +undefined behaviour. Error codes should be checked and, when applicable,
> +forwarded to the caller.
> +
> +@subheading Thread- and library-safety
> +Our libraries may be called by multiple independent callers in the same process.
> +These calls may happen from any number of threads and the different call sites
> +may not be aware of each other - e.g. a user program may be calling us directly,
> +and use one or more libraries that also call us. The code must behave correctly
> +under such conditions.

Some of this can no longer be guaranteed when FFmpeg is built without
threading support, but called from multiple threads concurrently. Should
this be mentioned or would it just confuse readers? (I'm leaning to the
latter.)

- Andreas
Anton Khirnov Aug. 26, 2023, 6:54 p.m. UTC | #2
Quoting Andreas Rheinhardt (2023-08-26 20:36:25)
> Anton Khirnov:
> > Document our longstanding de facto policies on things like correctness,
> > thread-safety, UB, etc.
> > ---
> >  doc/developer.texi | 50 +++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 36 insertions(+), 14 deletions(-)
> > 
> > diff --git a/doc/developer.texi b/doc/developer.texi
> > index df43119f98..afa0148137 100644
> > --- a/doc/developer.texi
> > +++ b/doc/developer.texi
> > @@ -274,10 +274,6 @@ symbols. If in doubt, just avoid names starting with @code{_} altogether.
> >  @section Miscellaneous conventions
> >  
> >  @itemize @bullet
> > -@item
> > -fprintf and printf are forbidden in libavformat and libavcodec,
> > -please use av_log() instead.
> > -
> >  @item
> >  Casts should be used only when necessary. Unneeded parentheses
> >  should also be avoided if they don't make the code easier to understand.
> > @@ -286,6 +282,42 @@ should also be avoided if they don't make the code easier to understand.
> >  @anchor{Development Policy}
> >  @chapter Development Policy
> >  
> > +@section Code behaviour
> > +
> > +@subheading Correctness
> > +The code must be valid. It must not crash, abort, access invalid pointers, leak
> > +memory, cause data races or signed integer overflow, or otherwise invoke
> > +undefined behaviour. Error codes should be checked and, when applicable,
> > +forwarded to the caller.
> > +
> > +@subheading Thread- and library-safety
> > +Our libraries may be called by multiple independent callers in the same process.
> > +These calls may happen from any number of threads and the different call sites
> > +may not be aware of each other - e.g. a user program may be calling us directly,
> > +and use one or more libraries that also call us. The code must behave correctly
> > +under such conditions.
> 
> Some of this can no longer be guaranteed when FFmpeg is built without
> threading support, but called from multiple threads concurrently. Should
> this be mentioned or would it just confuse readers? (I'm leaning to the
> latter.)

I tend to agree, it's a very obscure use case anyway.

But we should probably document more explicitly somewhere (and warn in
configure) what building without threads actually implies.
Stefano Sabatini Aug. 27, 2023, 12:38 p.m. UTC | #3
Il sab 26 ago 2023, 20:08 Anton Khirnov <anton@khirnov.net> ha scritto:

> Document our longstanding de facto policies on things like correctness,
> thread-safety, UB, etc.
>

UB?

---
>  doc/developer.texi | 50 +++++++++++++++++++++++++++++++++-------------
>  1 file changed, 36 insertions(+), 14 deletions(-)
>
> diff --git a/doc/developer.texi b/doc/developer.texi
> index df43119f98..afa0148137 100644
> --- a/doc/developer.texi
> +++ b/doc/developer.texi
> @@ -274,10 +274,6 @@ symbols. If in doubt, just avoid names starting with
> @code{_} altogether.
>  @section Miscellaneous conventions
>
>  @itemize @bullet
> -@item
> -fprintf and printf are forbidden in libavformat and libavcodec,
> -please use av_log() instead.
> -
>  @item
>  Casts should be used only when necessary. Unneeded parentheses
>  should also be avoided if they don't make the code easier to understand.
> @@ -286,6 +282,42 @@ should also be avoided if they don't make the code
> easier to understand.
>  @anchor{Development Policy}
>  @chapter Development Policy
>
> +@section Code behaviour
> +
> +@subheading Correctness
> +The code must be valid. It must not crash, abort, access invalid
> pointers, leak
> +memory, cause data races or signed integer overflow, or otherwise invoke
> +undefined behaviour.


Invoke => assume?

Error codes should be checked and, when applicable,
> +forwarded to the caller.
> +
> +@subheading Thread- and library-safety
> +Our libraries may be called by multiple independent callers in the same
> process.
> +These calls may happen from any number of threads and the different call
> sites
> +may not be aware of each other - e.g. a user program may be calling us
> directly,
>

Calling us -> calling the/our libraries

+and use one or more libraries that also call us. The code must behave
> correctly
> +under such conditions.
> +
> +@subheading Robustness
> +The code must treat as untrusted any bytestream received from a caller or
> read
> +from a file, network, etc. It must not misbehave when arbitrary data is
> sent to
> +it - typically it should print an error message and return
> +@code{AVERROR_INVALIDDATA} on encountering invalid input data.
> +
> +@subheading Memory allocation
> +The code must use the @code{av_malloc()} family of functions from
> +@file{libavutil/mem.h} to perform all memory allocation, except in
> special cases
> +(e.g. when interacting with an external library that requires a specific
> +allocator to be used).
> +
> +All allocations should be checked and @code{AVERROR(ENOMEM)} returned on
> +failure. A common mistake is that error paths leak memory - make sure
> that does
> +not happen.
> +
> +@subheading stdio
> +Our libraries must not access the stdio streams stdin/stdout/stderr
> directly
> +(e.g. via @code{printf()} family of functions), as that is not
> library-safe. For
> +logging, use @code{av_log()}.


Lgtm otherwise, thanks.

>
>
Anton Khirnov Aug. 29, 2023, 8:34 a.m. UTC | #4
Quoting Stefano Sabatini (2023-08-27 14:38:44)
> Il sab 26 ago 2023, 20:08 Anton Khirnov <anton@khirnov.net> ha scritto:
> 
> > Document our longstanding de facto policies on things like correctness,
> > thread-safety, UB, etc.
> >
> 
> UB?

undefined behavior

> 
> ---
> >  doc/developer.texi | 50 +++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 36 insertions(+), 14 deletions(-)
> >
> > diff --git a/doc/developer.texi b/doc/developer.texi
> > index df43119f98..afa0148137 100644
> > --- a/doc/developer.texi
> > +++ b/doc/developer.texi
> > @@ -274,10 +274,6 @@ symbols. If in doubt, just avoid names starting with
> > @code{_} altogether.
> >  @section Miscellaneous conventions
> >
> >  @itemize @bullet
> > -@item
> > -fprintf and printf are forbidden in libavformat and libavcodec,
> > -please use av_log() instead.
> > -
> >  @item
> >  Casts should be used only when necessary. Unneeded parentheses
> >  should also be avoided if they don't make the code easier to understand.
> > @@ -286,6 +282,42 @@ should also be avoided if they don't make the code
> > easier to understand.
> >  @anchor{Development Policy}
> >  @chapter Development Policy
> >
> > +@section Code behaviour
> > +
> > +@subheading Correctness
> > +The code must be valid. It must not crash, abort, access invalid
> > pointers, leak
> > +memory, cause data races or signed integer overflow, or otherwise invoke
> > +undefined behaviour.
> 
> 
> Invoke => assume?

No, 'assume' would mean something else. 'cause' maybe.

> 
> Error codes should be checked and, when applicable,
> > +forwarded to the caller.
> > +
> > +@subheading Thread- and library-safety
> > +Our libraries may be called by multiple independent callers in the same
> > process.
> > +These calls may happen from any number of threads and the different call
> > sites
> > +may not be aware of each other - e.g. a user program may be calling us
> > directly,
> >
> 
> Calling us -> calling the/our libraries

I wanted to avoid using the word 'libraries' multiple times, to avoid
possible confusion of which library is being talked about - whether ours
or some external one.
Stefano Sabatini Aug. 31, 2023, 3:28 p.m. UTC | #5
On date Tuesday 2023-08-29 10:34:45 +0200, Anton Khirnov wrote:
> Quoting Stefano Sabatini (2023-08-27 14:38:44)
> > Il sab 26 ago 2023, 20:08 Anton Khirnov <anton@khirnov.net> ha scritto:
> > 
> > > Document our longstanding de facto policies on things like correctness,
> > > thread-safety, UB, etc.
> > >
> > 
> > UB?
> 
> undefined behavior

yes, let's avoid acronyms

> 
> > 
> > ---
> > >  doc/developer.texi | 50 +++++++++++++++++++++++++++++++++-------------
> > >  1 file changed, 36 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/doc/developer.texi b/doc/developer.texi
> > > index df43119f98..afa0148137 100644
> > > --- a/doc/developer.texi
> > > +++ b/doc/developer.texi
> > > @@ -274,10 +274,6 @@ symbols. If in doubt, just avoid names starting with
> > > @code{_} altogether.
> > >  @section Miscellaneous conventions
> > >
> > >  @itemize @bullet
> > > -@item
> > > -fprintf and printf are forbidden in libavformat and libavcodec,
> > > -please use av_log() instead.
> > > -
> > >  @item
> > >  Casts should be used only when necessary. Unneeded parentheses
> > >  should also be avoided if they don't make the code easier to understand.
> > > @@ -286,6 +282,42 @@ should also be avoided if they don't make the code
> > > easier to understand.
> > >  @anchor{Development Policy}
> > >  @chapter Development Policy
> > >
> > > +@section Code behaviour
> > > +
> > > +@subheading Correctness
> > > +The code must be valid. It must not crash, abort, access invalid
> > > pointers, leak
> > > +memory, cause data races or signed integer overflow, or otherwise invoke
> > > +undefined behaviour.
> > 
> > 
> > Invoke => assume?
> 

> No, 'assume' would mean something else. 'cause' maybe.

"cause" sounds good.

> 
> > 
> > Error codes should be checked and, when applicable,
> > > +forwarded to the caller.
> > > +
> > > +@subheading Thread- and library-safety
> > > +Our libraries may be called by multiple independent callers in the same
> > > process.
> > > +These calls may happen from any number of threads and the different call
> > > sites
> > > +may not be aware of each other - e.g. a user program may be calling us
> > > directly,
> > >
> > 
> > Calling us -> calling the/our libraries
> 

> I wanted to avoid using the word 'libraries' multiple times, to avoid
> possible confusion of which library is being talked about - whether ours
> or some external one.

But using "us" is even more confusing (who/what is "us"?), I have no
better ideas than repeating "our/the FFmpeg libraries".
Michael Niedermayer Sept. 1, 2023, 5:01 p.m. UTC | #6
On Thu, Aug 31, 2023 at 05:28:48PM +0200, Stefano Sabatini wrote:
> On date Tuesday 2023-08-29 10:34:45 +0200, Anton Khirnov wrote:
> > Quoting Stefano Sabatini (2023-08-27 14:38:44)
> > > Il sab 26 ago 2023, 20:08 Anton Khirnov <anton@khirnov.net> ha scritto:
> > > 
> > > > Document our longstanding de facto policies on things like correctness,
> > > > thread-safety, UB, etc.
> > > >
> > > 
> > > UB?
> > 
> > undefined behavior
> 
> yes, let's avoid acronyms

+1 (unless they are very widely established or not critical to the understaning
    of the text, like ISO in ISO C)

thx

[...]
Rémi Denis-Courmont Sept. 1, 2023, 5:10 p.m. UTC | #7
Le torstaina 31. elokuuta 2023, 18.28.48 EEST Stefano Sabatini a écrit :
> On date Tuesday 2023-08-29 10:34:45 +0200, Anton Khirnov wrote:
> > Quoting Stefano Sabatini (2023-08-27 14:38:44)
> > 
> > > Il sab 26 ago 2023, 20:08 Anton Khirnov <anton@khirnov.net> ha scritto:
> > > > Document our longstanding de facto policies on things like
> > > > correctness,
> > > > thread-safety, UB, etc.
> > > 
> > > UB?
> > 
> > undefined behavior
> 
> yes, let's avoid acronyms

I think that developers should know common accronyms of the primary 
programming language of the project, and it is totally reasonable to expect 
that they are.

And yes, UB is a very well known accronym in this context. In my experience, 
people who don't know the accronym don't know the concept either, so spelling 
it out wouldn't even help.

TBH, this one is on you.
Paul B Mahol Sept. 1, 2023, 5:26 p.m. UTC | #8
On Fri, Sep 1, 2023 at 7:16 PM Rémi Denis-Courmont <remi@remlab.net> wrote:

> Le torstaina 31. elokuuta 2023, 18.28.48 EEST Stefano Sabatini a écrit :
> > On date Tuesday 2023-08-29 10:34:45 +0200, Anton Khirnov wrote:
> > > Quoting Stefano Sabatini (2023-08-27 14:38:44)
> > >
> > > > Il sab 26 ago 2023, 20:08 Anton Khirnov <anton@khirnov.net> ha
> scritto:
> > > > > Document our longstanding de facto policies on things like
> > > > > correctness,
> > > > > thread-safety, UB, etc.
> > > >
> > > > UB?
> > >
> > > undefined behavior
> >
> > yes, let's avoid acronyms
>
> I think that developers should know common accronyms of the primary
> programming language of the project, and it is totally reasonable to
> expect
> that they are.
>
> And yes, UB is a very well known accronym in this context. In my
> experience,
> people who don't know the accronym don't know the concept either, so
> spelling
> it out wouldn't even help.
>
> TBH, this one is on you.
>

TBH, novice and newbies lack bunch of general and specific knowledge anyway.


>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
>
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Stefano Sabatini Sept. 1, 2023, 6:33 p.m. UTC | #9
On date Friday 2023-09-01 20:10:11 +0300, Rémi Denis-Courmont wrote:
> Le torstaina 31. elokuuta 2023, 18.28.48 EEST Stefano Sabatini a écrit :
> > On date Tuesday 2023-08-29 10:34:45 +0200, Anton Khirnov wrote:
> > > Quoting Stefano Sabatini (2023-08-27 14:38:44)
> > > 
> > > > Il sab 26 ago 2023, 20:08 Anton Khirnov <anton@khirnov.net> ha scritto:
> > > > > Document our longstanding de facto policies on things like
> > > > > correctness,
> > > > > thread-safety, UB, etc.
> > > > 
> > > > UB?
> > > 
> > > undefined behavior
> > 
> > yes, let's avoid acronyms
> 
> I think that developers should know common accronyms of the primary 
> programming language of the project, and it is totally reasonable to expect 
> that they are.
> 
> And yes, UB is a very well known accronym in this context. In my experience, 
> people who don't know the accronym don't know the concept either, so spelling 
> it out wouldn't even help.
> 
> TBH, this one is on you.

In general, it's good practice to avoid unnecessary use of acronyms as
they add ambiguity with very small gain (some fraction of a second
when typing and a few bytes in storage size). Also note that the
acronym is only used in the commit log and spelled out in the texi
file. That said, I'm not blocking on this but I added the note since
it was not very clear from the context when reading it.
Michael Niedermayer Sept. 2, 2023, 8:03 p.m. UTC | #10
On Fri, Sep 01, 2023 at 08:10:11PM +0300, Rémi Denis-Courmont wrote:
> Le torstaina 31. elokuuta 2023, 18.28.48 EEST Stefano Sabatini a écrit :
> > On date Tuesday 2023-08-29 10:34:45 +0200, Anton Khirnov wrote:
> > > Quoting Stefano Sabatini (2023-08-27 14:38:44)
> > > 
> > > > Il sab 26 ago 2023, 20:08 Anton Khirnov <anton@khirnov.net> ha scritto:
> > > > > Document our longstanding de facto policies on things like
> > > > > correctness,
> > > > > thread-safety, UB, etc.
> > > > 
> > > > UB?
> > > 
> > > undefined behavior
> > 
> > yes, let's avoid acronyms
> 
> I think that developers should know common accronyms of the primary 
> programming language of the project, and it is totally reasonable to expect 
> that they are.
> 

> And yes, UB is a very well known accronym in this context. In my experience, 
> people who don't know the accronym don't know the concept either, so spelling 
> it out wouldn't even help.

There is a difference between not knowing a concept and not knowing what
concept is referenced.

"Undefined behavior" can be googled, and can to some extend be understood
from the meaning of the words by people with a math background.

googling "UB" results in "Universitätsbibliothek Wien" here and nothing on
the first page of links is related to "undefined"

So i do think, in case of a policy or some coding rules it should be spelled out

thx

[...]
Rémi Denis-Courmont Sept. 3, 2023, 8:29 a.m. UTC | #11
Le lauantaina 2. syyskuuta 2023, 23.03.32 EEST Michael Niedermayer a écrit :
> > And yes, UB is a very well known accronym in this context. In my
> > experience, people who don't know the accronym don't know the concept
> > either, so spelling it out wouldn't even help.
> 
> "Undefined behavior" can be googled, and can to some extend be understood
> from the meaning of the words by people with a math background.

And? So can "C language ub" or "programming ub". I would think that adult 
software engineers know that you need to scope when searching a short 
accronym.

Either way, first result on Google is the Wikipedia page starting with "In 
computer programming, undefined behavior (UB)" (exact quote)...
Michael Niedermayer Sept. 3, 2023, 4:21 p.m. UTC | #12
On Sun, Sep 03, 2023 at 11:29:32AM +0300, Rémi Denis-Courmont wrote:
> Le lauantaina 2. syyskuuta 2023, 23.03.32 EEST Michael Niedermayer a écrit :
> > > And yes, UB is a very well known accronym in this context. In my
> > > experience, people who don't know the accronym don't know the concept
> > > either, so spelling it out wouldn't even help.
> > 
> > "Undefined behavior" can be googled, and can to some extend be understood
> > from the meaning of the words by people with a math background.
> 
> And? So can "C language ub" or "programming ub". I would think that adult 
> software engineers know that you need to scope when searching a short 
> accronym.

Its a policy not a puzzle
also are all software engneers here adults ?

thx

[...]
diff mbox series

Patch

diff --git a/doc/developer.texi b/doc/developer.texi
index df43119f98..afa0148137 100644
--- a/doc/developer.texi
+++ b/doc/developer.texi
@@ -274,10 +274,6 @@  symbols. If in doubt, just avoid names starting with @code{_} altogether.
 @section Miscellaneous conventions
 
 @itemize @bullet
-@item
-fprintf and printf are forbidden in libavformat and libavcodec,
-please use av_log() instead.
-
 @item
 Casts should be used only when necessary. Unneeded parentheses
 should also be avoided if they don't make the code easier to understand.
@@ -286,6 +282,42 @@  should also be avoided if they don't make the code easier to understand.
 @anchor{Development Policy}
 @chapter Development Policy
 
+@section Code behaviour
+
+@subheading Correctness
+The code must be valid. It must not crash, abort, access invalid pointers, leak
+memory, cause data races or signed integer overflow, or otherwise invoke
+undefined behaviour. Error codes should be checked and, when applicable,
+forwarded to the caller.
+
+@subheading Thread- and library-safety
+Our libraries may be called by multiple independent callers in the same process.
+These calls may happen from any number of threads and the different call sites
+may not be aware of each other - e.g. a user program may be calling us directly,
+and use one or more libraries that also call us. The code must behave correctly
+under such conditions.
+
+@subheading Robustness
+The code must treat as untrusted any bytestream received from a caller or read
+from a file, network, etc. It must not misbehave when arbitrary data is sent to
+it - typically it should print an error message and return
+@code{AVERROR_INVALIDDATA} on encountering invalid input data.
+
+@subheading Memory allocation
+The code must use the @code{av_malloc()} family of functions from
+@file{libavutil/mem.h} to perform all memory allocation, except in special cases
+(e.g. when interacting with an external library that requires a specific
+allocator to be used).
+
+All allocations should be checked and @code{AVERROR(ENOMEM)} returned on
+failure. A common mistake is that error paths leak memory - make sure that does
+not happen.
+
+@subheading stdio
+Our libraries must not access the stdio streams stdin/stdout/stderr directly
+(e.g. via @code{printf()} family of functions), as that is not library-safe. For
+logging, use @code{av_log()}.
+
 @section Patches/Committing
 @subheading Licenses for patches must be compatible with FFmpeg.
 Contributions should be licensed under the
@@ -395,11 +427,6 @@  If it is a bug, the bug has to be fixed. If it is not, the code should
 be changed to not generate a warning unless that causes a slowdown
 or obfuscates the code.
 
-@subheading Check untrusted input properly.
-Never write to unallocated memory, never write over the end of arrays,
-always check values read from some untrusted source before using them
-as array index or other risky things.
-
 @section Library public interfaces
 Every library in FFmpeg provides a set of public APIs in its installed headers,
 which are those listed in the variable @code{HEADERS} in that library's
@@ -811,11 +838,6 @@  an explanation why to your patchset, its ok to not test if theres a reason.
 @item
 If you added YASM code please check that things still work with --disable-yasm.
 
-@item
-Make sure you check the return values of function and return appropriate
-error codes. Especially memory allocation functions like @code{av_malloc()}
-are notoriously left unchecked, which is a serious problem.
-
 @item
 Test your code with valgrind and or Address Sanitizer to ensure it's free
 of leaks, out of array accesses, etc.