diff mbox

[FFmpeg-devel] doc: Add initial documentation explaining undefined behavior and SUINT

Message ID 20170715175727.11060-1-michael@niedermayer.cc
State New

Commit Message

Michael Niedermayer July 15, 2017, 5:57 p.m. UTC
Requested-by: Kieran Kunhya <kierank@obe.tv>

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 doc/undefined.txt | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100644 doc/undefined.txt

Comments

Ricardo Constantino July 15, 2017, 6:16 p.m. UTC | #1
On 15 July 2017 at 18:57, Michael Niedermayer <michael@niedermayer.cc> wrote:
> Requested-by: Kieran Kunhya <kierank@obe.tv>
>
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  doc/undefined.txt | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>  create mode 100644 doc/undefined.txt
>
> diff --git a/doc/undefined.txt b/doc/undefined.txt
> new file mode 100644
> index 0000000000..957db3e2a9
> --- /dev/null
> +++ b/doc/undefined.txt
> @@ -0,0 +1,47 @@
> +Undefined Behavior
> +------------------
> +In the C language, some operations are undefined, like signed integer overflow
Missing a comma at the end.

> +dereferencing freed pointers, accessing outside allocated space, ...
> +
> +Undefined Behavior must not occur in a C program, it is not safe even if the
> +output of undefined operations is unused. The unsaftey may seem nit picking
unsafety

> +but Optimizing compilers have in fact optimized code on the assumtation that
assumption

> +no undefined Behavior occurs.
> +Optimizing code based on wrong assumtations can and has in some cases lead to
assumptions

> +effects beyond the output of computations.
> +
> +
> +The signed integer overflow problem in speed critical code
> +----------------------------------------------------------
> +Code which is highly optimized and works with signed integers sometimes has the
> +problem that some (invalid) inputs can trigger overflows (undefined behavior).
> +In these cases, often the output of the computation does not matter (as it is
> +from invalid input).
> +In some cases the input can be checked easily in others checking the input is
> +computationally too intensive.
> +In these remaining cases a unsigned type can be used instead of a signed type.
> +unsigned overflows are defined in C.
> +
> +SUINT
> +-----
> +As we have above established there is a need to use "unsigned" sometimes in
> +computations which work with signed integers (which overflow).
> +Using "unsigned" for signed integers has the very significant potential to
> +cause confusion
> +as in
> +unsigned a,b,c;
> +...
> +a+b*c;
> +The reader does not expect b to be semantically -5 here and if the code is
> +changed by maybe adding a cast, a division or other the signeness will almost
signedness

> +certainly be mistaken.
> +To avoid this confusion a new type was introduced, "SUINT" is the C "unsigned"
> +type but it holds a signed "int".
> +to use the same example
> +SUINT a,b,c;
> +...
> +a+b*c;
> +here the reader knows that a,b,c are meant to be signed integers but for C
> +standard compliance / to avoid undefined behavior they are stored in unsigned
> +ints.
> +
> --
> 2.13.0
James Almer July 15, 2017, 11:55 p.m. UTC | #2
On 7/15/2017 2:57 PM, Michael Niedermayer wrote:
> Requested-by: Kieran Kunhya <kierank@obe.tv>
> 
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  doc/undefined.txt | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>  create mode 100644 doc/undefined.txt
> 
> diff --git a/doc/undefined.txt b/doc/undefined.txt
> new file mode 100644
> index 0000000000..957db3e2a9
> --- /dev/null
> +++ b/doc/undefined.txt
> @@ -0,0 +1,47 @@
> +Undefined Behavior
> +------------------
> +In the C language, some operations are undefined, like signed integer overflow
> +dereferencing freed pointers, accessing outside allocated space, ...
> +
> +Undefined Behavior must not occur in a C program, it is not safe even if the
> +output of undefined operations is unused. The unsaftey may seem nit picking
> +but Optimizing compilers have in fact optimized code on the assumtation that

Assumption.

> +no undefined Behavior occurs.
> +Optimizing code based on wrong assumtations can and has in some cases lead to

Assumptions.

> +effects beyond the output of computations.
> +
> +
> +The signed integer overflow problem in speed critical code
> +----------------------------------------------------------
> +Code which is highly optimized and works with signed integers sometimes has the
> +problem that some (invalid) inputs can trigger overflows (undefined behavior).
> +In these cases, often the output of the computation does not matter (as it is
> +from invalid input).
> +In some cases the input can be checked easily in others checking the input is
> +computationally too intensive.
> +In these remaining cases a unsigned type can be used instead of a signed type.
> +unsigned overflows are defined in C.
> +
> +SUINT
> +-----
> +As we have above established there is a need to use "unsigned" sometimes in
> +computations which work with signed integers (which overflow).
> +Using "unsigned" for signed integers has the very significant potential to
> +cause confusion
> +as in
> +unsigned a,b,c;
> +...
> +a+b*c;
> +The reader does not expect b to be semantically -5 here and if the code is
> +changed by maybe adding a cast, a division or other the signeness will almost
> +certainly be mistaken.
> +To avoid this confusion a new type was introduced, "SUINT" is the C "unsigned"
> +type but it holds a signed "int".
> +to use the same example
> +SUINT a,b,c;
> +...
> +a+b*c;
> +here the reader knows that a,b,c are meant to be signed integers but for C
> +standard compliance / to avoid undefined behavior they are stored in unsigned
> +ints.
> +
>
Michael Niedermayer July 21, 2017, 2:08 p.m. UTC | #3
On Sat, Jul 15, 2017 at 07:57:27PM +0200, Michael Niedermayer wrote:
> Requested-by: Kieran Kunhya <kierank@obe.tv>
> 
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  doc/undefined.txt | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>  create mode 100644 doc/undefined.txt

typos found by everyone fixed, i hope i have not missed any

applied

Thanks

[...]

Patch

diff --git a/doc/undefined.txt b/doc/undefined.txt
new file mode 100644
index 0000000000..957db3e2a9
--- /dev/null
+++ b/doc/undefined.txt
@@ -0,0 +1,47 @@ 
+Undefined Behavior
+------------------
+In the C language, some operations are undefined, like signed integer overflow,
+dereferencing freed pointers, accessing outside allocated space, ...
+
+Undefined behavior must not occur in a C program; it is not safe even if the
+output of undefined operations is unused. The unsafety may seem nitpicking,
+but optimizing compilers have in fact optimized code on the assumption that
+no undefined behavior occurs.
+Optimizing code based on wrong assumptions can and has in some cases led to
+effects beyond the output of computations.
+
+
+The signed integer overflow problem in speed critical code
+----------------------------------------------------------
+Code which is highly optimized and works with signed integers sometimes has the
+problem that some (invalid) inputs can trigger overflows (undefined behavior).
+In these cases, often the output of the computation does not matter (as it is
+from invalid input).
+In some cases the input can be checked easily; in others checking the input is
+computationally too intensive.
+In these remaining cases an unsigned type can be used instead of a signed type.
+Unsigned overflows are defined in C.
+
+SUINT
+-----
+As we have established above, there is a need to use "unsigned" sometimes in
+computations which work with signed integers (which overflow).
+Using "unsigned" for signed integers has the very significant potential to
+cause confusion,
+as in
+unsigned a,b,c;
+...
+a+b*c;
+The reader does not expect b to be semantically -5 here, and if the code is
+changed by maybe adding a cast, a division or other, the signedness will almost
+certainly be mistaken.
+To avoid this confusion a new type was introduced: "SUINT" is the C "unsigned"
+type but it holds a signed "int".
+To use the same example:
+SUINT a,b,c;
+...
+a+b*c;
+Here the reader knows that a,b,c are meant to be signed integers, but for C
+standard compliance / to avoid undefined behavior they are stored in unsigned
+ints.
+
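
The a+b*c example from the SUINT section can be sketched as compilable C. This is only an illustration, not part of the patch: the typedef below is a hypothetical stand-in for SUINT, and mul_add_wrapping, half_signed and check_examples are made-up names. The sketch assumes a compiler that wraps when converting an out-of-range unsigned value back to int (implementation-defined in C; GCC documents this behavior).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for SUINT: "unsigned" storage for values that
 * are semantically signed. */
typedef unsigned SUINT;

/* Computes a + b*c without signed overflow. The arithmetic runs in
 * unsigned, where wraparound is defined (modulo UINT_MAX + 1). */
static int mul_add_wrapping(int a, int b, int c)
{
    SUINT ua = a, ub = b, uc = c;
    SUINT r  = ua + ub * uc;
    /* Converting an out-of-range unsigned value to int is
     * implementation-defined rather than undefined; this sketch
     * assumes the compiler wraps. */
    return (int)r;
}

/* Division is where signedness matters: convert back to int first. */
static int half_signed(SUINT v)
{
    return (int)v / 2;
}

static void check_examples(void)
{
    /* b is "semantically -5" even though it is stored as unsigned */
    assert(mul_add_wrapping(1, -5, 3) == -14);
    /* wraps around instead of triggering undefined behavior */
    assert(mul_add_wrapping(INT_MAX, 1, 1) == INT_MIN);
    /* dividing the unsigned value directly gives a huge number */
    assert((SUINT)-6 / 2 == UINT_MAX / 2 - 2);
    /* dividing after converting back gives the intended result */
    assert(half_signed((SUINT)-6) == -3);
}
```

The last two checks show the kind of mistake the document warns about: a division added later to code using plain "unsigned" silently changes meaning, while the SUINT name reminds the reader to convert back to a signed type first.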