diff mbox series

[FFmpeg-devel] doc/developer: origin of tables should be documented.

Message ID 20200731084851.2546-1-george@nsup.org
State Superseded
Headers show
Series [FFmpeg-devel] doc/developer: origin of tables should be documented. | expand

Checks

Context Check Description
andriy/default pending
andriy/make success Make finished
andriy/make_fate success Make fate finished

Commit Message

Nicolas George July 31, 2020, 8:48 a.m. UTC
Tables that were not just written by the code author are
not actually source code, otherwise,
"recode data..x1 < proprietary.o > source.c"
would be enough to launder a proprietary blob into
the source code.

Documenting the origin of the tables or the methods
for their generation is necessary to let other developers
take over if the original author is no longer available.

Signed-off-by: Nicolas George <george@nsup.org>
---
 doc/developer.texi | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Lynne July 31, 2020, 9:20 a.m. UTC | #1
Jul 31, 2020, 10:48 by george@nsup.org:

> Tables that were not just written by the code author are
> not actually source code, otherwise,
> "recode data..x1 < proprietary.o > source.c"
> would be enough to launder a proprietary blob into
> the source code.
>
> Documenting the origin of the tables or the methods
> for their generation is necessary to let other developers
> take over if the original author is no longer available.
>

I disagree. We don't have any proprietary blobs in our codebase. Everything
that you think is binary is in fact a well defined VLC table that simply describes
bit positions and lengths.
Neither are quantization tables. They're again, well defined series of lookup
tables that always follow a simple pattern, be it linear or some exponent.
Forcing developers to meticulously describe standard bitstream readers or copying
entire chunks of specifications as comments just because you don't understand them,
or want to justify your own personal war, or are paranoid of binary blobs is not acceptable.
In the first place no one would commit actual binary blobs (read: CPU-specific bytecode or
a series of bytes needed to initialize a device) to the codebase. And if developers
were forced to copy chunks of spec as comments into the codebase you'll end up with
licensed content just sitting in the codebase, making us liable for lawsuits.
Nicolas George July 31, 2020, 9:50 a.m. UTC | #2
Lynne (12020-07-31):
> I disagree. We don't have any proprietary blobs in our codebase.
> Everything that you think is binary is in fact a well defined VLC
> table that simply describes bit positions and lengths.

Please read my message more carefully (and possibly with less
hostility): I never said that the tables we have right now are
proprietary blobs.

What I am saying is that accepting them in new code creates a loophole
that somebody with bad intentions could exploit in the future. One of my
reasons for insisting on this is to close that loophole.

> Neither are quantization tables. They're again, well defined series of
> lookup tables that always follow a simple pattern, be it linear or
> some exponent.

Then all I am asking is:

/* Geometric progression with ratio 42. */

Or, if you used a one-liner in whatever language of your choice, just
copy-paste it into the comment:

/* Zsh: for i in {0..11}; printf "%8x\\n" $[440*2**(22+i/12.)] */

That is all I am asking. Is it really too much to ask? One line of
comment!

> Forcing developers to meticulously describe standard bitstream readers
> or copying entire chunks of specifications as comments just because
> you don't understand them,

If they are standard, just name them or put a Wikipedia link.

> or want to justify your own personal war, or are paranoid of binary
> blobs is not acceptable.

Please refrain from personal attacks. The only thing I am at "war"
against is bad code by Libre Software standards.

Libre Software is not just about churning out code that does the work;
that's for proprietary software and its poor cousin Open Source. Libre
Software is also about building collective knowledge. Libre Software is
almost as much teaching as it is coding, so that other developers can
extend it rather than just using it.

> In the first place no one would commit actual binary blobs (read:
> CPU-specific bytecode or a series of bytes needed to initialize a
> device) to the codebase.

I hope so. I am not convinced some company would not try to.

> And if developers were forced to copy chunks of spec as comments into
> the codebase you'll end up with licensed content just sitting in the
> codebase, making us liable for lawsuits.

As I said, if the explanations already exist elsewhere, just a link or a
bibliography reference is enough. No need to copy chunks, I never talked
about is and do not know why you think I did.

Regards,
Kieran Kunhya July 31, 2020, 12:56 p.m. UTC | #3
> /* Geometric progression with ratio 42. */
>
> Or, if you used a one-liner in whatever language of your choice, just
> copy-paste it into the comment:
>
> /* Zsh: for i in {0..11}; printf "%8x\\n" $[440*2**(22+i/12.)] */
>

This is completely unrealistic for reverse engineered tables such as VLCs
which have no mathematical basis.

Kieran
Nicolas George July 31, 2020, 2:54 p.m. UTC | #4
Kieran Kunhya (12020-07-31):
> > /* Geometric progression with ratio 42. */
> >
> > Or, if you used a one-liner in whatever language of your choice, just
> > copy-paste it into the comment:
> >
> > /* Zsh: for i in {0..11}; printf "%8x\\n" $[440*2**(22+i/12.)] */
> 
> This is completely unrealistic for reverse engineered tables such as VLCs
> which have no mathematical basis.

Duh. For reverse-engineered codecs, I expect a comment with at least the
word "reverse-engineered".

Something like this would be acceptable:

/* Reverse-engineered by encoding various files with the reference
   encoder. */

Reluctance to give this little information is baffling to me.

Regards,
Nicolas George Aug. 3, 2020, 9:54 a.m. UTC | #5
Nicolas George (12020-07-31):
> Something like this would be acceptable:
> 
> /* Reverse-engineered by encoding various files with the reference
>    encoder. */
> 
> Reluctance to give this little information is baffling to me.

Can other developers give their input on this?

Is it acceptable or too much to require at least short comments like
these ones before tables of numbers in the source code?

/* Copied from https://oeis.org/A000055 */

/* Computed with =IF(MOD(A1;2)=0;A1/2;3*A1+1) */

/* Adjusted to match decoding of samples/$dir/*.xxx with screenshots
   from the game. */

They are IMHO necessary to let other people learn from the source code,
which is the difference between Libre Software and non-Libre Software.
They may probably be useful for the author themself if they come back
much later.

Regards,
Valentin Schweitzer Aug. 3, 2020, 2:45 p.m. UTC | #6
> Can other developers give their input on this?

I am not currently a FFmpeg developer, so my opinion might
be less relevant than what others have stated.

I think giving a source for things that are not inherently
obvious, but are part of the code is a good idea. Even if
some tables are well explained in some standards or some
values are in their very nature a "magic number", knowing
which standard the table was taken from, or knowing if a
certain value *is* a magic number gives other devs the
information to research and answer their questions
themselves, with little work for the original author.
Some more documentation, be it about source code or
usage may also decrease the rate of questions on
the mailing lists.

Forcing people to source everything leads to useless
sources, but if it's not clear whether or not a piece
of code needs further explanation, I think verbosity
is the better idea.
Zhao Zhili Aug. 3, 2020, 4:56 p.m. UTC | #7
> On Aug 3, 2020, at 5:54 PM, Nicolas George <george@nsup.org> wrote:
> 
> Nicolas George (12020-07-31):
>> Something like this would be acceptable:
>> 
>> /* Reverse-engineered by encoding various files with the reference
>>   encoder. */
>> 
>> Reluctance to give this little information is baffling to me.
> 
> Can other developers give their input on this?
> 
> Is it acceptable or too much to require at least short comments like
> these ones before tables of numbers in the source code?
> 
> /* Copied from https://oeis.org/A000055 */
> 
> /* Computed with =IF(MOD(A1;2)=0;A1/2;3*A1+1) */
> 
> /* Adjusted to match decoding of samples/$dir/*.xxx with screenshots
>   from the game. */
> 
> They are IMHO necessary to let other people learn from the source code,
> which is the difference between Libre Software and non-Libre Software.
> They may probably be useful for the author themself if they come back
> much later.

Agree. And I like it extends to approximation/non-standard algorithms, although
it'd hard to define "non-standard".

> 
> Regards,
> 
> -- 
>  Nicolas George
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Alexander Strasser Aug. 7, 2020, 9:58 p.m. UTC | #8
Hi Nicolas!

On 2020-08-03 11:54 +0200, Nicolas George wrote:
> Nicolas George (12020-07-31):
> > Something like this would be acceptable:
> >
> > /* Reverse-engineered by encoding various files with the reference
> >    encoder. */
> >
> > Reluctance to give this little information is baffling to me.
>
> Can other developers give their input on this?
>
> Is it acceptable or too much to require at least short comments like
> these ones before tables of numbers in the source code?
>
> /* Copied from https://oeis.org/A000055 */
>
> /* Computed with =IF(MOD(A1;2)=0;A1/2;3*A1+1) */
>
> /* Adjusted to match decoding of samples/$dir/*.xxx with screenshots
>    from the game. */
>
> They are IMHO necessary to let other people learn from the source code,
> which is the difference between Libre Software and non-Libre Software.
> They may probably be useful for the author themself if they come back
> much later.

Not sure I agree with your definition of Libre Software and your exact
wording of the patch.

I agree that leaving a hint of where the data comes from, directly where
the data is defined in the source code, is a good thing and probably not
to much to ask for.


  Alexander
Paul B Mahol Aug. 8, 2020, 4:22 p.m. UTC | #9
Unacceptable as it contains multiple typos.

On 7/31/20, Nicolas George <george@nsup.org> wrote:
> Tables that were not just written by the code author are
> not actually source code, otherwise,
> "recode data..x1 < proprietary.o > source.c"
> would be enough to launder a proprietary blob into
> the source code.
>
> Documenting the origin of the tables or the methods
> for their generation is necessary to let other developers
> take over if the original author is no longer available.
>
> Signed-off-by: Nicolas George <george@nsup.org>
> ---
>  doc/developer.texi | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/doc/developer.texi b/doc/developer.texi
> index b33cab0fc7..5c698fe9c5 100644
> --- a/doc/developer.texi
> +++ b/doc/developer.texi
> @@ -216,6 +216,14 @@ please use av_log() instead.
>  @item
>  Casts should be used only when necessary. Unneeded parentheses
>  should also be avoided if they don't make the code easier to understand.
> +
> +@item
> +If the code contains tables of numbers or other data, their origin should
> be
> +documented in a comment, so that other developers can rebuild them if
> +necessary. If they were taken from a reference, include the URL of that
> +reference. If they were computed by a tool, include the code of the tool.
> +If they were reverse-engineerd, include an honest attempt at explaining the
> +methods used.
>  @end itemize
>
>  @section Editor configuration
> --
> 2.27.0
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff mbox series

Patch

diff --git a/doc/developer.texi b/doc/developer.texi
index b33cab0fc7..5c698fe9c5 100644
--- a/doc/developer.texi
+++ b/doc/developer.texi
@@ -216,6 +216,14 @@  please use av_log() instead.
 @item
 Casts should be used only when necessary. Unneeded parentheses
 should also be avoided if they don't make the code easier to understand.
+
+@item
+If the code contains tables of numbers or other data, their origin should be
+documented in a comment, so that other developers can rebuild them if
+necessary. If they were taken from a reference, include the URL of that
+reference. If they were computed by a tool, include the code of the tool.
+If they were reverse-engineerd, include an honest attempt at explaining the
+methods used.
 @end itemize
 
 @section Editor configuration