Message ID | 20200731084851.2546-1-george@nsup.org |
---|---|
State | Superseded |
Headers | show |
Series | [FFmpeg-devel] doc/developer: origin of tables should be documented. | expand |
Context | Check | Description |
---|---|---|
andriy/default | pending | |
andriy/make | success | Make finished |
andriy/make_fate | success | Make fate finished |
Jul 31, 2020, 10:48 by george@nsup.org: > Tables that were not just written by the code author are > not actually source code, otherwise, > "recode data..x1 < proprietary.o > source.c" > would be enough to launder a proprietary blob into > the source code. > > Documenting the origin of the tables or the methods > for their generation is necessary to let other developers > take over if the original author is no longer available. > I disagree. We don't have any proprietary blobs in our codebase. Everything that you think is binary is in fact a well defined VLC table that simply describes bit positions and lengths. Neither are quantization tables. They're again, well defined series of lookup tables that always follow a simple pattern, be it linear or some exponent. Forcing developers to meticulously describe standard bitstream readers or copying entire chunks of specifications as comments just because you don't understand them, or want to justify your own personal war, or are paranoid of binary blobs is not acceptable. In the first place no one would commit actual binary blobs (read: CPU-specific bytecode or a series of bytes needed to initialize a device) to the codebase. And if developers were forced to copy chunks of spec as comments into the codebase you'll end up with licensed content just sitting in the codebase, making us liable for lawsuits.
Lynne (12020-07-31): > I disagree. We don't have any proprietary blobs in our codebase. > Everything that you think is binary is in fact a well defined VLC > table that simply describes bit positions and lengths. Please read my message more carefully (and possibly with less hostility): I never said that the tables we have right now are proprietary blobs. What I am saying is that accepting them in new code creates a loophole that somebody with bad intentions could exploit in the future. One of my reasons for insisting on this is to close that loophole. > Neither are quantization tables. They're again, well defined series of > lookup tables that always follow a simple pattern, be it linear or > some exponent. Then all I am asking is: /* Geometric progression with ratio 42. */ Or, if you used a one-liner in whatever language of your choice, just copy-paste it into the comment: /* Zsh: for i in {0..11}; printf "%8x\\n" $[440*2**(22+i/12.)] */ That is all I am asking. Is it really too much to ask? One line of comment! > Forcing developers to meticulously describe standard bitstream readers > or copying entire chunks of specifications as comments just because > you don't understand them, If they are standard, just name them or put a Wikipedia link. > or want to justify your own personal war, or are paranoid of binary > blobs is not acceptable. Please refrain from personal attacks. The only thing I am at "war" against is bad code by Libre Software standards. Libre Software is not just about churning out code that does the work; that's for proprietary software and its poor cousin Open Source. Libre Software is also about building collective knowledge. Libre Software is almost as much teaching as it is coding, so that other developers can extend it rather than just using it. > In the first place no one would commit actual binary blobs (read: > CPU-specific bytecode or a series of bytes needed to initialize a > device) to the codebase. I hope so. I am not convinced some company would not try to. > And if developers were forced to copy chunks of spec as comments into > the codebase you'll end up with licensed content just sitting in the > codebase, making us liable for lawsuits. As I said, if the explanations already exist elsewhere, just a link or a bibliography reference is enough. No need to copy chunks, I never talked about is and do not know why you think I did. Regards,
> /* Geometric progression with ratio 42. */ > > Or, if you used a one-liner in whatever language of your choice, just > copy-paste it into the comment: > > /* Zsh: for i in {0..11}; printf "%8x\\n" $[440*2**(22+i/12.)] */ > This is completely unrealistic for reverse engineered tables such as VLCs which have no mathematical basis. Kieran
Kieran Kunhya (12020-07-31): > > /* Geometric progression with ratio 42. */ > > > > Or, if you used a one-liner in whatever language of your choice, just > > copy-paste it into the comment: > > > > /* Zsh: for i in {0..11}; printf "%8x\\n" $[440*2**(22+i/12.)] */ > > This is completely unrealistic for reverse engineered tables such as VLCs > which have no mathematical basis. Duh. For reverse-engineered codecs, I expect a comment with at least the word "reverse-engineered". Something like this would be acceptable: /* Reverse-engineered by encoding various files with the reference encoder. */ Reluctance to give this little information is baffling to me. Regards,
Nicolas George (12020-07-31): > Something like this would be acceptable: > > /* Reverse-engineered by encoding various files with the reference > encoder. */ > > Reluctance to give this little information is baffling to me. Can other developers give their input on this? Is it acceptable or too much to require at least short comments like these ones before tables of numbers in the source code? /* Copied from https://oeis.org/A000055 */ /* Computed with =IF(MOD(A1;2)=0;A1/2;3*A1+1) */ /* Adjusted to match decoding of samples/$dir/*.xxx with screenshots from the game. */ They are IMHO necessary to let other people learn from the source code, which is the difference between Libre Software and non-Libre Software. They may probably be useful for the author themself if they come back much later. Regards,
> Can other developers give their input on this?
I am not currently a FFmpeg developer, so my opinion might
be less relevant than what others have stated.
I think giving a source for things that are not inherently
obvious, but are part of the code is a good idea. Even if
some tables are well explained in some standards or some
values are in their very nature a "magic number", knowing
which standard the table was taken from, or knowing if a
certain value *is* a magic number gives other devs the
information to research and answer their questions
themselves, with little work for the original author.
Some more documentation, be it about source code or
usage may also decrease the rate of questions on
the mailing lists.
Forcing people to source everything leads to useless
sources, but if it's not clear whether or not a piece
of code needs further explanation, I think verbosity
is the better idea.
> On Aug 3, 2020, at 5:54 PM, Nicolas George <george@nsup.org> wrote: > > Nicolas George (12020-07-31): >> Something like this would be acceptable: >> >> /* Reverse-engineered by encoding various files with the reference >> encoder. */ >> >> Reluctance to give this little information is baffling to me. > > Can other developers give their input on this? > > Is it acceptable or too much to require at least short comments like > these ones before tables of numbers in the source code? > > /* Copied from https://oeis.org/A000055 */ > > /* Computed with =IF(MOD(A1;2)=0;A1/2;3*A1+1) */ > > /* Adjusted to match decoding of samples/$dir/*.xxx with screenshots > from the game. */ > > They are IMHO necessary to let other people learn from the source code, > which is the difference between Libre Software and non-Libre Software. > They may probably be useful for the author themself if they come back > much later. Agree. And I like it extends to approximation/non-standard algorithms, although it'd hard to define "non-standard". > > Regards, > > -- > Nicolas George > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Hi Nicolas! On 2020-08-03 11:54 +0200, Nicolas George wrote: > Nicolas George (12020-07-31): > > Something like this would be acceptable: > > > > /* Reverse-engineered by encoding various files with the reference > > encoder. */ > > > > Reluctance to give this little information is baffling to me. > > Can other developers give their input on this? > > Is it acceptable or too much to require at least short comments like > these ones before tables of numbers in the source code? > > /* Copied from https://oeis.org/A000055 */ > > /* Computed with =IF(MOD(A1;2)=0;A1/2;3*A1+1) */ > > /* Adjusted to match decoding of samples/$dir/*.xxx with screenshots > from the game. */ > > They are IMHO necessary to let other people learn from the source code, > which is the difference between Libre Software and non-Libre Software. > They may probably be useful for the author themself if they come back > much later. Not sure I agree with your definition of Libre Software and your exact wording of the patch. I agree that leaving a hint of where the data comes from, directly where the data is defined in the source code, is a good thing and probably not to much to ask for. Alexander
Unacceptable as it contains multiple typos. On 7/31/20, Nicolas George <george@nsup.org> wrote: > Tables that were not just written by the code author are > not actually source code, otherwise, > "recode data..x1 < proprietary.o > source.c" > would be enough to launder a proprietary blob into > the source code. > > Documenting the origin of the tables or the methods > for their generation is necessary to let other developers > take over if the original author is no longer available. > > Signed-off-by: Nicolas George <george@nsup.org> > --- > doc/developer.texi | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/doc/developer.texi b/doc/developer.texi > index b33cab0fc7..5c698fe9c5 100644 > --- a/doc/developer.texi > +++ b/doc/developer.texi > @@ -216,6 +216,14 @@ please use av_log() instead. > @item > Casts should be used only when necessary. Unneeded parentheses > should also be avoided if they don't make the code easier to understand. > + > +@item > +If the code contains tables of numbers or other data, their origin should > be > +documented in a comment, so that other developers can rebuild them if > +necessary. If they were taken from a reference, include the URL of that > +reference. If they were computed by a tool, include the code of the tool. > +If they were reverse-engineerd, include an honest attempt at explaining the > +methods used. > @end itemize > > @section Editor configuration > -- > 2.27.0 > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff --git a/doc/developer.texi b/doc/developer.texi index b33cab0fc7..5c698fe9c5 100644 --- a/doc/developer.texi +++ b/doc/developer.texi @@ -216,6 +216,14 @@ please use av_log() instead. @item Casts should be used only when necessary. Unneeded parentheses should also be avoided if they don't make the code easier to understand. + +@item +If the code contains tables of numbers or other data, their origin should be +documented in a comment, so that other developers can rebuild them if +necessary. If they were taken from a reference, include the URL of that +reference. If they were computed by a tool, include the code of the tool. +If they were reverse-engineerd, include an honest attempt at explaining the +methods used. @end itemize @section Editor configuration
Tables that were not just written by the code author are not actually source code, otherwise, "recode data..x1 < proprietary.o > source.c" would be enough to launder a proprietary blob into the source code. Documenting the origin of the tables or the methods for their generation is necessary to let other developers take over if the original author is no longer available. Signed-off-by: Nicolas George <george@nsup.org> --- doc/developer.texi | 8 ++++++++ 1 file changed, 8 insertions(+)