Message ID | 20190428011818.GB1955@kappa.local |
---|---|
State | New |
Headers | show |
> What do you think about using awk instead of shell?
No objection here, especially if it's more robust in some ways than this
or other shell code (though personally I'm not fluent in awk).
My only concern was preventing a considerable performance impact which could
otherwise be avoided relatively easily, and with which I thought I could help.
> -----Original Message----- > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of > Alexander Strasser > Sent: Sunday, April 28, 2019 9:18 AM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH V5 1/2] configure: sort > decoder/encoder/filter/... names in alphabet order > > > What do you think about using awk instead of shell? > > I have 2 POC patches attached. It's probably not 100% correct yet, > but it kind of demonstrates what it would look like. > > The main advantage of using awk, is that it's more elegant and > shorter. It's probably also less risky, because it's more isolated, > e.g. as it was explained by avih, there is no widely supported way > for locals across shells. We already use awk in configure for > mandatory functions, so it's no new dependency. > > Please comment on the awk approach. I did some change to make it correct on ubuntu 16.04, but looks issue on windows mingw. print_in_columns() { sort | tr '\n' ' ' | awk -v width="$ncols" ' { #add int() num_cols = int(width / 24); num_rows = int((NF + num_cols-1) / num_cols); y = x = 1; for (i = 1; i <= NF; i++) { if (y > num_rows) { y = 1; x++; } d[x,y] = $i; y++; } print width print num_cols print num_rows print NF for (y = 1; y <= num_rows; y++) { for (x = 1; x <= num_cols; x++) { # it does not work for the case that the last column is not fully filled #line = sprintf(x != num_cols ? "%s%-24s" : "%s%s", line, d[x,y]); line = sprintf("%s%-24s", line, d[x,y]); } print line; line = ""; } }' | sed 's/ *$//' } on ubuntu 16.04, the output is: Enabled bsfs: 135 5 7 33 aac_adtstoasc extract_extradata hevc_mp4toannexb mpeg4_unpack_bframes truehd_core av1_frame_split filter_units imx_dump_header noise vp9_metadata av1_metadata h264_metadata mjpeg2jpeg null vp9_raw_reorder chomp h264_mp4toannexb mjpega_dump_header prores_metadata vp9_superframe dca_core h264_redundant_pps mov2textsub remove_extradata vp9_superframe_split dump_extradata hapqa_extract mp3_header_decompress text2movsub eac3_core hevc_metadata mpeg2_metadata trace_headers while on windows, the output is: Enabled bsfs: 72 3 11 33 noiseh264_redundant_pps nullextract prores_metadata remove_extradatamp4toannexb text2movsubdump_header trace_headers truehd_corepega_dump_header vp9_metadata vp9_raw_reorderader_decompress vp9_superframea vp9_superframe_splitames I did a quick check, but has not found the root cause yet. > > I'm not against the shell way, or a mixed approach, but before going > either way and pushing I would rather have some more testing; > especially on more exotic platforms. > > > Thank you > Alexander
On 2019-04-28 03:11 +0000, avih wrote: > > What do you think about using awk instead of shell? > > No objection here, especially if it's more robust in some ways than this > or other shell code (though personally I'm not fluent in awk). > > My only concern was preventing a considerable performance impact which could > otherwise be avoided relatively easily, and with which I thought I could help. Yes, thank you. Very much appreciated! Alexander
On 2019-04-28 07:42 +0000, Guo, Yejun wrote: > > -----Original Message----- > > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of > > Alexander Strasser > > Sent: Sunday, April 28, 2019 9:18 AM > > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > > Subject: Re: [FFmpeg-devel] [PATCH V5 1/2] configure: sort > > decoder/encoder/filter/... names in alphabet order > > > > > > What do you think about using awk instead of shell? > > > > I have 2 POC patches attached. It's probably not 100% correct yet, > > but it kind of demonstrates what it would look like. > > > > The main advantage of using awk, is that it's more elegant and > > shorter. It's probably also less risky, because it's more isolated, > > e.g. as it was explained by avih, there is no widely supported way > > for locals across shells. We already use awk in configure for > > mandatory functions, so it's no new dependency. > > > > Please comment on the awk approach. > > I did some change to make it correct on ubuntu 16.04, but looks issue on windows mingw. > > print_in_columns() { > sort | tr '\n' ' ' | awk -v width="$ncols" ' > { > #add int() > num_cols = int(width / 24); num_rows = int((NF + num_cols-1) / num_cols); The added int() calls looks right. > y = x = 1; > for (i = 1; i <= NF; i++) { > if (y > num_rows) { > y = 1; x++; > } > d[x,y] = $i; > y++; > } > > print width > print num_cols > print num_rows > print NF > > for (y = 1; y <= num_rows; y++) { > for (x = 1; x <= num_cols; x++) { > # it does not work for the case that the last column is not fully filled > #line = sprintf(x != num_cols ? "%s%-24s" : "%s%s", line, d[x,y]); > line = sprintf("%s%-24s", line, d[x,y]); You mean it doesn't work in the way that there will be trailing spaces in the column before the last one which is partially empty, right? But the output looked right and there were never trailing spaces in the last column, right? BTW I changed the algorithm here, because I thought it might be easier to unterstand. I will also try the algorithm, you implemented in shell. > } > print line; > line = ""; > } > }' | sed 's/ *$//' > } > > on ubuntu 16.04, the output is: > Enabled bsfs: > 135 > 5 > 7 > 33 > aac_adtstoasc extract_extradata hevc_mp4toannexb mpeg4_unpack_bframes truehd_core > av1_frame_split filter_units imx_dump_header noise vp9_metadata > av1_metadata h264_metadata mjpeg2jpeg null vp9_raw_reorder > chomp h264_mp4toannexb mjpega_dump_header prores_metadata vp9_superframe > dca_core h264_redundant_pps mov2textsub remove_extradata vp9_superframe_split > dump_extradata hapqa_extract mp3_header_decompress text2movsub > eac3_core hevc_metadata mpeg2_metadata trace_headers > > while on windows, the output is: > Enabled bsfs: > 72 > 3 > 11 > 33 > noiseh264_redundant_pps > nullextract > prores_metadata > remove_extradatamp4toannexb > text2movsubdump_header > trace_headers > truehd_corepega_dump_header > vp9_metadata > vp9_raw_reorderader_decompress > vp9_superframea > vp9_superframe_splitames Wild guess: CR LF line endings are emitted somewhere and the CRs stay in the input. Your terminal resets the cursor to the start of the line when interpreting the midline CRs. Does it work if you extend the tr in print_in_columns like below? sort | tr '\r\n' ' ' | awk -v width="$ncols" ' > I did a quick check, but has not found the root cause yet. > > > > > > I'm not against the shell way, or a mixed approach, but before going > > either way and pushing I would rather have some more testing; > > especially on more exotic platforms. Thanks for testing and looking into it! Alexander
> -----Original Message----- > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of > Alexander Strasser > Sent: Sunday, April 28, 2019 11:29 PM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH V5 1/2] configure: sort > decoder/encoder/filter/... names in alphabet order > > On 2019-04-28 07:42 +0000, Guo, Yejun wrote: > > > -----Original Message----- > > > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf > Of > > > Alexander Strasser > > > Sent: Sunday, April 28, 2019 9:18 AM > > > To: FFmpeg development discussions and patches > <ffmpeg-devel@ffmpeg.org> > > > Subject: Re: [FFmpeg-devel] [PATCH V5 1/2] configure: sort > > > decoder/encoder/filter/... names in alphabet order > > > > > > > > > What do you think about using awk instead of shell? > > > > > > I have 2 POC patches attached. It's probably not 100% correct yet, > > > but it kind of demonstrates what it would look like. > > > > > > The main advantage of using awk, is that it's more elegant and > > > shorter. It's probably also less risky, because it's more isolated, > > > e.g. as it was explained by avih, there is no widely supported way > > > for locals across shells. We already use awk in configure for > > > mandatory functions, so it's no new dependency. > > > > > > Please comment on the awk approach. > > > > > I did some change to make it correct on ubuntu 16.04, but looks issue on > windows mingw. > > > > print_in_columns() { > > sort | tr '\n' ' ' | awk -v width="$ncols" ' > > { > > #add int() > > num_cols = int(width / 24); num_rows = int((NF + num_cols-1) / > num_cols); > > The added int() calls looks right. > > > > y = x = 1; > > for (i = 1; i <= NF; i++) { > > if (y > num_rows) { > > y = 1; x++; > > } > > d[x,y] = $i; > > y++; > > } > > > > print width > > print num_cols > > print num_rows > > print NF > > > > for (y = 1; y <= num_rows; y++) { > > for (x = 1; x <= num_cols; x++) { > > # it does not work for the case that the last column is not > fully filled > > #line = sprintf(x != num_cols ? "%s%-24s" : "%s%s", line, > d[x,y]); > > line = sprintf("%s%-24s", line, d[x,y]); > > You mean it doesn't work in the way that there will be trailing spaces > in the column before the last one which is partially empty, right? > > But the output looked right and there were never trailing spaces in the > last column, right? suppose num_cols = 3, see below, the spaces after 'c' is trimmed, while the spaces after 'e' is still there. a<spaces>b<spaces>c d<spaces>e<spaces> > > BTW I changed the algorithm here, because I thought it might be easier > to unterstand. I will also try the algorithm, you implemented in shell. > > > > } > > print line; > > line = ""; > > } > > }' | sed 's/ *$//' > > } > > > > on ubuntu 16.04, the output is: > > Enabled bsfs: > > 135 > > 5 > > 7 > > 33 > > aac_adtstoasc extract_extradata hevc_mp4toannexb > mpeg4_unpack_bframes truehd_core > > av1_frame_split filter_units imx_dump_header > noise vp9_metadata > > av1_metadata h264_metadata mjpeg2jpeg > null vp9_raw_reorder > > chomp h264_mp4toannexb > mjpega_dump_header prores_metadata vp9_superframe > > dca_core h264_redundant_pps mov2textsub > remove_extradata vp9_superframe_split > > dump_extradata hapqa_extract > mp3_header_decompress text2movsub > > eac3_core hevc_metadata mpeg2_metadata > trace_headers > > > > while on windows, the output is: > > Enabled bsfs: > > 72 > > 3 > > 11 > > 33 > > noiseh264_redundant_pps > > nullextract > > prores_metadata > > remove_extradatamp4toannexb > > text2movsubdump_header > > trace_headers > > truehd_corepega_dump_header > > vp9_metadata > > vp9_raw_reorderader_decompress > > vp9_superframea > > vp9_superframe_splitames > > Wild guess: CR LF line endings are emitted somewhere and the CRs stay in > the input. Your terminal resets the cursor to the start of the line when > interpreting the midline CRs. > > Does it work if you extend the tr in print_in_columns like below? > > sort | tr '\r\n' ' ' | awk -v width="$ncols" ' > yes, it works! > > > I did a quick check, but has not found the root cause yet. > > > > > > > > > > I'm not against the shell way, or a mixed approach, but before going > > > either way and pushing I would rather have some more testing; > > > especially on more exotic platforms. > > Thanks for testing and looking into it! > > Alexander > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
On 2019-04-29 00:55 +0000, Guo, Yejun wrote: [...] > > Wild guess: CR LF line endings are emitted somewhere and the CRs stay in > > the input. Your terminal resets the cursor to the start of the line when > > interpreting the midline CRs. > > > > Does it work if you extend the tr in print_in_columns like below? > > > > sort | tr '\r\n' ' ' | awk -v width="$ncols" ' > > > > yes, it works! > > > > > > I did a quick check, but has not found the root cause yet. > > > > > > > > > > > > > > I'm not against the shell way, or a mixed approach, but before going > > > > either way and pushing I would rather have some more testing; > > > > especially on more exotic platforms. > > > > Thanks for testing and looking into it! New patchset sent with subject: [PATCH 0/2] configure: Replace pr with awk and fix column sorting Your last version (v5) of the log_file rewrite performs better, so maybe we should use it instead of my awk version, which is at best as fast as current version (depends on the awk implementation). Alexander
2019-04-28 3:18 GMT+02:00, Alexander Strasser <eclipse7@gmx.net>:
> What do you think about using awk instead of shell?
Do we only use awk for --enable-random and the dependency
files so far? Does configure also work without awk now and
would this change?
Thank you, Carl Eugen
On Wed, 1 May 2019 22:57:47 +0200 Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote: > 2019-04-28 3:18 GMT+02:00, Alexander Strasser <eclipse7@gmx.net>: > > > What do you think about using awk instead of shell? > > Do we only use awk for --enable-random and the dependency > files so far? Does configure also work without awk now and > would this change? It seems awk is unconditionally required already. However I wanted to say that it's a very nice dep to have: easy to build, present almost everywhere, even in busybox. Nothing like perl/tcl or worse, python/java/rust/go. - Lauri
> It seems awk is unconditionally required already. However I wanted to > say that it's a very nice dep to have While it's possibly nicer than other deps to have, it's still better to use it IMHO only when it adds some value, like simpler code, better performance, compliance with some things, etc. With this patchset, for part 1 I'm not sure I see the added value with awk, as it looks about the same complexity of the code compared to the shell version. Performance-wise it's negligible as print_in_columns() is only called about 5 times (but with thousands of values to process), so as long as the loop is efficient, the additional awk process per call is barely measurable. For part 2 the awk version is simpler code, but `log_file()` is called nearly 300 times, mostly with files with less than 5 lines to print. At this case it does add a performance penalty, especially on systems where a new process is expensive.
Hi Avih! On 2019-05-02 08:55 +0000, avih wrote: > > It seems awk is unconditionally required already. However I wanted to > > say that it's a very nice dep to have > > While it's possibly nicer than other deps to have, it's still better to use > it IMHO only when it adds some value, like simpler code, better performance, > compliance with some things, etc. Agreed; of course we shouldn't just use awk because we can. Though I think not implementing things in shell is often lower risk, as we have no isolation in POSIX shell, we all share the same variables etc. and the configure script is quite big. Then shell is not suited for many tasks because of the way it works, like you explained really good before. So even if we find an acceptable solution, it will not be understood that easily by most people, because most people are not that deep into shell programming. Also in my experience the shells differ, when using more advanced concepts and in things that aren't used widely. Some of what I said applies to the alternatives too, except that they will always be better isolated. Sorry, that turned out longer than I wanted it to be :( > With this patchset, for part 1 I'm not sure I see the added value with awk, > as it looks about the same complexity of the code compared to the shell > version. Did you look at the version I attached in this thread? Or the one I posted in the new patch set? I changed it to use an algorithm more similar to the latest shell version discussed here. The version I attached in this mail thread, was an experiment to see if it could be easier understood that way. > Performance-wise it's negligible as print_in_columns() is only called > about 5 times (but with thousands of values to process), so as long as the > loop is efficient, the additional awk process per call is barely measurable. > For part 2 the awk version is simpler code, but `log_file()` is called nearly > 300 times, mostly with files with less than 5 lines to print. At this case it > does add a performance penalty, especially on systems where a new process is > expensive. Yes, exactly. Very well worded! Though I am still surprised about the Windows results. I did not investigate more, but I found it quite strange, that it showed a 8% performance hit the first time, and in subsequent runs there were no performance differences measurable for the runtime of the complete configure script. As mentioned in the other patch set, I am fine going with the shell version for part 2 (log_file). A third option would be to just use cat and forget about the line numbers. I am not sure the numbers really add much value, for those dumps of mostly very small source files with little ambiguity. Alexander
Hi Carl Eugen! On 2019-05-01 22:57 +0200, Carl Eugen Hoyos wrote: > 2019-04-28 3:18 GMT+02:00, Alexander Strasser <eclipse7@gmx.net>: > > > What do you think about using awk instead of shell? > > Do we only use awk for --enable-random and the dependency > files so far? Does configure also work without awk now and > would this change? No, it wouldn't as I have written when initially proposing to use it here. We unconditionally use awk for generating and writing the primary configure outputs: the config files. Upon your question I looked it up exactly; the routine was rewritten by Mans to use awk near the end of 2012 in this commit: f454e879238ce317c6d905d187e7608c461a7087 Alexander
> On 2019-05-02 08:55 +0000, avih wrote: > > > It seems awk is unconditionally required already. However I wanted to > > > say that it's a very nice dep to have > > > > While it's possibly nicer than other deps to have, it's still better to use > > it IMHO only when it adds some value, like simpler code, better performance, > > compliance with some things, etc. > > Agreed; of course we shouldn't just use awk because we can. > > Though I think not implementing things in shell is often > lower risk, as we have no isolation in POSIX shell, we all > share the same variables etc. and the configure script is > quite big. Then shell is not suited for many tasks because > of the way it works > ... This actually sounds to me like you're saying we shouldn't use awk because we can, but rather use it where possible because it'd be better than shell. In other words: we should write new configure code in awk. Did I misinterpret the statement or its implications? Assuming I interpreted it correctly, my opinion on this is that configure is already written in shell, so unless there's a consensus to migrate it to awk, things should stay shell except where other tools add some value beyond "this doesn't have the general issues of shell". I.e. use tools where they are better suited for some specific tasks than shell. Personally I don't have an opinion about migrating configure, though I'd think that if such migration ever happens, it should be to a proper build system and not just a different scripting language. > Also in my experience the shells differ, when using more > advanced concepts and in things that aren't used widely. True, though I don't think configure uses exotic shell features, and as far as I know it's currently posix and real-world compliant with many new and old shells. If you know of a valid issue with some shells, you can report it or send a patch. Note that I'm not arguing that shell is a great programming language. It's not - it's hard and it's treacherous. But currently that's what we have, and for now it does its job. > > With this patchset, for part 1 I'm not sure I see the added value with awk, > > as it looks about the same complexity of the code compared to the shell > > version. > > Did you look at the version I attached in this thread? Or the one I > posted in the new patch set? > > I changed it to use an algorithm more similar to the latest shell > version discussed here. I think so, yes. As I said, it's similar to the shell version. I don't think it's worse in any way, but I also didn't see an added value. Please post a link to the actual patch if you think I'm not looking at the patch version you refer to.
Hi! On 2019-05-04 06:28 +0000, avih wrote: > > On 2019-05-02 08:55 +0000, avih wrote: > > > > It seems awk is unconditionally required already. However I wanted to > > > > say that it's a very nice dep to have > > > > > > While it's possibly nicer than other deps to have, it's still better to use > > > it IMHO only when it adds some value, like simpler code, better performance, > > > compliance with some things, etc. > > > > Agreed; of course we shouldn't just use awk because we can. > > > > Though I think not implementing things in shell is often > > lower risk, as we have no isolation in POSIX shell, we all > > share the same variables etc. and the configure script is > > quite big. Then shell is not suited for many tasks because > > of the way it works > > ... > > This actually sounds to me like you're saying we shouldn't use awk because > we can, but rather use it where possible because it'd be better than shell. > > In other words: we should write new configure code in awk. > > Did I misinterpret the statement or its implications? You got me totally wrong :( [...] > > > > Did you look at the version I attached in this thread? Or the one I > > posted in the new patch set? > > > > I changed it to use an algorithm more similar to the latest shell > > version discussed here. > > I think so, yes. As I said, it's similar to the shell version. I don't > think it's worse in any way, but I also didn't see an added value. > > Please post a link to the actual patch if you think I'm not looking > at the patch version you refer to. I guess you were looking at the right patch. I mean this one: http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html Alexander
> I guess you were looking at the right patch. I mean this one: > http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html I was referring to this patch indeed. Thanks. > > > Agreed; of course we shouldn't just use awk because we can. > > > > > > Though I think not implementing things in shell is often > > > lower risk, as we have no isolation in POSIX shell, we all > > > share the same variables etc. and the configure script is > > > quite big. Then shell is not suited for many tasks because > > > of the way it works > > > ... > > > > This actually sounds to me like you're saying we shouldn't use awk because > > we can, but rather use it where possible because it'd be better than shell. > > > > In other words: we should write new configure code in awk. > > > > Did I misinterpret the statement or its implications? > > You got me totally wrong :( I'm only human, it happens. But you didn't explain what you actually meant. Specifically: - What makes this patch a good candidate to use awk rather than shell like the rest of configure? - What should be the general criteria to choose a scripting language for future patches? On Saturday, May 4, 2019 10:43 PM, Alexander Strasser <eclipse7@gmx.net> wrote: Hi! On 2019-05-04 06:28 +0000, avih wrote: > > On 2019-05-02 08:55 +0000, avih wrote: > > > > It seems awk is unconditionally required already. However I wanted to > > > > say that it's a very nice dep to have > > > > > > While it's possibly nicer than other deps to have, it's still better to use > > > it IMHO only when it adds some value, like simpler code, better performance, > > > compliance with some things, etc. > > > > Agreed; of course we shouldn't just use awk because we can. > > > > Though I think not implementing things in shell is often > > lower risk, as we have no isolation in POSIX shell, we all > > share the same variables etc. and the configure script is > > quite big. Then shell is not suited for many tasks because > > of the way it works > > ... > > This actually sounds to me like you're saying we shouldn't use awk because > we can, but rather use it where possible because it'd be better than shell. > > In other words: we should write new configure code in awk. > > Did I misinterpret the statement or its implications? You got me totally wrong :( [...] > > > > Did you look at the version I attached in this thread? Or the one I > > posted in the new patch set? > > > > I changed it to use an algorithm more similar to the latest shell > > version discussed here. > > I think so, yes. As I said, it's similar to the shell version. I don't > think it's worse in any way, but I also didn't see an added value. > > Please post a link to the actual patch if you think I'm not looking > at the patch version you refer to. I guess you were looking at the right patch. I mean this one: http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html Alexander
On 2019-05-05 21:14 +0000, avih wrote: > > I guess you were looking at the right patch. I mean this one: > > http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html > > I was referring to this patch indeed. Thanks. > > > > > > Agreed; of course we shouldn't just use awk because we can. > > > > > > > > Though I think not implementing things in shell is often > > > > lower risk, as we have no isolation in POSIX shell, we all > > > > share the same variables etc. and the configure script is > > > > quite big. Then shell is not suited for many tasks because > > > > of the way it works > > > > ... > > > > > > This actually sounds to me like you're saying we shouldn't use awk because > > > we can, but rather use it where possible because it'd be better than shell. > > > > > > In other words: we should write new configure code in awk. > > > > > > Did I misinterpret the statement or its implications? > > > > You got me totally wrong :( > > > I'm only human, it happens. But you didn't explain what you actually meant. It's a communication problem. I tried to explain why I prefer the awk version over the shell version of this patch in the email before. Seems I was too vague. I will try to answer your specific questions. If there is more to discuss, please don't hesitate to ask. > Specifically: > > - What makes this patch a good candidate to use awk rather than shell like > the rest of configure? I would like to rephrase this question a bit. I guess the other aspect will be in my answer to your next question anyway. patch1 (awk) configure: print_in_columns: replace pr with awk version: http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html patch2 (shell) configure: sort decoder/encoder/filter/... names in alphabet order (v5 as posted in this thread) - Why do you prefer patch1 over patch2? 1. Statistics * patch1: 1 file changed, 16 insertions(+), 2 deletions(-) * patch2: 1 file changed, 24 insertions(+), 2 deletions(-) 2. patch2 uses lots of variables (which is good in itself), but those should be local and we cannot express that portably in shell. In turn we have raised potential for clashes now or in the future. 3. patch2 uses eval combined with non-trivial quoting, which is hard to read and hard to grasp quickly. It's not wrong, but it's not as easy and straight forward as in patch1. 4. Depending on the input (stdin) unexpected things can happen in the "eval" and "set" lines. One example is given in the comment. 5. patch2 to uses shell for the parts easily expressed in shell and uses awk for the parts that are non-trivial in shell. 6. The awk implementation should be easier to read and understand for the majority of readers/developers. > - What should be the general criteria to choose a scripting language for > future patches? I don't see anywhere that we switch away from shell for configure. So in general things should be implemented in shell. Shell though is not really suited for implementing all kinds of algorithms. OTOH shell is an excellent choice for starting processes that communicate via stdin/stdout/stderr and plugging them together. So I think programming shell, sticking only to shell itself (builtins) is almost never a good option. Shell is best when coordinating the execution of other programs. Sometimes when you do not have a particular program around to help you in your shell script, that void needs to be filled. For this purpose awk is often a good choice, because it was especially designed to be a fit for this purpose amongst other things. Using only awk to implement bigger systems or as a general purpose programming language is IMHO nearly as wrong as using shell-only. > On Saturday, May 4, 2019 10:43 PM, Alexander Strasser <eclipse7@gmx.net> wrote: [...] I'm feeling like I might not have expressed my reasoning good and precise enough in every level of detail. I apologize for that. My natural language skills have their limits and I am not a native English speaker. I will try to refine specific points, should you or anyone else have more questions or comments. Alexander
> -----Original Message----- > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of > Alexander Strasser > Sent: Tuesday, May 07, 2019 6:21 AM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH V5 1/2] configure: sort > decoder/encoder/filter/... names in alphabet order > > On 2019-05-05 21:14 +0000, avih wrote: > > > I guess you were looking at the right patch. I mean this one: > > > http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html > > > > I was referring to this patch indeed. Thanks. > > > > > > > > > Agreed; of course we shouldn't just use awk because we can. > > > > > > > > > > Though I think not implementing things in shell is often > > > > > lower risk, as we have no isolation in POSIX shell, we all > > > > > share the same variables etc. and the configure script is > > > > > quite big. Then shell is not suited for many tasks because > > > > > of the way it works > > > > > ... > > > > > > > > This actually sounds to me like you're saying we shouldn't use awk > because > > > > we can, but rather use it where possible because it'd be better than shell. > > > > > > > > In other words: we should write new configure code in awk. > > > > > > > > Did I misinterpret the statement or its implications? > > > > > > You got me totally wrong :( > > > > > > I'm only human, it happens. But you didn't explain what you actually meant. > > It's a communication problem. I tried to explain why I prefer > the awk version over the shell version of this patch in the email > before. Seems I was too vague. > > I will try to answer your specific questions. If there is more > to discuss, please don't hesitate to ask. > > > Specifically: > > > > - What makes this patch a good candidate to use awk rather than shell like > > the rest of configure? > > I would like to rephrase this question a bit. I guess the other aspect > will be in my answer to your next question anyway. > > patch1 (awk) configure: print_in_columns: replace pr with awk version: > http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html > patch2 (shell) configure: sort decoder/encoder/filter/... names in alphabet > order (v5 as posted in this thread) > > - Why do you prefer patch1 over patch2? > > 1. Statistics > * patch1: 1 file changed, 16 insertions(+), 2 deletions(-) > * patch2: 1 file changed, 24 insertions(+), 2 deletions(-) I have no idea which one is better, but I think this point is not the reason. we can refine the shell patch as below with 15 insertions(+), 2 deletions(-1). Actually, the line number does not mean something. :) print_in_columns() { # the input should not contain chars such as '*', otherwise it will be expanded. set -- $(tr ' ' '\n' | sort) test $ncols -lt 24 && col_width=$ncols || col_width=24 cols=$(($ncols / $col_width)) rows=$(($(($# + $cols - 1)) / $cols)) cols_seq=$(seq $cols) for row in $(seq $rows); do print_index=$row print_line="" for col in $cols_seq; do test $print_index -le $# && eval print_line='"$print_line "${'$print_index'}' print_index=$(($print_index + $rows)) done printf "%-${col_width}s" $print_line; printf "\n" done | sed 's/ *$//' } the usage of "test && ||" is already used in configure. > 2. patch2 uses lots of variables (which is good in itself), but those > should be local and we cannot express that portably in shell. > In turn we have raised potential for clashes now or in the future. > 3. patch2 uses eval combined with non-trivial quoting, which is hard > to read and hard to grasp quickly. It's not wrong, but it's not as > easy and straight forward as in patch1. > 4. Depending on the input (stdin) unexpected things can happen in the > "eval" and "set" lines. One example is given in the comment. > 5. patch2 to uses shell for the parts easily expressed in shell and > uses awk for the parts that are non-trivial in shell. > 6. The awk implementation should be easier to read and understand for > the majority of readers/developers. > > > > - What should be the general criteria to choose a scripting language for > > future patches? > > I don't see anywhere that we switch away from shell for configure. So in > general things should be implemented in shell. Shell though is not really > suited for implementing all kinds of algorithms. OTOH shell is an excellent > choice for starting processes that communicate via stdin/stdout/stderr > and plugging them together. > > So I think programming shell, sticking only to shell itself (builtins) is > almost never a good option. Shell is best when coordinating the execution > of other programs. Sometimes when you do not have a particular program > around to help you in your shell script, that void needs to be filled. > For this purpose awk is often a good choice, because it was especially > designed to be a fit for this purpose amongst other things. Using only awk > to implement bigger systems or as a general purpose programming language > is IMHO nearly as wrong as using shell-only. > > > > On Saturday, May 4, 2019 10:43 PM, Alexander Strasser > <eclipse7@gmx.net> wrote: > [...] > > > I'm feeling like I might not have expressed my reasoning good and > precise enough in every level of detail. I apologize for that. My > natural language skills have their limits and I am not a native > English speaker. I will try to refine specific points, should you > or anyone else have more questions or comments. > > > Alexander > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Am 7. Mai 2019 04:07:23 MESZ schrieb "Guo, Yejun" <yejun.guo@intel.com>: > >> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf >Of >> Alexander Strasser >> Sent: Tuesday, May 07, 2019 6:21 AM >> To: FFmpeg development discussions and patches ><ffmpeg-devel@ffmpeg.org> >> Subject: Re: [FFmpeg-devel] [PATCH V5 1/2] configure: sort >> decoder/encoder/filter/... names in alphabet order >> >> On 2019-05-05 21:14 +0000, avih wrote: >> > > I guess you were looking at the right patch. I mean this one: [...] >> > > > >> > > > In other words: we should write new configure code in awk. >> > > > >> > > > Did I misinterpret the statement or its implications? >> > > >> > > You got me totally wrong :( >> > >> > I'm only human, it happens. But you didn't explain what you >actually meant. >> >> It's a communication problem. I tried to explain why I prefer >> the awk version over the shell version of this patch in the email >> before. Seems I was too vague. >> >> I will try to answer your specific questions. If there is more >> to discuss, please don't hesitate to ask. >> >> > Specifically: >> > >> > - What makes this patch a good candidate to use awk rather than >shell like >> > the rest of configure? >> >> I would like to rephrase this question a bit. I guess the other >aspect >> will be in my answer to your next question anyway. >> >> patch1 (awk) configure: print_in_columns: replace pr with awk >version: >> http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html >> patch2 (shell) configure: sort decoder/encoder/filter/... names in >alphabet >> order (v5 as posted in this thread) >> >> - Why do you prefer patch1 over patch2? >> >> 1. Statistics >> * patch1: 1 file changed, 16 insertions(+), 2 deletions(-) >> * patch2: 1 file changed, 24 insertions(+), 2 deletions(-) > >I have no idea which one is better, but I think this point is not the >reason. My fault. I wanted to add that those points are in no particular order and that this one is not a very strong Point; but it's an indication. >we can refine the shell patch as below with 15 insertions(+), 2 >deletions(-1). Actually, the line number does not mean something. :) > >print_in_columns() { ># the input should not contain chars such as '*', otherwise it will be >expanded. > set -- $(tr ' ' '\n' | sort) > test $ncols -lt 24 && col_width=$ncols || col_width=24 > cols=$(($ncols / $col_width)) > rows=$(($(($# + $cols - 1)) / $cols)) > cols_seq=$(seq $cols) > for row in $(seq $rows); do > print_index=$row > print_line="" > for col in $cols_seq; do >test $print_index -le $# && eval print_line='"$print_line >"${'$print_index'}' > print_index=$(($print_index + $rows)) > done > printf "%-${col_width}s" $print_line; printf "\n" > done | sed 's/ *$//' >} > >the usage of "test && ||" is already used in configure. Lines can bei sqeezed, but that isn't a very good Idea beyond some point. I'm quite sure the shell Implementation will end up with a few lines more or I will not feel comfortable pushing a patch that has been compacted too much. As I said above already, that alone isn't a strong reason to prefer patch2. Though all points together are in my opinion. Best regards, Alexander >> 2. patch2 uses lots of variables (which is good in itself), but those >> should be local and we cannot express that portably in shell. >> In turn we have raised potential for clashes now or in the future. >> 3. patch2 uses eval combined with non-trivial quoting, which is hard >> to read and hard to grasp quickly. It's not wrong, but it's not as >> easy and straight forward as in patch1. >> 4. Depending on the input (stdin) unexpected things can happen in the >> "eval" and "set" lines. One example is given in the comment. >> 5. patch2 to uses shell for the parts easily expressed in shell and >> uses awk for the parts that are non-trivial in shell. >> 6. The awk implementation should be easier to read and understand for >> the majority of readers/developers. >> >> >> > - What should be the general criteria to choose a scripting >language for >> > future patches? >> >> I don't see anywhere that we switch away from shell for configure. So >in >> general things should be implemented in shell. Shell though is not >really >> suited for implementing all kinds of algorithms. OTOH shell is an >excellent >> choice for starting processes that communicate via >stdin/stdout/stderr >> and plugging them together. >> >> So I think programming shell, sticking only to shell itself >(builtins) is >> almost never a good option. Shell is best when coordinating the >execution >> of other programs. Sometimes when you do not have a particular >program >> around to help you in your shell script, that void needs to be >filled. >> For this purpose awk is often a good choice, because it was >especially >> designed to be a fit for this purpose amongst other things. Using >only awk >> to implement bigger systems or as a general purpose programming >language >> is IMHO nearly as wrong as using shell-only. >> >> >> > On Saturday, May 4, 2019 10:43 PM, Alexander Strasser >> <eclipse7@gmx.net> wrote: >> [...] >> >> >> I'm feeling like I might not have expressed my reasoning good and >> precise enough in every level of detail. I apologize for that. My >> natural language skills have their limits and I am not a native >> English speaker. I will try to refine specific points, should you >> or anyone else have more questions or comments.
> patch1 (awk) configure: print_in_columns: replace pr with awk version: http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html > patch2 (shell) configure: sort decoder/encoder/filter/... names in alphabet > order (v5 as posted in this thread) > > - Why do you prefer patch1 over patch2? In general, I agree with your reasoning, but not entirely sure about the conclusion and its implications. Few exceptions first: > 1. Statistics > * patch1: 1 file changed, 16 insertions(+), 2 deletions(-) > * patch2: 1 file changed, 24 insertions(+), 2 deletions(-) As you said later, and I agree, it's a weak argument, as both versions are the same code and complexity, but differ mostly in style and language syntax. > 2. patch2 uses lots of variables (which is good in itself), but those > should be local and we cannot express that portably in shell. > In turn we have raised potential for clashes now or in the future. This is incorrect. A subshell provides identical isolation as an external process, it's fully portable, it can be used here, and it's not slower than an external process, e.g. `print_in_columns() ( ... )`. > 4. Depending on the input (stdin) unexpected things can happen in the > "eval" and "set" lines. One example is given in the comment. "Unexpected" is a bit general. Specifically, the "eval" line does not execute arbitrary inputs, and neither does the "set" line. The only potential surprise is globbing, as documented, and the technique of expanding variables unquoted - like here, is used throughout configure. Look no further than 100% of the very places which pipe the input stream into print_in_columns, and multiple times earlier than that. It's not great of course, but it's not worse than elsewhere, and it's at least documented. Some day maybe someone would improve it. I generally agree with rest of your comments/arguments, including: > 1. Statistics ... (patch2 is more verbose). > 3. patch2 uses eval combined with non-trivial quoting, which is hard > to read and hard to grasp quickly. It's not wrong, but it's not as > easy and straight forward as in patch1. > 5. patch2 to uses shell for the parts easily expressed in shell and > uses awk for the parts that are non-trivial in shell. > 6. The awk implementation should be easier to read and understand for > the majority of readers/developers. which basically say that shell is hard, and, pardon the pun, awkward. And also I agree with: > Shell though is not really suited for implementing all kinds of algorithms. > OTOH shell is an excellent choice for starting processes that communicate > via stdin/stdout/stderr and plugging them together. and with > Shell is best when coordinating the execution of other programs. And similar assertions, however, let's keep the following facts in mind: - We both agree that we should keep to shell except where an external tool provides some meaningful value over shell code. - An external tool only helps by being more suitable for the task than shell code (readability, portability, correctness, performance, etc), but it doesn't help with isolation more than a subshell would. - Shell is capable in a portable way, including for some tasks which can be considered "programming". - configure is currently written in shell, including some non trivial functionality, and doesn't use external script engines for "scriptlets". - Being a shell script, it already requires competence and reasonable understanding of shell, including quoting, expansions, etc. - Maintaining code in more than one scripting system does carry a cost. - Yes, shell is a lousy programming language. I really don't care if this specific patch ends up as shell or awk. What I do care about is being consistent, and understanding the implications. IMHO, because I'm not convinced that the awk version provides a meaningful additional value other than not being shell, if we use the awk version then it sets a precedence of using awk (or maybe also other languages) for small scripts even if they're basically the same complexity and code as they would be in shell. It's not necessarily a bad precedence (even if I wouldn't do that myself), because I do agree that not being shell code does have value, but I think it's important to acknowledge that it does set such precedence. If there was already such practice of using awk scriptlets where possible then I wouldn't have said anything. So I mainly care about the precedence. I guess it comes down to a judgment call on whether or not this awk script can be considered meaningfully more suitable for the task than shell. I think not, but clearly it's only my subjective assessment. Regards, Avi
Hi all, hi Avi! Sorry for the late reply. On 2019-05-07 12:22 +0000, avih wrote: > > patch1 (awk) configure: print_in_columns: replace pr with awk version: http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243380.html > > patch2 (shell) configure: sort decoder/encoder/filter/... names in alphabet > > order (v5 as posted in this thread) > > > > - Why do you prefer patch1 over patch2? > > In general, I agree with your reasoning, but not entirely sure about the > conclusion and its implications. > > Few exceptions first: > > > > 1. Statistics > > * patch1: 1 file changed, 16 insertions(+), 2 deletions(-) > > * patch2: 1 file changed, 24 insertions(+), 2 deletions(-) > > As you said later, and I agree, it's a weak argument, as both versions > are the same code and complexity, but differ mostly in style and language > syntax. > > > 2. patch2 uses lots of variables (which is good in itself), but those > > should be local and we cannot express that portably in shell. > > In turn we have raised potential for clashes now or in the future. > > This is incorrect. A subshell provides identical isolation as an external > process, it's fully portable, it can be used here, and it's not slower than > an external process, e.g. `print_in_columns() ( ... )`. No disagreement regarding those 2 points. Though with all other things being equal, it's nice to have patches ready and tested as they are. > > 4. Depending on the input (stdin) unexpected things can happen in the > > "eval" and "set" lines. One example is given in the comment. > > "Unexpected" is a bit general. Specifically, the "eval" line does not > execute arbitrary inputs, and neither does the "set" line. Regarding eval it seems I misremembered something or it was in previous iteration. I looked at it again and I can't see how the latest version can do something unexpected trough the eval. > The only potential surprise is globbing, as documented, and the technique > of expanding variables unquoted - like here, is used throughout configure. > > Look no further than 100% of the very places which pipe the input stream > into print_in_columns, and multiple times earlier than that. > > It's not great of course, but it's not worse than elsewhere, and it's at > least documented. Some day maybe someone would improve it. > > > I generally agree with rest of your comments/arguments, including: > > > 1. Statistics ... (patch2 is more verbose). > > 3. patch2 uses eval combined with non-trivial quoting, which is hard > > to read and hard to grasp quickly. It's not wrong, but it's not as > > easy and straight forward as in patch1. > > 5. patch2 to uses shell for the parts easily expressed in shell and > > uses awk for the parts that are non-trivial in shell. > > 6. The awk implementation should be easier to read and understand for > > the majority of readers/developers. > > which basically say that shell is hard, and, pardon the pun, awkward. :) > And also I agree with: > > > Shell though is not really suited for implementing all kinds of algorithms. > > OTOH shell is an excellent choice for starting processes that communicate > > via stdin/stdout/stderr and plugging them together. > > and with > > > Shell is best when coordinating the execution of other programs. > > And similar assertions, however, let's keep the following facts in mind: > > - We both agree that we should keep to shell except where an external tool > provides some meaningful value over shell code. > > - An external tool only helps by being more suitable for the task than > shell code (readability, portability, correctness, performance, etc), > but it doesn't help with isolation more than a subshell would. > > - Shell is capable in a portable way, including for some tasks which can > be considered "programming". > > - configure is currently written in shell, including some non trivial > functionality, and doesn't use external script engines for "scriptlets". > > - Being a shell script, it already requires competence and reasonable > understanding of shell, including quoting, expansions, etc. > > - Maintaining code in more than one scripting system does carry a cost. > > - Yes, shell is a lousy programming language. > > > I really don't care if this specific patch ends up as shell or awk. What > I do care about is being consistent, and understanding the implications. > > IMHO, because I'm not convinced that the awk version provides a meaningful > additional value other than not being shell, if we use the awk version then > it sets a precedence of using awk (or maybe also other languages) for small > scripts even if they're basically the same complexity and code as they > would be in shell. > > It's not necessarily a bad precedence (even if I wouldn't do that myself), > because I do agree that not being shell code does have value, but I think > it's important to acknowledge that it does set such precedence. > > If there was already such practice of using awk scriptlets where possible > then I wouldn't have said anything. So I mainly care about the precedence. Interesting point. We seem to disagree regarding the reach of the precedence. I don't believe it sets a precedence "of using awk (or maybe also other languages) for small scripts even if they're basically the same complexity and code as they would be in shell". For me this is just an individual instance where we decided to implement the needed "leaf" functionality in awk. With leaf I mean it is lower level code that does not need to call into e.g. shell functions we provide on the same or lower level. Comparable to e.g. calling an external process, like we do with pr in the current implementation. I would oppose to implement more upper level configure logic in any language other than shell. > I guess it comes down to a judgment call on whether or not this awk script > can be considered meaningfully more suitable for the task than shell. > > I think not, but clearly it's only my subjective assessment. I would still prefer to commit the awk version. IMHO it is better suited for this task. Surely it's not ideal and it's not clearly better in all ways. I intent to push the awk version. I will wait at least until Thursday, so people can further test, comment, or object. Alexander
> I intent to push the awk version. I will wait at least until > Thursday, so people can further test, comment, or object. No further comments here. Thanks! Avi
> -----Original Message----- > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of > avih > Sent: Monday, June 03, 2019 6:29 AM > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH V5 1/2] configure: sort > decoder/encoder/filter/... names in alphabet order > > > I intent to push the awk version. I will wait at least until > > Thursday, so people can further test, comment, or object. > > No further comments here. I'm ok with the awk version. > > Thanks! > Avi > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
On 2019-06-03 06:24 +0000, Guo, Yejun wrote: > > > > -----Original Message----- > > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of > > avih > > Sent: Monday, June 03, 2019 6:29 AM > > To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > > Subject: Re: [FFmpeg-devel] [PATCH V5 1/2] configure: sort > > decoder/encoder/filter/... names in alphabet order > > > > > I intent to push the awk version. I will wait at least until > > > Thursday, so people can further test, comment, or object. > > > > No further comments here. > > I'm ok with the awk version. Finally pushed the awk version. Thanks again for your work and comments. Alexander
From 7dc4043db4ea32f6e85b17b6ec8060dac8c85f6b Mon Sep 17 00:00:00 2001 From: Alexander Strasser <eclipse7@gmx.net> Date: Sat, 27 Apr 2019 01:06:59 +0200 Subject: [PATCH 2/2] configure: log_file: Replace pr with awk invocation Fixes remaining part of ticket #TODO --- configure | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/configure b/configure index 7e739b3e55..491d4d50e9 100755 --- a/configure +++ b/configure @@ -503,7 +503,7 @@ log(){ log_file(){ log BEGIN $1 - pr -n -t $1 >> $logfile + awk '{ printf("%5d\t%s\n", NR, $0) }' $1 >> $logfile log END $1 } -- 2.20.1