[FFmpeg-devel,V3,1/2] configure: sort decoder/encoder/filter/... names in alphabet order

Submitted by Guo, Yejun on April 23, 2019, 10:31 a.m.

Details

Message ID 1556015482-19503-1-git-send-email-yejun.guo@intel.com
State New
Headers show

Commit Message

Guo, Yejun April 23, 2019, 10:31 a.m.
take decoder names an example, with the default page length, shell command
'pr' needs two pages for all the decoder names. The names are firstly printed
in the first page, then in the second page. So, as a whole, the names are
sorted neither in column order nor in row order. It's a little confused.

One method is to calculate the proper page length, so all the names are printed
in one page by 'pr -l', and so strictly in alphabet order, column by column.

Another method is to use command printf instead of pr, because buybox doesn't
have pr. This patch refines print_in_columns to print the names with printf
in alphabet order, very similar with 'pr -l', except the case when the last
column is not fully filled with names.

The interface of print_in_columns changed, the input needs to be sorted first,
and then pass into print_in_columns as function parameters. It gives the
flexibility the input parameters can be considered as an array and can be indexed.

contributor: Alexander Strasser <eclipse7@gmx.net>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
---
 configure | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

Comments

avih April 23, 2019, 12:13 p.m.
>  print_in_columns() {
> -    cols=$(expr $ncols / 24)
> -    cat | tr ' ' '\n' | sort | pr -r "-$cols" -w $ncols -t
> +    col_width=24
> +    cols=$(expr $ncols / $col_width)
> +    rows=$(expr $(expr $# + $cols - 1) / $cols)
> +    for row in $(seq $rows); do
> +        index=$row
> +        line=""
> +        fmt=""
> +        for col in $(seq $cols); do
> +            if [ $index -le $# ]; then
> +                eval line='"$line "${'$index'}'
> +                fmt="$fmt%-${col_width}s"
> +            fi
> +            index=$(expr $index + $rows)
> +        done
> +        printf "$fmt\n" $line
> +    done | sed 's/ *$//'
> }


The new code is relatively slow.

On linux it adds ~ 1.5s (1500 ms) to the configure time in both bash and dash
on my system, which is roughly additional ~20% to configure run time.
On Windows this will easily add a minute or more (I didn't test).


Few things to consider:

- print_in_column iterates over a _lot_ of values - hundreds or more.

- Subshells - `$(cmd ...)` are relatively very expensive, especially in hot
  (inner) loops, and especially on Windows.

- $(expr ...) can typically be replaced with shell arithmetics - $((...)) .
  Your part 2 already uses it, and regardless it's been used in configure
  before (in pushvar and popvar), so you should use it where possible.

- `for col in $(seq $cols)` need not invoke `seq` on each iterations to always
  produce the same output. You can capture its output once on startup.

- All the places which use the new print_in_columns want the result sorted and
  duplicate pipe through `tr` and `sort`. The original version already did
  `cat | tr ' ' '\n' | sort` in one place, and you can simply replace it with
  `set -- $(cat | tr ' ' '\n' | sort)` to get the values as positional
  parameters. This will also keep the interface the same as before and will
  reduce the patch size.

And finally, is there a good reason to sort the results in columns and not rows?
As you rightfully mentioned, outputs can span several pages, and when reading
it on screen it might be more convenient to read row by row than column by
column. This could also simplify the new code a lot.
Guo, Yejun April 24, 2019, 5:14 a.m.
> -----Original Message-----

> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces@ffmpeg.org] On Behalf Of

> avih

> Sent: Tuesday, April 23, 2019 8:14 PM

> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>

> Cc: Guo, Yejun <yejun.guo@intel.com>

> Subject: Re: [FFmpeg-devel] [PATCH V3 1/2] configure: sort

> decoder/encoder/filter/... names in alphabet order

> 

> >  print_in_columns() {

> > -    cols=$(expr $ncols / 24)

> > -    cat | tr ' ' '\n' | sort | pr -r "-$cols" -w $ncols -t

> > +    col_width=24

> > +    cols=$(expr $ncols / $col_width)

> > +    rows=$(expr $(expr $# + $cols - 1) / $cols)

> > +    for row in $(seq $rows); do

> > +        index=$row

> > +        line=""

> > +        fmt=""

> > +        for col in $(seq $cols); do

> > +            if [ $index -le $# ]; then

> > +                eval line='"$line "${'$index'}'

> > +                fmt="$fmt%-${col_width}s"

> > +            fi

> > +            index=$(expr $index + $rows)

> > +        done

> > +        printf "$fmt\n" $line

> > +    done | sed 's/ *$//'

> > }

> 

> 

> The new code is relatively slow.

> 

> On linux it adds ~ 1.5s (1500 ms) to the configure time in both bash and dash

> on my system, which is roughly additional ~20% to configure run time.

> On Windows this will easily add a minute or more (I didn't test).


I tried with mingw on windows, the whole configure 'date;./configure;date' takes
about 2:25 (minute:second) with master, it takes about 3:41 with this v3 patch, and
takes about 2:15 with v4 patch with your suggestions, v4 will be sent out next.

I also tried on ubuntu 16.04, the consumed times are: 8.5 seconds (master), 12 seconds (v3 patch), and 7.5 seconds (v4 patch).

> 

> 

> Few things to consider:

> 

> - print_in_column iterates over a _lot_ of values - hundreds or more.

> 

> - Subshells - `$(cmd ...)` are relatively very expensive, especially in hot

>   (inner) loops, and especially on Windows.

> 

> - $(expr ...) can typically be replaced with shell arithmetics - $((...)) .

>   Your part 2 already uses it, and regardless it's been used in configure

>   before (in pushvar and popvar), so you should use it where possible.


I just thought they are similar and so choose it by random. 
Find the performance is better after change to $((...)).

> 

> - `for col in $(seq $cols)` need not invoke `seq` on each iterations to always

>   produce the same output. You can capture its output once on startup.


thanks, and it helps the performance too.

> 

> - All the places which use the new print_in_columns want the result sorted and

>   duplicate pipe through `tr` and `sort`. The original version already did

>   `cat | tr ' ' '\n' | sort` in one place, and you can simply replace it with

>   `set -- $(cat | tr ' ' '\n' | sort)` to get the values as positional

>   parameters. This will also keep the interface the same as before and will

>   reduce the patch size.


thanks, good point.

> 

> And finally, is there a good reason to sort the results in columns and not rows?

> As you rightfully mentioned, outputs can span several pages, and when reading

> it on screen it might be more convenient to read row by row than column by

> column. This could also simplify the new code a lot.


To check a name with human eye, after we get the nearby area for the check:
If row by row, we have to turn our eyes from left to right, from up to down,
and then again left to right, up to down, maybe again and again. And there is still
possibility that we missed the check.

If column by column, we just turn from up to down (the same column) and
can easily check if the name is there or not.

> 

> _______________________________________________

> ffmpeg-devel mailing list

> ffmpeg-devel@ffmpeg.org

> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

> 

> To unsubscribe, visit link above, or email

> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

Patch hide | download patch | download mbox

diff --git a/configure b/configure
index 3b11ffe..f8032aa 100755
--- a/configure
+++ b/configure
@@ -3832,14 +3832,28 @@  die_unknown(){
 }
 
 print_in_columns() {
-    cols=$(expr $ncols / 24)
-    cat | tr ' ' '\n' | sort | pr -r "-$cols" -w $ncols -t
+    col_width=24
+    cols=$(expr $ncols / $col_width)
+    rows=$(expr $(expr $# + $cols - 1) / $cols)
+    for row in $(seq $rows); do
+        index=$row
+        line=""
+        fmt=""
+        for col in $(seq $cols); do
+            if [ $index -le $# ]; then
+                eval line='"$line "${'$index'}'
+                fmt="$fmt%-${col_width}s"
+            fi
+            index=$(expr $index + $rows)
+        done
+        printf "$fmt\n" $line
+    done | sed 's/ *$//'
 }
 
 show_list() {
     suffix=_$1
     shift
-    echo $* | sed s/$suffix//g | print_in_columns
+    print_in_columns $(echo $* | sed s/$suffix//g | tr ' ' '\n' | sort)
     exit 0
 }
 
@@ -7121,32 +7135,32 @@  test -n "$random_seed" &&
 echo
 
 echo "External libraries:"
-print_enabled '' $EXTERNAL_LIBRARY_LIST $EXTERNAL_AUTODETECT_LIBRARY_LIST | print_in_columns
+print_in_columns $(print_enabled '' $EXTERNAL_LIBRARY_LIST $EXTERNAL_AUTODETECT_LIBRARY_LIST | tr ' ' '\n' | sort)
 echo
 
 echo "External libraries providing hardware acceleration:"
-print_enabled '' $HWACCEL_LIBRARY_LIST $HWACCEL_AUTODETECT_LIBRARY_LIST | print_in_columns
+print_in_columns $(print_enabled '' $HWACCEL_LIBRARY_LIST $HWACCEL_AUTODETECT_LIBRARY_LIST | tr ' ' '\n' | sort)
 echo
 
 echo "Libraries:"
-print_enabled '' $LIBRARY_LIST | print_in_columns
+print_in_columns $(print_enabled '' $LIBRARY_LIST | tr ' ' '\n' | sort)
 echo
 
 echo "Programs:"
-print_enabled '' $PROGRAM_LIST | print_in_columns
+print_in_columns $(print_enabled '' $PROGRAM_LIST | tr ' ' '\n' | sort)
 echo
 
 for type in decoder encoder hwaccel parser demuxer muxer protocol filter bsf indev outdev; do
     echo "Enabled ${type}s:"
     eval list=\$$(toupper $type)_LIST
-    print_enabled '_*' $list | print_in_columns
+    print_in_columns $(print_enabled '_*' $list | tr ' ' '\n' | sort)
     echo
 done
 
 if test -n "$ignore_tests"; then
     ignore_tests=$(echo $ignore_tests | tr ',' ' ')
     echo "Ignored FATE tests:"
-    echo $ignore_tests | print_in_columns
+    print_in_columns $(echo $ignore_tests | tr ' ' '\n' | sort)
     echo
 fi