[FFmpeg-devel,v7] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

This enables concurrency across multiple SIMPLE filter graphs, which brings
about a 4%~20% improvement in some 1:N scenarios with CPU or GPU acceleration.

Below are some test cases and comparisons for reference.
(Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
(Software: Intel iHD driver - 16.9.00100, CentOS 7)

For 1:N transcode with GPU acceleration via VAAPI:
./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
    -hwaccel_output_format vaapi \
    -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
    -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
    -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null

    test results:
                2 encoders  5 encoders  10 encoders
    Improved       6.1%        6.9%        5.5%

For 1:N transcode with GPU acceleration via QSV:
./ffmpeg -hwaccel qsv -c:v h264_qsv \
    -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
    -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null \
    -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null /dev/null

    test results:
                2 encoders  5 encoders  10 encoders
    Improved       6%          4%          15%

For 1 decode to N scaling with Intel GPU acceleration via QSV:
./ffmpeg -hwaccel qsv -c:v h264_qsv \
    -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
    -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null \
    -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null

    test results:
                2 scale  5 scale  10 scale
    Improved      12%      21%      21%

For CPU-only 1 decode to N scaling:
./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
    -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
    -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null

    test results:
                2 scale  5 scale  10 scale
    Improved      25%      107%     148%
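
The commands above show only two outputs each, while results are reported for
2, 5 and 10 outputs. A minimal sketch of how an N-output command for the
CPU-only case could be generated is given below. The helper name build_cmd is
hypothetical, and alternating between the two sizes shown above is an
assumption: the sizes actually used for the 5- and 10-output runs are not
stated here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): print an ffmpeg command line
# with N scaled null outputs, i.e. N simple filter graphs on one decode.
# Assumption: outputs alternate between the two sizes from the 2-output case.
build_cmd() {
    input=$1   # source video
    n=$2       # number of outputs
    cmd="./ffmpeg -i $input"
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ $((i % 2)) -eq 0 ]; then
            size=1280:720
        else
            size=720:480
        fi
        # One -vf per output gives one simple filter graph per output.
        cmd="$cmd -vf scale=$size -pix_fmt nv12 -f null /dev/null"
        i=$((i + 1))
    done
    printf '%s\n' "$cmd"
}

# Print the 5-output variant; pipe the result to sh to actually run it.
build_cmd "$HOME/Videos/1920x1080p_30.00_x264_qp28.h264" 5
```

The helper only prints the command, so it can be inspected before being run.
It does no quoting, so it assumes an input path without spaces.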

Signed-off-by: Wang, Shaofei <shaofei.wang@intel.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
---
The patch only affects pipelines with multiple SIMPLE filter graphs.
It passes FATE, and the possible data races have been addressed.
AFL tested, without introducing extra crashes/hangs:

                          american fuzzy lop 2.52b (ffmpeg_g)

    ┌─ process timing ─────────────────────────────────────┬─ overall results ─────┐
    │        run time : 0 days, 9 hrs, 48 min, 48 sec      │  cycles done : 0      │
    │   last new path : 0 days, 0 hrs, 0 min, 0 sec        │  total paths : 1866   │
    │ last uniq crash : none seen yet                      │ uniq crashes : 0      │
    │  last uniq hang : 0 days, 9 hrs, 19 min, 23 sec      │   uniq hangs : 35     │
    ├─ cycle progress ────────────────────┬─ map coverage ─┴───────────────────────┤
    │  now processing : 0 (0.00%)         │    map density : 24.91% / 36.60%       │
    │ paths timed out : 0 (0.00%)         │ count coverage : 2.40 bits/tuple       │
    ├─ stage progress ────────────────────┼─ findings in depth ────────────────────┤
    │  now trying : calibration           │ favored paths : 1 (0.05%)              │
    │ stage execs : 0/8 (0.00%)           │  new edges on : 1100 (58.95%)          │
    │ total execs : 123k                  │ total crashes : 0 (0 unique)           │
    │  exec speed : 3.50/sec (zzzz...)    │  total tmouts : 52 (47 unique)         │
    ├─ fuzzing strategy yields ───────────┴───────────────┬─ path geometry ────────┤
    │   bit flips : 0/0, 0/0, 0/0                         │    levels : 2          │
    │  byte flips : 0/0, 0/0, 0/0                         │   pending : 1866       │
    │ arithmetics : 0/0, 0/0, 0/0                         │  pend fav : 1          │
    │  known ints : 0/0, 0/0, 0/0                         │ own finds : 1862       │
    │  dictionary : 0/0, 0/0, 0/0                         │  imported : n/a        │
    │       havoc : 0/0, 0/0                              │ stability : 76.69%     │
    │        trim : 0.00%/1828, n/a                       ├────────────────────────┘
    └─────────────────────────────────────────────────────┘          [cpu000: 59%]

 fftools/ffmpeg.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++------
 fftools/ffmpeg.h |  13 +++++
 2 files changed, 169 insertions(+), 16 deletions(-)
