[FFmpeg-devel] lavfi/vf_nlmeans: Improve the performance for nlmeans

Submitted by Jun Zhao on Jan. 31, 2019, 1:55 p.m.

Details

Message ID 1548942957-19151-1-git-send-email-mypopydev@gmail.com
State New

Commit Message

Jun Zhao Jan. 31, 2019, 1:55 p.m.
Remove pdiff_lut_scale from nlmeans. When looking up the weight_lut table
in nlmeans_slice(), the old way needed floating-point arithmetic
using pdiff_lut_scale. This change avoids using pdiff_lut_scale
in the weight_lut table lookup, which improves performance by about
12% (for a 1080p picture).

Profiled with a command like:
perf stat -a -d -r 5 ./ffmpeg -i input -an -vf nlmeans=s=30 -vframes 10 \
-f null /dev/null

without this change:
     s=1.0 (default value)  63s
     s=30.0                 72s
after this change:
     s=1.0 (default value)  56s
     s=30.0                 63s

Signed-off-by: Jun Zhao <mypopydev@gmail.com>
---
 libavfilter/vf_nlmeans.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

Comments

Carl Eugen Hoyos Jan. 31, 2019, 7:57 p.m.
2019-01-31 14:55 GMT+01:00, Jun Zhao <mypopydev@gmail.com>:
> Remove the pdiff_lut_scale in nlmeans, when search the weight_luttable
> in nlmeans_slices(), the old way need to the float-point arithmetic
> using pdiff_lut_scale. This change will avoid using pdiff_lut_scale
> in the weight_lut table search, it's will improve the performance about
> 12%. (1080P size picture).

Please mention the change in heap memory requirement with numbers.
Remove "old way need to the float-point arithmetic" because new way
also needs (some) floating-point arithmetic.
Feel free not to mention pdiff_lut_scale in the commit message.

Carl Eugen
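
The memory numbers requested above follow directly from the two table sizes in the patch. As a minimal sketch (the OLD_/NEW_ macro names and the lut_bytes() helper are hypothetical, introduced only for this illustration), the size of weight_lut, which is a member of NLMeansContext and so is paid once per filter instance, can be computed as:

```c
#include <stddef.h>

/* Table sizes taken from the diff; the OLD_/NEW_ prefixes are illustrative only. */
#define OLD_WEIGHT_LUT_NBITS 9
#define OLD_WEIGHT_LUT_SIZE  (1 << OLD_WEIGHT_LUT_NBITS) /* 512 entries before the patch */
#define NEW_WEIGHT_LUT_SIZE  800000                      /* entries after the patch */

/* Bytes occupied by a float lookup table with the given number of entries. */
static size_t lut_bytes(size_t entries)
{
    return entries * sizeof(float);
}
```

Assuming a 4-byte float, lut_bytes(OLD_WEIGHT_LUT_SIZE) is 2048 bytes and lut_bytes(NEW_WEIGHT_LUT_SIZE) is 3200000 bytes (about 3.05 MiB), so the table grows by a factor of roughly 1560.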

mypopy@gmail.com Feb. 1, 2019, 2:42 a.m.
On Fri, Feb 1, 2019 at 3:57 AM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> 2019-01-31 14:55 GMT+01:00, Jun Zhao <mypopydev@gmail.com>:
> > Remove the pdiff_lut_scale in nlmeans, when search the weight_luttable
> > in nlmeans_slices(), the old way need to the float-point arithmetic
> > using pdiff_lut_scale. This change will avoid using pdiff_lut_scale
> > in the weight_lut table search, it's will improve the performance about
> > 12%. (1080P size picture).
>
> Please mention the change in heap memory requirement with numbers.
> Remove "old way need to the float-point arithmetic" because new way
> also needs (some) floating-point arithmetic.
> Feel free not to mention pdiff_lut_scale in the commit message.
>
Will update the commit message again, thanks.

Patch

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index 82e779c..72eb819 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -43,8 +43,7 @@  struct weighted_avg {
     float sum;
 };
 
-#define WEIGHT_LUT_NBITS 9
-#define WEIGHT_LUT_SIZE  (1<<WEIGHT_LUT_NBITS)
+#define WEIGHT_LUT_SIZE  (800000) // must be > 300 * 300 * log(255)
 
 typedef struct NLMeansContext {
     const AVClass *class;
@@ -63,7 +62,6 @@  typedef struct NLMeansContext {
     struct weighted_avg *wa;                    // weighted average of every pixel
     ptrdiff_t wa_linesize;                      // linesize for wa in struct size unit
     float weight_lut[WEIGHT_LUT_SIZE];          // lookup table mapping (scaled) patch differences to their associated weights
-    float pdiff_lut_scale;                      // scale factor for patch differences before looking into the LUT
     uint32_t max_meaningful_diff;               // maximum difference considered (if the patch difference is too high we ignore the pixel)
     NLMeansDSPContext dsp;
 } NLMeansContext;
@@ -401,8 +399,7 @@  static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
             const uint32_t patch_diff_sq = e - d - b + a;
 
             if (patch_diff_sq < s->max_meaningful_diff) {
-                const unsigned weight_lut_idx = patch_diff_sq * s->pdiff_lut_scale;
-                const float weight = s->weight_lut[weight_lut_idx]; // exp(-patch_diff_sq * s->pdiff_scale)
+                const float weight = s->weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
                 wa[x].total_weight += weight;
                 wa[x].sum += weight * src[x];
             }
@@ -527,10 +524,9 @@  static av_cold int init(AVFilterContext *ctx)
 
     s->pdiff_scale = 1. / (h * h);
     s->max_meaningful_diff = -log(1/255.) / s->pdiff_scale;
-    s->pdiff_lut_scale = 1./s->max_meaningful_diff * WEIGHT_LUT_SIZE;
-    av_assert0((s->max_meaningful_diff - 1) * s->pdiff_lut_scale < FF_ARRAY_ELEMS(s->weight_lut));
+    av_assert0((s->max_meaningful_diff - 1) < FF_ARRAY_ELEMS(s->weight_lut));
     for (i = 0; i < WEIGHT_LUT_SIZE; i++)
-        s->weight_lut[i] = exp(-i / s->pdiff_lut_scale * s->pdiff_scale);
+        s->weight_lut[i] = exp(-i * s->pdiff_scale);
 
     CHECK_ODD_FIELD(research_size,   "Luma research window");
     CHECK_ODD_FIELD(patch_size,      "Luma patch");