[FFmpeg-devel] lavfi/vf_nlmeans: Remove the pdiff_lut_scale to improve the performance

Submitted by Jun Zhao on Jan. 30, 2019, 10:56 a.m.

Details

Message ID 1548845764-32527-1-git-send-email-mypopydev@gmail.com
State New

Commit Message

Jun Zhao Jan. 30, 2019, 10:56 a.m.
Remove the pdiff_lut_scale in nlmeans, and this change will avoid
using pdiff_lut_scale in the exp table search in nlmeans_slice, which
improves performance by about 12%.

Profiled with a command like:
perf stat -a -d -r 5 ./ffmpeg -i input -an -vf nlmeans=s=30 -vframes 10 \
-f null /dev/null

Results:
                           s=1.0 (default)   s=30.0
  without this change      63s               72s
  after this change        56s               63s
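Read as the reduction in wall time, (before - after) / before, the timings above can be checked with a trivial helper. The name `time_reduction_percent` is hypothetical, and the function only restates the reported numbers; it is not a new measurement.

```c
/* Relative reduction in wall time, in percent: (before - after) / before. */
static double time_reduction_percent(double before_s, double after_s)
{
    return (before_s - after_s) / before_s * 100.0;
}
```

`time_reduction_percent(63, 56)` gives about 11.1 and `time_reduction_percent(72, 63)` gives 12.5, so the two settings bracket the quoted "about 12%".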

Signed-off-by: Jun Zhao <mypopydev@gmail.com>
---
 libavfilter/vf_nlmeans.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

Comments

Carl Eugen Hoyos Jan. 30, 2019, 2:24 p.m.
2019-01-30 11:56 GMT+01:00, Jun Zhao <mypopydev@gmail.com>:
> Remove the pdiff_lut_scale in nlmeans

This sentence is very misleading.

> and this change will avoid
> using pdiff_lut_scale in the exp table search in nlmeans_slice, which
> improves performance by about 12%.

Please mention in the commit message that you increase
the context size including the amount (and probably remove
the part above that you "remove" something from the context).

Carl Eugen
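The context-size increase Carl asks about can be read off the diff: `weight_lut` is a `float` array member of `NLMeansContext`, and the patch grows it from `1<<9` = 512 entries to 800000 while dropping the single `float pdiff_lut_scale`. The structs below are hypothetical stand-ins holding only those members, to make the byte counts explicit.

```c
#include <stddef.h>

/* Hypothetical stand-ins containing only the members the patch touches. */
struct lut_before {
    float weight_lut[1 << 9];   /* WEIGHT_LUT_SIZE = 1 << WEIGHT_LUT_NBITS (9) */
    float pdiff_lut_scale;      /* removed by the patch */
};

struct lut_after {
    float weight_lut[800000];   /* WEIGHT_LUT_SIZE after the patch */
};

/* Extra bytes each context carries after the patch. */
static size_t lut_growth_bytes(void)
{
    return sizeof(struct lut_after) - sizeof(struct lut_before);
}
```

Assuming a 4-byte `float`, that is 2052 bytes before and 3,200,000 bytes (about 3.05 MiB) after.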
mypopy@gmail.com Jan. 31, 2019, 1:31 a.m.
On Wed, Jan 30, 2019 at 10:24 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> 2019-01-30 11:56 GMT+01:00, Jun Zhao <mypopydev@gmail.com>:
> > Remove the pdiff_lut_scale in nlmeans
>
> This sentence is very misleading.
>
> > and this change will avoid
> > using pdiff_lut_scale in the exp table search in nlmeans_slice, which
> > improves performance by about 12%.
>
> Please mention in the commit message that you increase
> the context size including the amount (and probably remove
> the part above that you "remove" something from the context).
>

Looks like I need to improve the commit message; I will update it in V2. :)

In fact, when I tried to use nlmeans to denoise 1080p pictures, I found
nlmeans to be very slow. Profiling the code showed that nlmeans_slice is
the bottleneck, so this patch removes the pdiff_lut_scale scaling from the
exp() table lookup in the nlmeans_slice function.
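Dropping the scale is only safe if `weight_lut` has an entry for every `patch_diff_sq` below `max_meaningful_diff`. That is what the new `av_assert0()` in `init()` checks, and what the `// need to >  300 * 300 * log(255)` comment on `WEIGHT_LUT_SIZE` sizes for. The sketch below reuses the `init()` formulas from the diff; treating h = 300 as the worst case is taken from that comment, and the function names are hypothetical.

```c
#include <math.h>
#include <stdint.h>

#define WEIGHT_LUT_SIZE 800000  /* value used by the patch */

/* Mirrors init(): pdiff_scale = 1 / (h * h) and
 * max_meaningful_diff = -log(1/255) / pdiff_scale, truncated to uint32_t. */
static uint32_t max_meaningful_diff(double h)
{
    const double pdiff_scale = 1. / (h * h);
    return (uint32_t)(-log(1 / 255.) / pdiff_scale);
}

/* Same condition as the patch's av_assert0(): the largest index used,
 * max_meaningful_diff - 1, must fall inside the table. As in the patch,
 * a result of 0 would wrap around and fail the check. */
static int lut_is_large_enough(double h)
{
    return max_meaningful_diff(h) - 1 < WEIGHT_LUT_SIZE;
}
```

For h = 300 this gives 498713 entries, comfortably below 800000, while a hypothetical h = 400 would need 886602 entries and fail the check.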


diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index 82e779c..72eb819 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -43,8 +43,7 @@  struct weighted_avg {
     float sum;
 };
 
-#define WEIGHT_LUT_NBITS 9
-#define WEIGHT_LUT_SIZE  (1<<WEIGHT_LUT_NBITS)
+#define WEIGHT_LUT_SIZE  (800000) // need to >  300 * 300 * log(255)
 
 typedef struct NLMeansContext {
     const AVClass *class;
@@ -63,7 +62,6 @@  typedef struct NLMeansContext {
     struct weighted_avg *wa;                    // weighted average of every pixel
     ptrdiff_t wa_linesize;                      // linesize for wa in struct size unit
     float weight_lut[WEIGHT_LUT_SIZE];          // lookup table mapping (scaled) patch differences to their associated weights
-    float pdiff_lut_scale;                      // scale factor for patch differences before looking into the LUT
     uint32_t max_meaningful_diff;               // maximum difference considered (if the patch difference is too high we ignore the pixel)
     NLMeansDSPContext dsp;
 } NLMeansContext;
@@ -401,8 +399,7 @@  static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
             const uint32_t patch_diff_sq = e - d - b + a;
 
             if (patch_diff_sq < s->max_meaningful_diff) {
-                const unsigned weight_lut_idx = patch_diff_sq * s->pdiff_lut_scale;
-                const float weight = s->weight_lut[weight_lut_idx]; // exp(-patch_diff_sq * s->pdiff_scale)
+                const float weight = s->weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
                 wa[x].total_weight += weight;
                 wa[x].sum += weight * src[x];
             }
@@ -527,10 +524,9 @@  static av_cold int init(AVFilterContext *ctx)
 
     s->pdiff_scale = 1. / (h * h);
     s->max_meaningful_diff = -log(1/255.) / s->pdiff_scale;
-    s->pdiff_lut_scale = 1./s->max_meaningful_diff * WEIGHT_LUT_SIZE;
-    av_assert0((s->max_meaningful_diff - 1) * s->pdiff_lut_scale < FF_ARRAY_ELEMS(s->weight_lut));
+    av_assert0((s->max_meaningful_diff - 1) < FF_ARRAY_ELEMS(s->weight_lut));
     for (i = 0; i < WEIGHT_LUT_SIZE; i++)
-        s->weight_lut[i] = exp(-i / s->pdiff_lut_scale * s->pdiff_scale);
+        s->weight_lut[i] = exp(-i * s->pdiff_scale);
 
     CHECK_ODD_FIELD(research_size,   "Luma research window");
     CHECK_ODD_FIELD(patch_size,      "Luma patch");