Message ID | 20180506114100.4223-8-u@pkh.me |
---|---|
State | Superseded |
On Sun, May 06, 2018 at 13:40:58 +0200, Clément Bœsch wrote:
> Overall speed appears to be 1.1x faster with no noticeable quality impact.

Probably platform dependent?

> struct weighted_avg {
> -    double total_weight;
> -    double sum;
> +    float total_weight;
> +    float sum;
> };

I believe these calculations in nlmeans_plane() will promote to double
before being cast back to float:

    // Also weight the centered pixel
    wa->total_weight += 1.0;
    wa->sum += 1.0 * src[y*src_linesize + x];

(At least the second one. The first one - just an assignment of a
constant - is covered by the compiler, IIUC.) They need to use
"1.0f".

(There are others, but only in init(), which don't matter for
performance.)

Cheers,
Moritz
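The promotion described above can be checked at compile time. The following is only a minimal C11 sketch, not code from the patch: the `IS_DOUBLE` and `IS_FLOAT` macros and the `add_center_pixel()` helper are hypothetical names introduced for illustration, and the pixel type is assumed to be `uint8_t` as in the filter's `src` buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: report the type of an expression via C11 _Generic. */
#define IS_DOUBLE(expr) _Generic((expr), double: 1, default: 0)
#define IS_FLOAT(expr)  _Generic((expr), float: 1, default: 0)

struct weighted_avg {
    float total_weight;
    float sum;
};

/* Hypothetical helper mirroring the "centered pixel" update quoted above. */
static void add_center_pixel(struct weighted_avg *wa, uint8_t pixel)
{
    /* A double literal makes the whole product a double... */
    static_assert(IS_DOUBLE(1.0 * (uint8_t)0),
                  "1.0 * pixel is evaluated as double");
    /* ...while a float literal keeps the product in float. */
    static_assert(IS_FLOAT(1.0f * (uint8_t)0),
                  "1.0f * pixel is evaluated as float");

    wa->total_weight += 1.0f;
    wa->sum          += 1.0f * pixel;
}
```

Calling `add_center_pixel(&wa, src[y*src_linesize + x])` on a zero-initialized `wa` performs both updates with float operands. Whether this is measurably faster than the double version depends on the compiler and target.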
On Sun, May 06, 2018 at 04:53:54PM +0200, Moritz Barsnick wrote:
> On Sun, May 06, 2018 at 13:40:58 +0200, Clément Bœsch wrote:
> > Overall speed appears to be 1.1x faster with no noticeable quality impact.
>
> Probably platform dependent?
>
> > struct weighted_avg {
> > -    double total_weight;
> > -    double sum;
> > +    float total_weight;
> > +    float sum;
> > };
>
> I believe these calculations in nlmeans_plane() will promote to double
> before being cast back to float:
>
>     // Also weight the centered pixel
>     wa->total_weight += 1.0;
>     wa->sum += 1.0 * src[y*src_linesize + x];
>
> (At least the second one. The first one - just an assignment of a
> constant - is covered by the compiler, IIUC.) They need to use
> "1.0f".
>

It doesn't really matter here, actually: in "lavfi/nlmeans: move final
weighted averaging out of nlmeans_plane" you can see that this code
represents 0.24% of the CPU time. I fixed it locally anyway, thanks.

> (There are others, but only in init(), which don't matter for
> performance.)

Yeah, I left these as double on purpose.
diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index f37f1183f7..201e4feb41 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -40,8 +40,8 @@
 #include "video.h"
 
 struct weighted_avg {
-    double total_weight;
-    double sum;
+    float total_weight;
+    float sum;
 };
 
 #define WEIGHT_LUT_NBITS 9
@@ -63,8 +63,8 @@ typedef struct NLMeansContext {
     ptrdiff_t ii_lz_32;                         // linesize in 32-bit units of the integral image
     struct weighted_avg *wa;                    // weighted average of every pixel
     ptrdiff_t wa_linesize;                      // linesize for wa in struct size unit
-    double weight_lut[WEIGHT_LUT_SIZE];         // lookup table mapping (scaled) patch differences to their associated weights
-    double pdiff_lut_scale;                     // scale factor for patch differences before looking into the LUT
+    float weight_lut[WEIGHT_LUT_SIZE];          // lookup table mapping (scaled) patch differences to their associated weights
+    float pdiff_lut_scale;                      // scale factor for patch differences before looking into the LUT
     int max_meaningful_diff;                    // maximum difference considered (if the patch difference is too high we ignore the pixel)
     NLMeansDSPContext dsp;
 } NLMeansContext;
@@ -206,7 +206,7 @@ static void compute_safe_ssd_integral_image_c(uint32_t *dst, ptrdiff_t dst_lines
  * @param w width to compute
  * @param h height to compute
  */
-static inline void compute_unsafe_ssd_integral_image(uint32_t *dst, ptrdiff_t dst_linesize_32,
+static void compute_unsafe_ssd_integral_image(uint32_t *dst, ptrdiff_t dst_linesize_32,
                                                      int startx, int starty,
                                                      const uint8_t *src, ptrdiff_t linesize,
                                                      int offx, int offy, int r, int sw, int sh,
@@ -402,7 +402,7 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
                     const int patch_diff_sq = get_integral_patch_value(td->ii_start, s->ii_lz_32, x, y, td->p);
                     if (patch_diff_sq < s->max_meaningful_diff) {
                         const int weight_lut_idx = patch_diff_sq * s->pdiff_lut_scale;
-                        const double weight = s->weight_lut[weight_lut_idx]; // exp(-patch_diff_sq * s->pdiff_scale)
+                        const float weight = s->weight_lut[weight_lut_idx]; // exp(-patch_diff_sq * s->pdiff_scale)
                         wa[x].total_weight += weight;
                         wa[x].sum += weight * src[x];
                     }
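
To make the last hunk easier to follow in isolation, here is a rough standalone sketch of the float accumulation it touches. It is not the filter's actual code: `accumulate()` and `weighted_result()` are hypothetical helpers, the lookup table is passed in by the caller instead of living in `NLMeansContext`, and the bounds check on the index stands in for the `max_meaningful_diff` test as an assumption.

```c
#include <assert.h>
#include <stdint.h>

struct weighted_avg {
    float total_weight;
    float sum;
};

/*
 * Hypothetical helper: add one neighbour pixel to the running average.
 * lut[i] is assumed to hold a precomputed weight for scaled patch
 * difference i (exp-based in the filter, filled once at init time).
 */
static void accumulate(struct weighted_avg *wa,
                       const float *lut, int lut_size, float lut_scale,
                       int patch_diff_sq, uint8_t pixel)
{
    /* Scale the integer patch difference into a LUT index (truncating). */
    const int idx = (int)(patch_diff_sq * lut_scale);

    /* Assumption: differences outside the table are simply ignored. */
    if (idx < 0 || idx >= lut_size)
        return;

    const float weight = lut[idx];
    wa->total_weight += weight;          /* float += float */
    wa->sum          += weight * pixel;  /* pixel converts to float here */
}

/* Hypothetical helper: final pixel value, 0 if nothing was accumulated. */
static float weighted_result(const struct weighted_avg *wa)
{
    return wa->total_weight > 0.0f ? wa->sum / wa->total_weight : 0.0f;
}
```

With every operand declared as float, none of the per-pixel arithmetic goes through double, which is presumably where the reported speedup comes from. The table itself can still be computed in double at init time, as noted in the thread.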