[FFmpeg-devel] delogo filter: new "uglarm" interpolation mode added

Submitted by Uwe Freese on Jan. 1, 2019, 9:14 p.m.

Details

Message ID d803ca1a-72f1-98cb-7f61-952be8421162@gmx.de
State New
Headers show

Commit Message

Uwe Freese Jan. 1, 2019, 9:14 p.m.
Hello,

here's a new version of the patch.

Thanks for the infos. I used the raw output of a small test video (where 
delogo is applied in both modes) before and after the changes to make 
sure the output is bytewise identical (the changes don't change the output).


In general I want to say that it seems to me that *some* of the points 
go now in a more philosophical direction.

I prefer code easy to understand and assume that compilers optimize 
much. Yes, there may be many possibilities to probably "optimize" 
something, but I tend to believe that many times this is not needed nor 
helpful, because it doesn't really optimize and complicates the code.

Additionally, when I don't have complete and functioning code to replace 
my code (the same what's expected from me), I'm not willing to spend 
many hours to guess what could be meant.

I hope that not so many additional patches are needed anymore. I 
understand that coding styles, syntax etc. shall be corrected, but hope 
that we don't have to discuss too much alternative ways of implementing 
basically the same.



Changes since the last version of the patch (mail from 29.12. 21:38), 
according the comments since then:

- change include order at the beginning of the file
- change loop format (linebreaks)
- change loop borders
- change indentation (line 175)
- moved init code (create tables) to config_input function

- used av_malloc_array instead of av_malloc. Avoid the cast. Use 
variable in sizeof(...).
- copyright 2018, 2019 - happy new year BTW ;-)


Comments and remaining open points from the mails:

>> +        for (y = logo_y1 + 1; y < logo_y2; y++) {
>> +            for (x = logo_x1 + 1,
>> +                xdst = dst + logo_x1 + 1,
>> +                xsrc = src + logo_x1 + 1; x < logo_x2; x++, xdst++, xsrc++) {
> I think a line break after the semicolons would make this easier to
> understand.

Hm. I used the same format as in the original code.

Nonetheless, I changed the format now, because I changed the loop 
borders as requested and the loops are now different anyway.

>> +                for (bx = 0; bx < logo_w; bx++) {
>> +                    interpd += topleft[bx] *
>> +                        uglarmtable[abs(bx - (x - logo_x1)) + (y - logo_y1) * (logo_w - 1)];
>> +                    interpd += botleft[bx] *
>> +                        uglarmtable[abs(bx - (x - logo_x1)) + (logo_h - (y - logo_y1) - 1) * (logo_w - 1)];
>> +                }
> This can be optimized, and since it is the inner loop of the filter, I
> think it is worth it. You can declare a pointer that will stay the same
> for the whole y loop:
>
> 	double *xweight = uglarmtable + (y - logo_y1) * (logo_w - 1);
>
> and then use it in that loop:
>
> 	interpd += ... * xweight[abs(bx - (x - logo_x1))];
>
> It avoids a lot of multiplications.

I'm not sure about this point. First, I would absolutely assume from the 
compiler that it optimizes expressions like "(logo_w - 1)" or a fixed 
offset to access the array in the loop and that they are only calculated 
once.

Secondly, I don't exactly understand how *xweight in your example should 
be used (and would mean smaller or easier code).

@All: What is the opinion about changing this loop regarding use of an 
additional xweight pointer?

>
>> +
>> +                for (by = 1; by < logo_h - 1; by++) {
>> +                    interpd += topleft[by * src_linesize] *
>> +                        uglarmtable[(x - logo_x1) + abs(by - (y - logo_y1)) * (logo_w - 1)];
>> +                    interpd += topleft[by * src_linesize + (logo_w - 1)] *
>> +                        uglarmtable[logo_w - (x - logo_x1) - 1 + abs(by - (y - logo_y1)) * (logo_w - 1)];
>> +                }
> This one is more tricky to optimize, because of the abs() within the
> multiplication, but it can be done by splitting the loop in two:
>
> 	for (by = 1; by < y; by++) {
> 	    interpd += ... * yweight[x - logo_x1];
> 	    yweight -= logo_w_minus_one;
> 	}
> 	for (; by < logo_h_minus_one; by++) {
> 	    interpd += ... * yweight[x - logo_x1];
> 	    yweight += logo_w_minus_one;
> 	}
> 	av_assert2(yweight == the_correct_value);
>
> In fact, I think even the x loop would be a little more readable if it
> was split in two like that.
Sorry, I don't understand. Please give me a complete example that 
replaces the previous for loop.
>> +
>> +                interp = (uint64_t)(interpd /
>> +                    uglarmweightsum[(x - logo_x1) - 1 + (y - logo_y1 - 1) * (logo_w - 2)]);
> The cast to uint64_t seems suspicious. Can you explain?

Every pixel value of the inner logo area is an integer. Only for the 
calculation of the weighted average, all values use floats. At the end, 
it is rounded (truncated) to an int.

Should work - and did work for 17 years...

>
>> +                *xdst = interp;
>> +            }
>> +
>> +            dst += dst_linesize;
>> +            src += src_linesize;
>> +        }
>> +    }
>> +}
>> +
>> +/**
>> + * Calculate the lookup tables to be used in UGLARM interpolation mode.
>> + *
>> + * @param *uglarmtable      Pointer to table containing weights for each possible
>> + *                          diagonal distance between a border pixel and an inner
>> + *                          logo pixel.
>> + * @param *uglarmweightsum  Pointer to a table containing the weight sum to divide
>> + *                          by for each pixel within the logo area.
>> + * @param sar               The sar to take into account when calculating lookup
>> + *                          tables.
>> + * @param logo_w            width of the logo
>> + * @param logo_h            height of the logo
>> + * @param exponent          exponent used in uglarm interpolation
>> + */
>> +static void calc_uglarm_tables(double *uglarmtable, double *uglarmweightsum,
>> +                               AVRational sar, int logo_w, int logo_h,
>> +                               float exponent)
>> +{
>> +    double aspect = (double)sar.num / sar.den;
> Tiny optimization:
>
> 	double aspect2 = aspect * aspect;
>
> for use later.

I wouldn't consider this an optimization. First, I assume the compiler 
to only calculate "aspect * aspect" once in that loop (as well as "y * 
y" which doesn't change in the inner loop). Second, the code with using 
the additional variable makes the code more complex and third, this loop 
is only calculated once at startup.

@All: What are your opinions?

>
>> +                double d = pow(sqrt(x * x * aspect * aspect + y * y), exponent);
> 	pow(x * x * aspect2 + y * y, exponent / 2);

Hm. Again, I'm unsure if this "optimization" is worth it. I wouldn't do 
this normally.
Reasons: Again, I think the compiler would optimize it. At least the 
compiled ffmpeg binaries are exactly the same size... And the original 
code explains better the calculation: Calculate the distance 
(*Pythagorean theorem, c = sqrt(a² + b²), and then calculate the 
weighted distance value with the power. Again, this is only run once at 
startup
*

*@All: Other opinions?
*

>
>> +#define MAX_PLANES 10
> You could use FF_ARRAY_ELEMS(AVPixFmtDescriptor.comp), and the
> consistency would be guaranteed. Well, that syntax is not valid, but it
> can be expressed:
>
> #define MAX_PLANES FF_ARRAY_ELEMS(((AVPixFmtDescriptor *)0)->comp)
>
> But I have a better suggestion:
>
> #define MAX_SUB 2
>
> 	double uglarmtable[MAX_SUB + 1][MAX_SUB + 1]
>
> and use uglarmtable[hsub][vsub]. That way, for YUVA420P for example, the
> table will be computed only once for Y and A and once for U and V.
>
> The assert is still needed with that solution, though.

I don't understand this. Please provide a complete example, or it stays 
as it is. - and could of course be optimized later.

>> +    if (s->mode == MODE_UGLARM) {
> No need to test, we can free anyway.

But why should the for loop run in xy mode and check "planes count" 
times to free the memory? The code without the "if" also looks to me 
more like this is not checked by mistake.

@All: Opinions?


> [Carl Eugen] While there, please don't use sizeof(type) but sizeof(variable),
same below.

Hope that the code is correct like this?

s->uglarmtable[plane] =
                 av_malloc_array((logo_w - 1) * (logo_h - 1), sizeof(s->uglarmtable[plane]));

Regards,
Uwe

Comments

Nicolas George Jan. 2, 2019, 2:37 p.m.
Uwe Freese (2019-01-01):
> > This can be optimized, and since it is the inner loop of the filter, I
> > think it is worth it. You can declare a pointer that will stay the same
> > for the whole y loop:
> > 
> > 	double *xweight = uglarmtable + (y - logo_y1) * (logo_w - 1);
> > 
> > and then use it in that loop:
> > 
> > 	interpd += ... * xweight[abs(bx - (x - logo_x1))];
> > 
> > It avoids a lot of multiplications.
> 
> I'm not sure about this point. First, I would absolutely assume from the
> compiler that it optimizes expressions like "(logo_w - 1)" or a fixed offset
> to access the array in the loop and that they are only calculated once.

Relying a lot on compiler optimizations is a sure way of being
disappointed. But the logo_w_minus_one variable was not about
optimization but about organization. Each time there is the same
computation at several places, it is a sign that a specific variable
should probably be used. That way, if the design changes a little, only
the initialization of the variable needs to be changed.

In this particular instance, logo_w_minus_one was not a good name (it
was for explaining); table_stride would be better. That way, if you
decide to change the structure of the table, you do not need to look at
all uses of logo_w, you just need to update the initialization of
table_stride.

> Secondly, I don't exactly understand how *xweight in your example should be
> used (and would mean smaller or easier code).

XXX

> > > +                interp = (uint64_t)(interpd /
> > > +                    uglarmweightsum[(x - logo_x1) - 1 + (y - logo_y1 - 1) * (logo_w - 2)]);
> > The cast to uint64_t seems suspicious. Can you explain?
> 
> Every pixel value of the inner logo area is an integer. Only for the
> calculation of the weighted average, all values use floats. At the end, it
> is rounded (truncated) to an int.

int and uint64_t are not the same thing. Why uint64_t?

> > 	pow(x * x * aspect2 + y * y, exponent / 2);
> 
> Hm. Again, I'm unsure if this "optimization" is worth it. I wouldn't do this
> normally.

In this particular instance, definitely yes. Compilers have very little
latitude to optimize floating point operations, and they will certainly
not optimize mathematical functions based on subtle arithmetic
properties. This is a C compiler, not a TI-89.

> > But I have a better suggestion:
> > 
> > #define MAX_SUB 2
> > 
> > 	double uglarmtable[MAX_SUB + 1][MAX_SUB + 1]
> > 
> > and use uglarmtable[hsub][vsub]. That way, for YUVA420P for example, the
> > table will be computed only once for Y and A and once for U and V.
> > 
> > The assert is still needed with that solution, though.
> 
> I don't understand this. Please provide a complete example, or it stays as
> it is. - and could of course be optimized later.

I do not see how it could be more complete without including code that
is irrelevant to the example. But since you insist:

#define MAX_SUB 2

typedef struct DelogoContext {
    ...
    double uglarmtable[MAX_SUB + 1][MAX_SUB + 1]
    ...
}
...
    av_assert0(hsub <= MAX_SUB && vsub <= MAX_SUB);
    if (!s->uglarmtable[hsub][vsub])
	s->uglarmtable[hsub][vsub] = av_malloc_array(...);
    ...
    calc_uglarm_tables(s->uglarmtable[hsub][vsub],
		       s->uglarmweightsum[hsub][vsub]);
    ...
    apply_delogo(...,
		 s->uglarmtable[hsub][vsub],
		 s->uglarmweightsum[hsub][vsub]);

> But why should the for loop run in xy mode and check "planes count" times to
> free the memory? The code without the "if" also looks to me more like this
> is not checked by mistake.

This is the usual way this project does things: free everything
unconditionally. The rationale is that it is less likely to lead to
leaks in case of code change.


> Hope that the code is correct like this?
> 
> s->uglarmtable[plane] =
>                 av_malloc_array((logo_w - 1) * (logo_h - 1), sizeof(s->uglarmtable[plane]));

Sorry, no: you are taking the size of the pointer uglarmtable[plane];
you need to take the size of the pointed objects *uglarmtable[plane].

Regards,
Nicolas George Jan. 2, 2019, 3:25 p.m.
Nicolas George (2019-01-02):
> Uwe Freese (2019-01-01):
> > > This can be optimized, and since it is the inner loop of the filter, I
> > > think it is worth it. You can declare a pointer that will stay the same
> > > for the whole y loop:
> > > 
> > > 	double *xweight = uglarmtable + (y - logo_y1) * (logo_w - 1);
> > > 
> > > and then use it in that loop:
> > > 
> > > 	interpd += ... * xweight[abs(bx - (x - logo_x1))];
> > > 
> > > It avoids a lot of multiplications.
> 
> > Secondly, I don't exactly understand how *xweight in your example should be
> > used (and would mean smaller or easier code).
> 
> XXX

Sorry, forgot to fill that part before sending.

Completely untested code:

    double *table_t, *table_b, *table_l, *table_r;

    table_t = uglarmtable + x + table_stride * y;
    table_b = uglarmtable + x + table_stride * (logo_h - y - 1);
    table_l = uglarmtable + table_stride * (y - 1) + x;
    table_r = uglarmtable + table_stride * (y - 1) + logo_w - x - 1;
    /* top+bottom on the left of the current point */
    for (bx = 0; bx < x; bx++) {
	weightsum += table_t;
	weightsum += table_b;
	table_t--;
	table_b--;
    }
    /* top+bottom on the right of the current point */
    for (; bx < logo_w; bx++) {
	weightsum += table_t;
	weightsum += table_b;
	table_t++;
	table_b++;
    }
    /* left+right above the current point */
    for (by = 1; by < y; by++) {
	weightsum += *table_l;
	weightsum += *table_r;
	table_l -= table_stride;
	table_r -= table_stride;
    }
    /* left+right below the current point */
    for (; by < logo_w - 1; by++) {
	weightsum += *table_l;
	weightsum += *table_r;
	table_l += table_stride;
	table_r += table_stride;
    }
    av_assert2(table_t == uglarmtable + (logo_w - x) + table_stride * y);
    av_assert2(table_b == uglarmtable + (logo_w - x) + table_stride * (logo_h - y - 1));
    av_assert2(table_l == uglarmtable + table_stride * (logo_h - y - 1) + x);
    av_assert2(table_r == uglarmtable + table_stride * (logo_h - y - 1) + logo_w - x - 1;

That makes more lines, but the lines are way simpler: no tricky
arithmetic, all blocks almost identical, with the changes easy to see
and understand.

Regards,
Uwe Freese Jan. 2, 2019, 9:08 p.m.
Hi,

just an info: code didn't compile, don't know why I didn't see this... 
New patch is in work...

Regards, Uwe

Am 01.01.19 um 22:14 schrieb Uwe Freese:
> Hello,
>
> here's a new version of the patch.
>
> Thanks for the infos. I used the raw output of a small test video 
> (where delogo is applied in both modes) before and after the changes 
> to make sure the output is bytewise identical (the changes don't 
> change the output).
>
>
> In general I want to say that it seems to me that *some* of the points 
> go now in a more philosophical direction.
>
> I prefer code easy to understand and assume that compilers optimize 
> much. Yes, there may be many possibilities to probably "optimize" 
> something, but I tend to believe that many times this is not needed 
> nor helpful, because it doesn't really optimize and complicates the code.
>
> Additionally, when I don't have complete and functioning code to 
> replace my code (the same what's expected from me), I'm not willing to 
> spend many hours to guess what could be meant.
>
> I hope that not so many additional patches are needed anymore. I 
> understand that coding styles, syntax etc. shall be corrected, but 
> hope that we don't have to discuss too much alternative ways of 
> implementing basically the same.
>
>
>
> Changes since the last version of the patch (mail from 29.12. 21:38), 
> according the comments since then:
>
> - change include order at the beginning of the file
> - change loop format (linebreaks)
> - change loop borders
> - change indentation (line 175)
> - moved init code (create tables) to config_input function
>
> - used av_malloc_array instead of av_malloc. Avoid the cast. Use 
> variable in sizeof(...).
> - copyright 2018, 2019 - happy new year BTW ;-)
>
>
> Comments and remaining open points from the mails:
>
>>> +        for (y = logo_y1 + 1; y < logo_y2; y++) {
>>> +            for (x = logo_x1 + 1,
>>> +                xdst = dst + logo_x1 + 1,
>>> +                xsrc = src + logo_x1 + 1; x < logo_x2; x++, xdst++, 
>>> xsrc++) {
>> I think a line break after the semicolons would make this easier to
>> understand.
>
> Hm. I used the same format as in the original code.
>
> Nonetheless, I changed the format now, because I changed the loop 
> borders as requested and the loops are now different anyway.
>
>>> +                for (bx = 0; bx < logo_w; bx++) {
>>> +                    interpd += topleft[bx] *
>>> +                        uglarmtable[abs(bx - (x - logo_x1)) + (y - 
>>> logo_y1) * (logo_w - 1)];
>>> +                    interpd += botleft[bx] *
>>> +                        uglarmtable[abs(bx - (x - logo_x1)) + 
>>> (logo_h - (y - logo_y1) - 1) * (logo_w - 1)];
>>> +                }
>> This can be optimized, and since it is the inner loop of the filter, I
>> think it is worth it. You can declare a pointer that will stay the same
>> for the whole y loop:
>>
>>     double *xweight = uglarmtable + (y - logo_y1) * (logo_w - 1);
>>
>> and then use it in that loop:
>>
>>     interpd += ... * xweight[abs(bx - (x - logo_x1))];
>>
>> It avoids a lot of multiplications.
>
> I'm not sure about this point. First, I would absolutely assume from 
> the compiler that it optimizes expressions like "(logo_w - 1)" or a 
> fixed offset to access the array in the loop and that they are only 
> calculated once.
>
> Secondly, I don't exactly understand how *xweight in your example 
> should be used (and would mean smaller or easier code).
>
> @All: What is the opinion about changing this loop regarding use of an 
> additional xweight pointer?
>
>>
>>> +
>>> +                for (by = 1; by < logo_h - 1; by++) {
>>> +                    interpd += topleft[by * src_linesize] *
>>> +                        uglarmtable[(x - logo_x1) + abs(by - (y - 
>>> logo_y1)) * (logo_w - 1)];
>>> +                    interpd += topleft[by * src_linesize + (logo_w 
>>> - 1)] *
>>> +                        uglarmtable[logo_w - (x - logo_x1) - 1 + 
>>> abs(by - (y - logo_y1)) * (logo_w - 1)];
>>> +                }
>> This one is more tricky to optimize, because of the abs() within the
>> multiplication, but it can be done by splitting the loop in two:
>>
>>     for (by = 1; by < y; by++) {
>>         interpd += ... * yweight[x - logo_x1];
>>         yweight -= logo_w_minus_one;
>>     }
>>     for (; by < logo_h_minus_one; by++) {
>>         interpd += ... * yweight[x - logo_x1];
>>         yweight += logo_w_minus_one;
>>     }
>>     av_assert2(yweight == the_correct_value);
>>
>> In fact, I think even the x loop would be a little more readable if it
>> was split in two like that.
> Sorry, I don't understand. Please give me a complete example that 
> replaces the previous for loop.
>>> +
>>> +                interp = (uint64_t)(interpd /
>>> +                    uglarmweightsum[(x - logo_x1) - 1 + (y - 
>>> logo_y1 - 1) * (logo_w - 2)]);
>> The cast to uint64_t seems suspicious. Can you explain?
>
> Every pixel value of the inner logo area is an integer. Only for the 
> calculation of the weighted average, all values use floats. At the 
> end, it is rounded (truncated) to an int.
>
> Should work - and did work for 17 years...
>
>>
>>> +                *xdst = interp;
>>> +            }
>>> +
>>> +            dst += dst_linesize;
>>> +            src += src_linesize;
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * Calculate the lookup tables to be used in UGLARM interpolation 
>>> mode.
>>> + *
>>> + * @param *uglarmtable      Pointer to table containing weights for 
>>> each possible
>>> + *                          diagonal distance between a border 
>>> pixel and an inner
>>> + *                          logo pixel.
>>> + * @param *uglarmweightsum  Pointer to a table containing the 
>>> weight sum to divide
>>> + *                          by for each pixel within the logo area.
>>> + * @param sar               The sar to take into account when 
>>> calculating lookup
>>> + *                          tables.
>>> + * @param logo_w            width of the logo
>>> + * @param logo_h            height of the logo
>>> + * @param exponent          exponent used in uglarm interpolation
>>> + */
>>> +static void calc_uglarm_tables(double *uglarmtable, double 
>>> *uglarmweightsum,
>>> +                               AVRational sar, int logo_w, int logo_h,
>>> +                               float exponent)
>>> +{
>>> +    double aspect = (double)sar.num / sar.den;
>> Tiny optimization:
>>
>>     double aspect2 = aspect * aspect;
>>
>> for use later.
>
> I wouldn't consider this an optimization. First, I assume the compiler 
> to only calculate "aspect * aspect" once in that loop (as well as "y * 
> y" which doesn't change in the inner loop). Second, the code with 
> using the additional variable makes the code more complex and third, 
> this loop is only calculated once at startup.
>
> @All: What are your opinions?
>
>>
>>> +                double d = pow(sqrt(x * x * aspect * aspect + y * 
>>> y), exponent);
>>     pow(x * x * aspect2 + y * y, exponent / 2);
>
> Hm. Again, I'm unsure if this "optimization" is worth it. I wouldn't 
> do this normally.
> Reasons: Again, I think the compiler would optimize it. At least the 
> compiled ffmpeg binaries are exactly the same size... And the original 
> code explains better the calculation: Calculate the distance 
> (*Pythagorean theorem, c = sqrt(a² + b²), and then calculate the 
> weighted distance value with the power. Again, this is only run once 
> at startup
> *
>
> *@All: Other opinions?
> *
>
>>
>>> +#define MAX_PLANES 10
>> You could use FF_ARRAY_ELEMS(AVPixFmtDescriptor.comp), and the
>> consistency would be guaranteed. Well, that syntax is not valid, but it
>> can be expressed:
>>
>> #define MAX_PLANES FF_ARRAY_ELEMS(((AVPixFmtDescriptor *)0)->comp)
>>
>> But I have a better suggestion:
>>
>> #define MAX_SUB 2
>>
>>     double uglarmtable[MAX_SUB + 1][MAX_SUB + 1]
>>
>> and use uglarmtable[hsub][vsub]. That way, for YUVA420P for example, the
>> table will be computed only once for Y and A and once for U and V.
>>
>> The assert is still needed with that solution, though.
>
> I don't understand this. Please provide a complete example, or it 
> stays as it is. - and could of course be optimized later.
>
>>> +    if (s->mode == MODE_UGLARM) {
>> No need to test, we can free anyway.
>
> But why should the for loop run in xy mode and check "planes count" 
> times to free the memory? The code without the "if" also looks to me 
> more like this is not checked by mistake.
>
> @All: Opinions?
>
>
>> [Carl Eugen] While there, please don't use sizeof(type) but 
>> sizeof(variable),
> same below.
>
> Hope that the code is correct like this?
>
> s->uglarmtable[plane] =
>                 av_malloc_array((logo_w - 1) * (logo_h - 1), 
> sizeof(s->uglarmtable[plane]));
>
> Regards,
> Uwe
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Patch hide | download patch | download mbox

From 001e57532567de410ecfa860df99610620dcd385 Mon Sep 17 00:00:00 2001
From: breaker27 <mail@uwe-freese.de>
Date: Wed, 26 Dec 2018 18:16:48 +0100
Subject: [PATCH] avfilter/vf_delogo: add uglarm interpolation mode

---
 doc/filters.texi        |  28 ++++++++
 libavfilter/vf_delogo.c | 182 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 205 insertions(+), 5 deletions(-)

diff --git a/doc/filters.texi b/doc/filters.texi
index 65ce25bc18..792560ad79 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -7952,6 +7952,34 @@  Specify the thickness of the fuzzy edge of the rectangle (added to
 deprecated, setting higher values should no longer be necessary and
 is not recommended.
 
+@item mode
+Specify the interpolation mode used to remove the logo.
+It accepts the following values:
+@table @option
+@item xy
+Only the border pixels in straight x and y direction are considered
+to replace any logo pixel. This mode tends to flicker and to show
+horizontal and vertical lines.
+@item uglarm
+Consider the whole border for every logo pixel to replace. This mode
+uses the distance raised to the power of the given @var{exponent} as
+weight that decides to which amount every border pixel is taken into
+account. This results in a more blurred area, which tends to be less
+distracting. The uglarm mode was first implemented in VirtualDub's
+LogoAway filter and is also known from ffdshow and is the
+abbreviation for "Uwe's Great LogoAway Remove Mode".
+@end table
+The default value is @code{xy}.
+
+@item exponent
+Specify the exponent used to calculate the weight out of the
+distance in @code{uglarm} mode (weight = distance ^ @var{exponent}).
+The value @code{0} results in a logo area which has
+the same average color everywhere. The higher the value, the more
+relevant a near border pixel will get, meaning that the borders of
+the logo area are more similar to the surrounding pixels. The default
+value is @code{3}.
+
 @item show
 When set to 1, a green rectangle is drawn on the screen to simplify
 finding the right @var{x}, @var{y}, @var{w}, and @var{h} parameters.
diff --git a/libavfilter/vf_delogo.c b/libavfilter/vf_delogo.c
index 065d093641..45f2f9cd3f 100644
--- a/libavfilter/vf_delogo.c
+++ b/libavfilter/vf_delogo.c
@@ -2,6 +2,7 @@ 
  * Copyright (c) 2002 Jindrich Makovicka <makovick@gmail.com>
  * Copyright (c) 2011 Stefano Sabatini
  * Copyright (c) 2013, 2015 Jean Delvare <jdelvare@suse.com>
+ * Copyright (c) 2018, 2019 Uwe Freese <mail@uwe-freese.de>
  *
  * This file is part of FFmpeg.
  *
@@ -25,8 +26,12 @@ 
  * A very simple tv station logo remover
  * Originally imported from MPlayer libmpcodecs/vf_delogo.c,
  * the algorithm was later improved.
+ * The "UGLARM" mode was first implemented 2001 by Uwe Freese for Virtual
+ * Dub's LogoAway filter (by Krzysztof Wojdon), taken over into ffdshow's
+ * logoaway filter (by Milan Cutka), from where it was ported to ffmpeg.
  */
 
+#include "libavutil/avassert.h"
 #include "libavutil/common.h"
 #include "libavutil/imgutils.h"
 #include "libavutil/opt.h"
@@ -50,6 +55,10 @@ 
  * @param logo_w width of the logo
  * @param logo_h height of the logo
  * @param band   the size of the band around the processed area
+ * @param *uglarmtable      pointer to weight table in UGLARM interpolation mode,
+ *                          zero when x-y mode is used
+ * @param *uglarmweightsum  pointer to weight sum table in UGLARM interpolation mode,
+ *                          zero when x-y mode is used
  * @param show   show a rectangle around the processed area, useful for
  *               parameters tweaking
  * @param direct if non-zero perform in-place processing
@@ -58,7 +67,8 @@  static void apply_delogo(uint8_t *dst, int dst_linesize,
                          uint8_t *src, int src_linesize,
                          int w, int h, AVRational sar,
                          int logo_x, int logo_y, int logo_w, int logo_h,
-                         unsigned int band, int show, int direct)
+                         unsigned int band, double *uglarmtable,
+                         double *uglarmweightsum, int show, int direct)
 {
     int x, y;
     uint64_t interp, weightl, weightr, weightt, weightb, weight;
@@ -89,6 +99,7 @@  static void apply_delogo(uint8_t *dst, int dst_linesize,
     dst += (logo_y1 + 1) * dst_linesize;
     src += (logo_y1 + 1) * src_linesize;
 
+    if (!uglarmtable) {
     for (y = logo_y1+1; y < logo_y2; y++) {
         left_sample = topleft[src_linesize*(y-logo_y1)]   +
                       topleft[src_linesize*(y-logo_y1-1)] +
@@ -151,12 +162,125 @@  static void apply_delogo(uint8_t *dst, int dst_linesize,
         dst += dst_linesize;
         src += src_linesize;
     }
+    } else {
+        int bx, by;
+        double interpd;
+
+        for (y = 1; y < logo_y2 - logo_y1; y++) {
+            for (x = 1, xdst = dst + logo_x1 + 1, xsrc = src + logo_x1 + 1;
+                x < logo_x2 - logo_x1;
+                x++, xdst++, xsrc++) {
+
+                if (show && (y == 1 || y == logo_y2 - logo_y1 - 1 ||
+                             x == 1 || x == logo_x2 - logo_x1 - 1)) {
+                    *xdst = 0;
+                    continue;
+                }
+
+                interpd = 0;
+
+                for (bx = 0; bx < logo_w; bx++) {
+                    interpd += topleft[bx] *
+                        uglarmtable[abs(bx - x) + y * (logo_w - 1)];
+                    interpd += botleft[bx] *
+                        uglarmtable[abs(bx - x) + (logo_h - y - 1) * (logo_w - 1)];
+                }
+
+                for (by = 1; by < logo_h - 1; by++) {
+                    interpd += topleft[by * src_linesize] *
+                        uglarmtable[x + abs(by - y) * (logo_w - 1)];
+                    interpd += topleft[by * src_linesize + (logo_w - 1)] *
+                        uglarmtable[logo_w - x - 1 + abs(by - y) * (logo_w - 1)];
+                }
+
+                interp = (uint64_t)(interpd /
+                    uglarmweightsum[x - 1 + (y - 1) * (logo_w - 2)]);
+                *xdst = interp;
+            }
+
+            dst += dst_linesize;
+            src += src_linesize;
+        }
+    }
+}
+
+/**
+ * Calculate the lookup tables to be used in UGLARM interpolation mode.
+ *
+ * @param *uglarmtable      Pointer to table containing weights for each possible
+ *                          diagonal distance between a border pixel and an inner
+ *                          logo pixel.
+ * @param *uglarmweightsum  Pointer to a table containing the weight sum to divide
+ *                          by for each pixel within the logo area.
+ * @param sar               The sar to take into account when calculating lookup
+ *                          tables.
+ * @param logo_w            width of the logo
+ * @param logo_h            height of the logo
+ * @param exponent          exponent used in uglarm interpolation
+ */
+static void calc_uglarm_tables(double *uglarmtable, double *uglarmweightsum,
+                               AVRational sar, int logo_w, int logo_h,
+                               float exponent)
+{
+    double aspect = (double)sar.num / sar.den;
+    int x, y;
+
+    /* uglarmtable will contain a weight for each possible diagonal distance
+     * between a border pixel and an inner logo pixel. The maximum distance in
+     * each direction between border and an inner pixel can be logo_w - 1. The
+     * weight of a border pixel which is x,y pixels away is stored at position
+     * x + y * (logo_w - 1). */
+    for (y = 0; y < logo_h - 1; y++)
+        for (x = 0; x < logo_w - 1; x++) {
+            if (x + y != 0) {
+                double d = pow(sqrt(x * x * aspect * aspect + y * y), exponent);
+                uglarmtable[x + y * (logo_w - 1)] = 1.0 / d;
+            } else {
+                uglarmtable[x + y * (logo_w - 1)] = 1.0;
+            }
+        }
+
+    /* uglarmweightsum will contain the sum of all weights which is used when
+     * an inner pixel of the logo at position x,y is calculated out of the
+     * border pixels. The aggregated value has to be divided by that. The value
+     * to use for the inner 1-based logo position x,y is stored at
+     * (x - 1) + (y - 1) * (logo_w - 2). */
+    for (y = 1; y < logo_h - 1; y++)
+        for (x = 1; x < logo_w - 1; x++) {
+            double weightsum = 0;
+
+            for (int bx = 0; bx < logo_w; bx++) {
+                /* top border */
+                weightsum += uglarmtable[abs(bx - x) + y * (logo_w - 1)];
+                /* bottom border */
+                weightsum += uglarmtable[abs(bx - x) + (logo_h - y - 1) * (logo_w - 1)];
+            }
+
+            for (int by = 1; by < logo_h - 1; by++) {
+                /* left border */
+                weightsum += uglarmtable[x + abs(by - y) * (logo_w - 1)];
+                /* right border */
+                weightsum += uglarmtable[(logo_w - x - 1) + abs(by - y) * (logo_w - 1)];
+            }
+
+            uglarmweightsum[(x - 1) + (y - 1) * (logo_w - 2)] = weightsum;
+        }
 }
 
+enum mode {
+    MODE_XY,
+    MODE_UGLARM,
+    MODE_NB
+};
+
+#define MAX_PLANES 10
+
 typedef struct DelogoContext {
     const AVClass *class;
-    int x, y, w, h, band, show;
-}  DelogoContext;
+    int x, y, w, h, band, mode, show;
+    float exponent;
+    double *uglarmtable[MAX_PLANES], *uglarmweightsum[MAX_PLANES];
+} DelogoContext;
 
 #define OFFSET(x) offsetof(DelogoContext, x)
 #define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
@@ -171,6 +295,10 @@  static const AVOption delogo_options[]= {
     { "band", "set delogo area band size", OFFSET(band), AV_OPT_TYPE_INT, { .i64 =  0 },  0, INT_MAX, FLAGS },
     { "t",    "set delogo area band size", OFFSET(band), AV_OPT_TYPE_INT, { .i64 =  0 },  0, INT_MAX, FLAGS },
 #endif
+    { "mode", "set the interpolation mode",OFFSET(mode), AV_OPT_TYPE_INT, { .i64 = MODE_XY }, MODE_XY, MODE_NB-1, FLAGS, "mode" },
+        { "xy",     "use pixels in straight x and y direction",  OFFSET(mode), AV_OPT_TYPE_CONST, { .i64 = MODE_XY },     0, 0, FLAGS, "mode" },
+        { "uglarm", "UGLARM mode, use full border",              OFFSET(mode), AV_OPT_TYPE_CONST, { .i64 = MODE_UGLARM }, 0, 0, FLAGS, "mode" },
+    { "exponent","exponent used for UGLARM interpolation", OFFSET(exponent), AV_OPT_TYPE_FLOAT, { .dbl = 3.0 }, 0, 6, FLAGS },
     { "show", "show delogo area",          OFFSET(show), AV_OPT_TYPE_BOOL,{ .i64 =  0 },  0, 1,       FLAGS },
     { NULL }
 };
@@ -215,8 +343,8 @@  static av_cold int init(AVFilterContext *ctx)
 #else
     s->band = 1;
 #endif
-    av_log(ctx, AV_LOG_VERBOSE, "x:%d y:%d, w:%d h:%d band:%d show:%d\n",
-           s->x, s->y, s->w, s->h, s->band, s->show);
+    av_log(ctx, AV_LOG_VERBOSE, "x:%d y:%d, w:%d h:%d band:%d mode:%d exponent:%f show:%d\n",
+           s->x, s->y, s->w, s->h, s->band, s->mode, s->exponent, s->show);
 
     s->w += s->band*2;
     s->h += s->band*2;
@@ -226,6 +354,18 @@  static av_cold int init(AVFilterContext *ctx)
     return 0;
 }
 
+static av_cold void uninit(AVFilterContext *ctx)
+{
+    DelogoContext *s = ctx->priv;
+
+    if (s->mode == MODE_UGLARM) {
+        for (int plane = 0; plane < MAX_PLANES; plane++) {
+            av_freep(&s->uglarmtable[plane]);
+            av_freep(&s->uglarmweightsum[plane]);
+        }
+    }
+}
+
 static int config_input(AVFilterLink *inlink)
 {
     DelogoContext *s = inlink->dst->priv;
@@ -237,6 +377,32 @@  static int config_input(AVFilterLink *inlink)
         return AVERROR(EINVAL);
     }
 
+    /* Init lookup tables once */
+    if (s->mode == MODE_UGLARM) {
+        /* Up and left borders were rounded down, inject lost bits
+         * into width and height to avoid error accumulation */
+        int logo_w = AV_CEIL_RSHIFT(s->w + (s->x & ((1<<hsub)-1)), hsub);
+        int logo_h = AV_CEIL_RSHIFT(s->h + (s->y & ((1<<vsub)-1)), vsub);
+
+        if (!s->uglarmtable[plane]) {
+            s->uglarmtable[plane] =
+                av_malloc_array((logo_w - 1) * (logo_h - 1), sizeof(s->uglarmtable[plane]));
+
+            if (!s->uglarmtable[plane])
+                return AVERROR(ENOMEM);
+
+            s->uglarmweightsum[plane] =
+                av_malloc_array((logo_w - 2) * (logo_h - 2), sizeof(s->uglarmweightsum[plane]));
+
+            if (!s->uglarmweightsum[plane])
+                return AVERROR(ENOMEM);
+
+            calc_uglarm_tables(s->uglarmtable[plane],
+                            s->uglarmweightsum[plane],
+                            sar, logo_w, logo_h, s->exponent);
+        }
+    }
+
     return 0;
 }
 
@@ -270,6 +436,9 @@  static int filter_frame(AVFilterLink *inlink, AVFrame *in)
     if (!sar.num)
         sar.num = sar.den = 1;
 
+    if (s->mode == MODE_UGLARM)
+        av_assert0(desc->nb_components <= MAX_PLANES);
+
     for (plane = 0; plane < desc->nb_components; plane++) {
         int hsub = plane == 1 || plane == 2 ? hsub0 : 0;
         int vsub = plane == 1 || plane == 2 ? vsub0 : 0;
@@ -284,6 +453,8 @@  static int filter_frame(AVFilterLink *inlink, AVFrame *in)
                      AV_CEIL_RSHIFT(s->w + (s->x & ((1<<hsub)-1)), hsub),
                      AV_CEIL_RSHIFT(s->h + (s->y & ((1<<vsub)-1)), vsub),
                      s->band>>FFMIN(hsub, vsub),
+                     s->uglarmtable[plane],
+                     s->uglarmweightsum[plane],
                      s->show, direct);
     }
 
@@ -317,6 +488,7 @@  AVFilter ff_vf_delogo = {
     .priv_size     = sizeof(DelogoContext),
     .priv_class    = &delogo_class,
     .init          = init,
+    .uninit        = uninit,
     .query_formats = query_formats,
     .inputs        = avfilter_vf_delogo_inputs,
     .outputs       = avfilter_vf_delogo_outputs,
-- 
2.11.0