[FFmpeg-devel,1/4] zmbvenc: don't sum the entropy when blocks are equal

Submitted by matthew.w.fearnley@gmail.com on Dec. 19, 2018, 10 p.m.

Details

Message ID 20181219220003.27225-1-matthew.w.fearnley@gmail.com
State New
Headers show

Commit Message

matthew.w.fearnley@gmail.com Dec. 19, 2018, 10 p.m.
From: Matthew Fearnley <matthew.w.fearnley@gmail.com>

If *xored is 0, then histogram[0]==bw*bh and histogram[1..255]==0.

Because histogram[0] is skipped over for the entropy calculation, the
return value is always 0 when *xored==0, so we don't need to waste time
calculating it.

This addition both clarifies the behaviour of the code and improves
the speed when the block matches.
---
 libavcodec/zmbvenc.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Tomas Härdin Dec. 20, 2018, 4:29 p.m.
ons 2018-12-19 klockan 22:00 +0000 skrev matthew.w.fearnley@gmail.com:
> > From: Matthew Fearnley <matthew.w.fearnley@gmail.com>
> 
> If *xored is 0, then histogram[0]==bw*bh and histogram[1..255]==0.
> 
> Because histogram[0] is skipped over for the entropy calculation, the
> return value is always 0 when *xored==0, so we don't need to waste time
> calculating it.
> 
> This addition both clarifies the behaviour of the code and improves
> the speed when the block matches.
> ---
>  libavcodec/zmbvenc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/libavcodec/zmbvenc.c b/libavcodec/zmbvenc.c
> index 4d9147657d..2f041dae32 100644
> --- a/libavcodec/zmbvenc.c
> +++ b/libavcodec/zmbvenc.c
> @@ -71,6 +71,7 @@ static inline int block_cmp(ZmbvEncContext *c, uint8_t *src, int stride,
>      int i, j;
>      uint8_t histogram[256] = {0};
>  
> +    /* build frequency histogram of byte values for src[] ^ src2[] */
>      *xored = 0;
>      for(j = 0; j < bh; j++){
>          for(i = 0; i < bw; i++){
> @@ -82,6 +83,10 @@ static inline int block_cmp(ZmbvEncContext *c, uint8_t *src, int stride,
>          src2 += stride2;
>      }
>  
> +    /* early out if src and src2 are equal */
> +    if (!*xored) return 0;

I have a feeling this could be sped up further by just doing *xored =
histogram[0] == ZMBV_BLOCK*ZMBV_BLOCK after the loops, if [PATCH 3/4]
is applied before this. Computing both histogram and xored in the loop
seems pointless.

Beyond that this looks good

/Tomas
matthew.w.fearnley@gmail.com Dec. 20, 2018, 5:46 p.m.
On Thu, 20 Dec 2018 at 16:30, Tomas Härdin <tjoppen@acc.umu.se> wrote:

> I have a feeling this could be sped up further by just doing *xored =
> histogram[0] == ZMBV_BLOCK*ZMBV_BLOCK after the loops, if [PATCH 3/4]
> is applied before this. Computing both histogram and xored in the loop
> seems pointless.
>

You're right, that speedup didn't occur to me.  It makes the logic a bit
more tenuous, but it would be more efficient.
Note that bw,bh aren't guaranteed to equal ZMBV_BLOCK, so `histogram[0] ==
bw*bh` would have to be used to guard against those (literal) edge cases.
Tomas Härdin Dec. 22, 2018, 12:11 p.m.
tor 2018-12-20 klockan 17:46 +0000 skrev Matthew Fearnley:
> > On Thu, 20 Dec 2018 at 16:30, Tomas Härdin <tjoppen@acc.umu.se> wrote:
> 
> > I have a feeling this could be sped up further by just doing *xored =
> > histogram[0] == ZMBV_BLOCK*ZMBV_BLOCK after the loops, if [PATCH 3/4]
> > is applied before this. Computing both histogram and xored in the loop
> > seems pointless.
> > 
> 
> You're right, that speedup didn't occur to me.  It makes the logic a bit
> more tenuous, but it would be more efficient.

Eh, I wouldn't really call "we have xored data to output if and only if
number of zeroes < number of pixel" fairly easy to grasp. A comment
might be good tho

> Note that bw,bh aren't guaranteed to equal ZMBV_BLOCK, so `histogram[0] ==
> bw*bh` would have to be used to guard against those (literal) edge cases.

Right, yes. But if we have block sizes other than ZMBV_BLOCK^2 then we
need score tables for those sizes too.

/Tomas
matthew.w.fearnley@gmail.com Dec. 22, 2018, 3:32 p.m.
> On 22 Dec 2018, at 12:11, Tomas Härdin <tjoppen@acc.umu.se> wrote:
> 
> tor 2018-12-20 klockan 17:46 +0000 skrev Matthew Fearnley:
>>>> On Thu, 20 Dec 2018 at 16:30, Tomas Härdin <tjoppen@acc.umu.se> wrote:
>>> 
>>> I have a feeling this could be sped up further by just doing *xored =
>>> histogram[0] == ZMBV_BLOCK*ZMBV_BLOCK after the loops, if [PATCH 3/4]
>>> is applied before this. Computing both histogram and xored in the loop
>>> seems pointless.
>>> 
>> 
>> You're right, that speedup didn't occur to me.  It makes the logic a bit
>> more tenuous, but it would be more efficient.
> 
> Eh, I wouldn't really call "we have xored data to output if and only if
> number of zeroes < number of pixel" fairly easy to grasp. A comment
> might be good tho
Agreed.
Just have to get the patch order sorted out now somehow.
>> Note that bw,bh aren't guaranteed to equal ZMBV_BLOCK, so `histogram[0] ==
>> bw*bh` would have to be used to guard against those (literal) edge cases.
> 
> Right, yes. But if we have block sizes other than ZMBV_BLOCK^2 then we
> need score tables for those sizes too.
I’ve thought about that a bit. I wondered if it would be worth it given:
- the extra code, memory and logic needed
- it would only improve the edge blocks
- the existing score table isn’t catastrophically bad for short blocks, and would still favour blocks with more common pixels.

It would be better from a correctness perspective though, and effects on running time should be negligible.

I can work on a patch, see how it looks.
Tomas Härdin Dec. 25, 2018, 9:35 a.m.
lör 2018-12-22 klockan 15:32 +0000 skrev Matthew Fearnley:
> > > > 
> > > Note that bw,bh aren't guaranteed to equal ZMBV_BLOCK, so `histogram[0] ==
> > > bw*bh` would have to be used to guard against those (literal) edge cases.
> > 
> > Right, yes. But if we have block sizes other than ZMBV_BLOCK^2 then we
> > need score tables for those sizes too.
> 
> I’ve thought about that a bit. I wondered if it would be worth it given:
> - the extra code, memory and logic needed

If you have a huge amount of DOS captures to optimize then it might be
worth it, else probably questionable

> - it would only improve the edge blocks

I imagine large blocks would be good for scenes with mostly global
motion. You cut down on the number of MVs and thus the amount of data
zlib has to compress, if the block size is a good fit.

> - the existing score table isn’t catastrophically bad for short blocks, and would still favour blocks with more common pixels.
> 
> It would be better from a correctness perspective though, and effects on running time should be negligible.

Good point. There's also no telling whether the current model is
actually an accurate prediction of how zlib behaves :)

/Tomas
matthew.w.fearnley@gmail.com Jan. 19, 2019, 10:40 p.m.
On Tue, 25 Dec 2018 at 09:35, Tomas Härdin <tjoppen@acc.umu.se> wrote:

> lör 2018-12-22 klockan 15:32 +0000 skrev Matthew Fearnley:
> > > > >
> > > > Note that bw,bh aren't guaranteed to equal ZMBV_BLOCK, so
> `histogram[0] ==
> > > > bw*bh` would have to be used to guard against those (literal) edge
> cases.
> > >
> > > Right, yes. But if we have block sizes other than ZMBV_BLOCK^2 then we
> > > need score tables for those sizes too.
> >
> > I’ve thought about that a bit. I wondered if it would be worth it given:
> > - the extra code, memory and logic needed
>
> If you have a huge amount of DOS captures to optimize then it might be
> worth it, else probably questionable
>
> > - it would only improve the edge blocks
>
> I imagine large blocks would be good for scenes with mostly global
> motion. You cut down on the number of MVs and thus the amount of data
> zlib has to compress, if the block size is a good fit.
>
> > - the existing score table isn’t catastrophically bad for short blocks,
> and would still favour blocks with more common pixels.
> >
> > It would be better from a correctness perspective though, and effects on
> running time should be negligible.
>
> Good point. There's also no telling whether the current model is
> actually an accurate prediction of how zlib behaves :)
>
>
Hi.  Just to say, I tried setting up additional score_tabs for the
bottom/right partial blocks.  Unfortunately, after implementing it and
ironing out the bugs, the new score tables actually caused a slight
increase in file size!
After a little informal investigation with a couple of videos, my findings
were that increasing the divisor - '(i / (double)(ZMBV_BLOCK * ZMBV_BLOCK *
some_factor))' - would generally lead to slight compression improvements.
Given the score table is still "valid" for smaller blocks, and given the
extra complexity of adding the score tables, plus the fact that it may
generally hurt the compression, I'm inclined to leave it with the one score
table.  But there may be potential to improve the current compression
method in future, by somehow tuning the divisor for better results
generally.

Matthew
Tomas Härdin Jan. 20, 2019, 3:15 p.m.
lör 2019-01-19 klockan 22:40 +0000 skrev Matthew Fearnley:
> > On Tue, 25 Dec 2018 at 09:35, Tomas Härdin <tjoppen@acc.umu.se> wrote:
> 
> > lör 2018-12-22 klockan 15:32 +0000 skrev Matthew Fearnley:
> > > > > > 
> > > > > 
> > > > > Note that bw,bh aren't guaranteed to equal ZMBV_BLOCK, so
> > 
> > `histogram[0] ==
> > > > > bw*bh` would have to be used to guard against those (literal) edge
> > 
> > cases.
> > > > 
> > > > Right, yes. But if we have block sizes other than ZMBV_BLOCK^2 then we
> > > > need score tables for those sizes too.
> > > 
> > > I’ve thought about that a bit. I wondered if it would be worth it given:
> > > - the extra code, memory and logic needed
> > 
> > If you have a huge amount of DOS captures to optimize then it might be
> > worth it, else probably questionable
> > 
> > > - it would only improve the edge blocks
> > 
> > I imagine large blocks would be good for scenes with mostly global
> > motion. You cut down on the number of MVs and thus the amount of data
> > zlib has to compress, if the block size is a good fit.
> > 
> > > - the existing score table isn’t catastrophically bad for short blocks,
> > 
> > and would still favour blocks with more common pixels.
> > > 
> > > It would be better from a correctness perspective though, and effects on
> > 
> > running time should be negligible.
> > 
> > Good point. There's also no telling whether the current model is
> > actually an accurate prediction of how zlib behaves :)
> > 
> > 
> 
> Hi.  Just to say, I tried setting up additional score_tabs for the
> bottom/right partial blocks.  Unfortunately, after implementing it and
> ironing out the bugs, the new score tables actually caused a slight
> increase in file size!
> After a little informal investigation with a couple of videos, my findings
> were that increasing the divisor - '(i / (double)(ZMBV_BLOCK * ZMBV_BLOCK *
> some_factor))' - would generally lead to slight compression improvements.
> Given the score table is still "valid" for smaller blocks, and given the
> extra complexity of adding the score tables, plus the fact that it may
> generally hurt the compression, I'm inclined to leave it with the one score
> table.  But there may be potential to improve the current compression
> method in future, by somehow tuning the divisor for better results
> generally.

Huh, that's strange. Sounds like something that warrants a comment in
the code. I also see we have an answer do Carl's question: you're still
experimenting with this :) I think we can hold off on pushing anything
until you feel happy with the result

/Tomas

Patch hide | download patch | download mbox

diff --git a/libavcodec/zmbvenc.c b/libavcodec/zmbvenc.c
index 4d9147657d..2f041dae32 100644
--- a/libavcodec/zmbvenc.c
+++ b/libavcodec/zmbvenc.c
@@ -71,6 +71,7 @@  static inline int block_cmp(ZmbvEncContext *c, uint8_t *src, int stride,
     int i, j;
     uint8_t histogram[256] = {0};
 
+    /* build frequency histogram of byte values for src[] ^ src2[] */
     *xored = 0;
     for(j = 0; j < bh; j++){
         for(i = 0; i < bw; i++){
@@ -82,6 +83,10 @@  static inline int block_cmp(ZmbvEncContext *c, uint8_t *src, int stride,
         src2 += stride2;
     }
 
+    /* early out if src and src2 are equal */
+    if (!*xored) return 0;
+
+    /* sum the entropy of the non-zero values */
     for(i = 1; i < 256; i++)
         sum += c->score_tab[histogram[i]];