[FFmpeg-devel] avcodec/mlp*: improvements

Message ID	CAPYw7P4g+o+XQGWFn185=n56Dptrw-6d0mTXW+uPn=FrUp9XgA@mail.gmail.com
State	New
Headers	From: Paul B Mahol <onemda@gmail.com> Date: Wed, 25 Oct 2023 13:12:04 +0200 To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> Subject: [FFmpeg-devel] [PATCH] avcodec/mlp*: improvements
Series	[FFmpeg-devel] avcodec/mlp: improvements

Context	Check	Description
yinshiyou/make_loongarch64	success	Make finished
yinshiyou/make_fate_loongarch64	success	Make fate finished
andriy/make_x86	success	Make finished
andriy/make_fate_x86	success	Make fate finished

Paul B Mahol Oct. 25, 2023, 11:12 a.m. UTC

Patch set attached.

Tomas Härdin Oct. 25, 2023, 6:39 p.m. UTC | #1

>             if (c) {
>                 e[0] = 1 << 14;
>                 e[1] = 0 << 14;
>                 e[2] = v[1];
>                 e[3] = v[0];
>             } else {
>                 e[0] = v[0];
>                 e[1] = v[1];
>                 e[2] = 0 << 14;
>                 e[3] = 1 << 14;
>             }
> 
>             if (invert2x2(e, d)) {
>                 sum = UINT64_MAX;
>                 goto next;
>             }
> 

You can make use of the properties of e to simplify calculating the
inverse. The determinant is always v[0]<<14, so you can just do if
(!v[0]) continue; and skip the determinant check altogether.

>                 if (d[i] != av_clip_intp2(d[i], 15)) {

d[i] < INT16_MIN || d[i] > INT16_MAX is more clear and probably faster
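
A minimal sketch of why the two checks agree, assuming av_clip_intp2(x, 15) clamps to the signed 16-bit range [-32768, 32767]. clip_intp2() below is a hypothetical stand-in written for this example, not FFmpeg's implementation, and the two wrapper names are likewise made up:

```c
#include <stdint.h>

/* Hypothetical stand-in for av_clip_intp2(): clamp a to [-(2^p), 2^p - 1]. */
static int clip_intp2(int a, int p)
{
    const int lo = -(1 << p), hi = (1 << p) - 1;
    return a < lo ? lo : a > hi ? hi : a;
}

/* Form used in the patch: nonzero when x changes under clipping. */
static int out_of_range_clip(int x)
{
    return x != clip_intp2(x, 15);
}

/* Suggested form: explicit comparison against the int16_t limits. */
static int out_of_range_cmp(int x)
{
    return x < INT16_MIN || x > INT16_MAX;
}
```

Both functions should return the same answer for every int, since a value changes under clipping exactly when it lies outside the clip range.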

> +                    lt = ((lm * e[0]) >> 14) + ((rm * e[1]) >> 14);
> +                    rt = ((lm * e[2]) >> 14) + ((rm * e[3]) >> 14);

Right-shifting a negative value is implementation-defined in C. Use
division by (1<<14) instead. Also add first, then divide: the
intermediate result is 49 bits, so it fits easily in 64 bits.

You could also simplify this calculation by again making use of the
properties of e.

>                     if (c)
>                         v += FFABS(rt);
>                     else
>                         v += FFABS(lt);
>                     sum += v;
>                     if (sum > best_sum)
>                         goto next;

Seems like this reduces to solving a linear program.

>                     if ((((lt * d[0]) >> 14) + ((rt * d[1]) >> 14)) != lm) {
>                         sum = UINT64_MAX;
>                         goto next;
>                     }
> 
>                     if ((((lt * d[2]) >> 14) + ((rt * d[3]) >> 14)) != rm) {
>                         sum = UINT64_MAX;
>                         goto next;
>                     }

Looks like a massive hack. I'd prefer to formally verify that the
arithmetic works out. Also again you can make use of the properties of
e, or inv(e) as it were.

/Tomas

Paul B Mahol Oct. 25, 2023, 6:58 p.m. UTC | #2

On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git@haerdin.se> wrote:

> > +                    lt = ((lm * e[0]) >> 14) + ((rm * e[1]) >> 14);
> > +                    rt = ((lm * e[2]) >> 14) + ((rm * e[3]) >> 14);
>
> Result is implementation-defined. Use division by (1<<14). Also add
> then divide. The intermediate result is 49 bits so fits easily in 64
> bits.
>

Division by (1<<14) will give incorrect results; been there, done that.
You can check the validity of all your "reviews" by testing the patches
and confirming that the results are bitexact; otherwise I'm just wasting
time here.

Additions are done before, not later; again, check the validity of your
comments before commenting more. Thanks.
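
To illustrate the difference between the two operations: C's `/` rounds toward zero, while an arithmetic right shift (what common compilers emit for `>>` on negative signed values) rounds toward negative infinity. The sketch below uses hypothetical helper names and writes the floor variant without shifting a negative value, so it does not rely on implementation-defined behaviour:

```c
#include <stdint.h>

/* Truncating division by 2^14: rounds toward zero, as C99 '/' does. */
static int64_t div_trunc_q14(int64_t x)
{
    return x / (1 << 14);
}

/* Floor division by 2^14: rounds toward negative infinity, which is
 * what an arithmetic right shift by 14 would produce. */
static int64_t div_floor_q14(int64_t x)
{
    int64_t q = x / (1 << 14);
    if (x % (1 << 14) < 0)  /* negative remainder: truncation rounded up */
        q--;
    return q;
}
```

For x = -1 the truncating form gives 0 and the floor form gives -1; the two agree for all non-negative x. A lossless codec needs encoder and decoder to round identically, so a one-unit difference like this is enough to break bitexactness.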


> Looks like a massive hack. I'd prefer to formally verify that the
> arithmetic works out. Also again you can make use of the properties of
> e, or inv(e) as it were.
>

Arithmetic may not always work out.



Paul B Mahol Oct. 25, 2023, 7 p.m. UTC | #3

On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git@haerdin.se> wrote:

>
> >             if (c) {
> >                 e[0] = 1 << 14;
> >                 e[1] = 0 << 14;
> >                 e[2] = v[1];
> >                 e[3] = v[0];
> >             } else {
> >                 e[0] = v[0];
> >                 e[1] = v[1];
> >                 e[2] = 0 << 14;
> >                 e[3] = 1 << 14;
> >             }
> >
> >             if (invert2x2(e, d)) {
> >                 sum = UINT64_MAX;
> >                 goto next;
> >             }
> >
>
> You can make use of the properties of e to simplify calculating the
> inverse. The determinant is always v[0]<<14, so you can just do if
> (!v[0]) continue; and skip the determinant check altogether.
>

Even for the real 2x2 matrix case (once one of the rows is no longer 1, 0)?
I may add such cases later.



Tomas Härdin Oct. 25, 2023, 7:03 p.m. UTC | #4

On Wed, 2023-10-25 at 21:00 +0200, Paul B Mahol wrote:
> On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git@haerdin.se> wrote:
> 
> > 
> > >             if (c) {
> > >                 e[0] = 1 << 14;
> > >                 e[1] = 0 << 14;
> > >                 e[2] = v[1];
> > >                 e[3] = v[0];
> > >             } else {
> > >                 e[0] = v[0];
> > >                 e[1] = v[1];
> > >                 e[2] = 0 << 14;
> > >                 e[3] = 1 << 14;
> > >             }
> > > 
> > >             if (invert2x2(e, d)) {
> > >                 sum = UINT64_MAX;
> > >                 goto next;
> > >             }
> > > 
> > 
> > You can make use of the properties of e to simplify calculating the
> > inverse. The determinant is always v[0]<<14, so you can just do if
> > (!v[0]) continue; and skip the determinant check altogether.
> > 
> 
> Even for real 2x2 matrix case? (Once one of rows is not 1, 0) ?
> May added such cases later.

You can just work the math out on paper. Inverse of

 1     0
 v[1]  v[0]

is

 1           0
 -v[1]/v[0]  1/v[0]

not accounting for shifts.
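
Written out with v_0 = v[0] and v_1 = v[1], and still ignoring the 2^14 fixed-point scaling, the general 2x2 inverse formula (adjugate divided by determinant) gives:

```latex
E = \begin{pmatrix} 1 & 0 \\ v_1 & v_0 \end{pmatrix}, \qquad
\det E = 1 \cdot v_0 - 0 \cdot v_1 = v_0, \qquad
E^{-1} = \frac{1}{v_0}\begin{pmatrix} v_0 & 0 \\ -v_1 & 1 \end{pmatrix}
       = \begin{pmatrix} 1 & 0 \\ -v_1/v_0 & 1/v_0 \end{pmatrix}
```

The inverse exists exactly when v_0 is nonzero, which is why testing v[0] alone is enough for matrices of this shape.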

Also RE: my other comments, you are right. I didn't take into account
that MLP is lossless and that there may be off-by-one errors.

And as I said on IRC, you can formulate this as a least squares problem
and then solve it with a linear system solver. This patch seems to find
a solution that minimizes L1 rather than L2, though. I'm not sure what
the implications of that are compression-wise. What happens if you
replace FFABS() with a square for scoring?

/Tomas

Paul B Mahol Oct. 25, 2023, 7:59 p.m. UTC | #5

On Wed, Oct 25, 2023 at 9:03 PM Tomas Härdin <git@haerdin.se> wrote:

> You can just work the math out on paper. Inverse of
>
>  1     0
>  v[1]  v[0]
>
> is
>
>  1           0
>  -v[1]/v[0]  1/v[0]
>
> not accounting for shifts.
>

But I want to add a real 2x2 matrix with no 0 cell later:

a, b
c, d

(even though the gains are small, as encoded files rarely use it)


>
> Also RE: my other comments, you are right. I didn't take into account
> that MLP is lossless and that there may be off-by-one errors.
>
> And as I said on IRC you can formulate this as a least squares problem,
> then solve it using a linear system solve. This patch seems finds a
> solution that minimizes L1 rather than L2 though. Not sure what the
> implications of that are compressionwise. What happens if you replace
> FFABS() with a square for scoring?
>

It usually reduces size by less than 0.002 %.


Does a linear system solver give vectors that create the equations for
both channels at the same time?



Tomas Härdin Oct. 30, 2023, 1:14 p.m. UTC | #6

On Wed, 2023-10-25 at 21:59 +0200, Paul B Mahol wrote:
> But I want to add real 2x2 matrix with no 0 cell, with:
> 
> a, b
> c, d
> 
> later. (even though gains are small, as encoded files use it rarely)

If this is possible within MLP then yes, do that. It is not clear from
what you've told me so far and from my brief reading of the code how
capable the format is.

> > Also RE: my other comments, you are right. I didn't take into
> > account
> > that MLP is lossless and that there may be off-by-one errors.
> > 
> > And as I said on IRC you can formulate this as a least squares
> > problem,
> > then solve it using a linear system solve. This patch seems finds a
> > solution that minimizes L1 rather than L2 though. Not sure what the
> > implications of that are compressionwise. What happens if you
> > replace
> > FFABS() with a square for scoring?
> > 
> 
> It reduces size usually by less then 0.002 %
> 
> Linear system solver gives vectors to create equations for both
> channels at
> same time?

L2 minimization allows using ordinary least squares. As I said on IRC,
the rub lies in formulating the problem properly. Minimizing L1 is much
harder, since it involves solving a linear program. Of course, for
practical purposes we don't need an exact solution.
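
As a rough illustration of the L2 case, here is a sketch for the simplest possible formulation, which is an assumption made for this example and not necessarily how the patch or the encoder would pose the problem: predict one channel from the other with a single real coefficient k and minimize sum((l[i] - k * r[i])^2). Setting the derivative to zero gives a closed form. The function name ls_coeff is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Least-squares coefficient k minimizing sum((l[i] - k * r[i])^2).
 * The normal equation reduces to k = sum(l * r) / sum(r * r). */
static double ls_coeff(const int32_t *l, const int32_t *r, size_t n)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        num += (double)l[i] * r[i];
        den += (double)r[i] * r[i];
    }
    /* An all-zero r carries no information; fall back to k = 0. */
    return den > 0.0 ? num / den : 0.0;
}
```

The result would still have to be quantized to the coefficient precision the format allows and verified for bitexact decoding, which this sketch does not attempt.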

Looking a bit more at the code, what is important is the decoding
coefficients, the d matrix. The encoder is free to choose d and the
encoded residuals so long as it decodes correctly. The decoder is
specified on d, not e.

Currently only one matrix is used (count=1 in estimate_coeff). With two
matrices something akin to a lifting scheme can be performed. This
means almost any 2x2 transform should be possible to perform (modulo
bitexactness concerns).

What I mean by lifting scheme here is that any 2x2 matrix A can be
decomposed into the product of two or more matrices of the form that e
has. I think.

We could potentially do something like alternating transforms of this
form:

l += k1*r;
r += k2*l;
l += k3*r;
r += k4*l;

This can always be inverted provided the intermediate results don't go
out of range, or in the event that they do go out of range, the decoder
is sufficiently well specified so that encoder and decoder don't go out
of sync. Compare how YCoCg-R is specified and fits in 3*8 bits. In fact
the Wikipedia article on YCoCg perhaps gets the point across better:
https://en.wikipedia.org/wiki/YCoCg
It in turn links to this Stack Overflow post, which makes the same point:
https://stackoverflow.com/questions/10566668/lossless-rgb-to-ycbcr-transformation/12146329#12146329
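
A minimal sketch of why such steps invert exactly, using two lifting steps with hypothetical Q14 fixed-point coefficients k1 and k2. The function names are made up for this example, and this is not claimed to match how the MLP decoder applies its matrices:

```c
#include <stdint.h>

/* Floor of (k * x) / 2^14, computed without shifting a negative value. */
static int32_t scale_q14(int32_t k, int32_t x)
{
    int64_t p = (int64_t)k * x;
    int64_t q = p / (1 << 14);
    if (p % (1 << 14) < 0)
        q--;
    return (int32_t)q;
}

/* Forward transform: each step changes one channel using only the other. */
static void lift_forward(int32_t *l, int32_t *r, int32_t k1, int32_t k2)
{
    *l += scale_q14(k1, *r);
    *r += scale_q14(k2, *l);
}

/* Inverse transform: same steps in reverse order, subtracting instead. */
static void lift_inverse(int32_t *l, int32_t *r, int32_t k1, int32_t k2)
{
    *r -= scale_q14(k2, *l);
    *l -= scale_q14(k1, *r);
}
```

Each step adds a rounded function of the channel it does not modify, so the inverse can recompute the identical rounded term and subtract it. The round trip is exact regardless of how the rounding is done, as long as the intermediate values stay in range.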

I believe any transform found by PCA can be converted into an
equivalent lifting scheme, and it will always be lossless provided
the modulo behaviour is specified correctly in the codec. I have no
idea if it is.

/Tomas

Paul B Mahol Oct. 30, 2023, 1:30 p.m. UTC | #7

On Mon, Oct 30, 2023 at 2:15 PM Tomas Härdin <git@haerdin.se> wrote:

> We could potentially do something like alternating transforms on this
> form:
>
> l += k1*r;
> r += k2*l;
> l += k3*r;
> r += k4*l;
>
> This can always be inverted provided the intermediate results don't go
> out of range, or in the event that they do go out of range, the decoder
> is sufficiently well specified so that encoder and decoder don't go out
> of sync. Compare how YCoCg-R is specified and fits in 3*8 bits. In fact
> the WP article on YCoCg perhaps gets the point across better:
> https://en.wikipedia.org/wiki/YCoCg
> it in turn links this stackoverflow post which makes the same point:
>
> https://stackoverflow.com/questions/10566668/lossless-rgb-to-ycbcr-transformation/12146329#12146329
>
> I believe any transformed found by PCA can be converted into an
> equivalent lifting scheme, and it will always be lossless provided
> modulo is specified correctly in the codec. I have no idea if it is.
>

L = k1 * l + k2 * r
R = L * k3 + r * k4

This is the affine transform for the 2x2 matrix case, and here typical
PCA or lifting fails.


