Message ID | CAPYw7P4g+o+XQGWFn185=n56Dptrw-6d0mTXW+uPn=FrUp9XgA@mail.gmail.com |
---|---|
State | New |
Series | [FFmpeg-devel] avcodec/mlp*: improvements |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
> if (c) {
>     e[0] = 1 << 14;
>     e[1] = 0 << 14;
>     e[2] = v[1];
>     e[3] = v[0];
> } else {
>     e[0] = v[0];
>     e[1] = v[1];
>     e[2] = 0 << 14;
>     e[3] = 1 << 14;
> }
>
> if (invert2x2(e, d)) {
>     sum = UINT64_MAX;
>     goto next;
> }

You can make use of the properties of e to simplify calculating the
inverse. The determinant is always v[0]<<14, so you can just do
if (!v[0]) continue; and skip the determinant check altogether.

> if (d[i] != av_clip_intp2(d[i], 15)) {

d[i] < INT16_MIN || d[i] > INT16_MAX is more clear and probably faster

> + lt = ((lm * e[0]) >> 14) + ((rm * e[1]) >> 14);
> + rt = ((lm * e[2]) >> 14) + ((rm * e[3]) >> 14);

Result is implementation-defined. Use division by (1<<14). Also add
then divide. The intermediate result is 49 bits so fits easily in 64
bits.

You could also simplify this calculation by again making use of the
properties of e.

> if (c)
>     v += FFABS(rt);
> else
>     v += FFABS(lt);
> sum += v;
> if (sum > best_sum)
>     goto next;

Seems like this reduces to solving a linear program.

> if ((((lt * d[0]) >> 14) + ((rt * d[1]) >> 14)) != lm) {
>     sum = UINT64_MAX;
>     goto next;
> }
>
> if ((((lt * d[2]) >> 14) + ((rt * d[3]) >> 14)) != rm) {
>     sum = UINT64_MAX;
>     goto next;
> }

Looks like a massive hack. I'd prefer to formally verify that the
arithmetic works out. Also again you can make use of the properties of
e, or inv(e) as it were.

/Tomas
On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git@haerdin.se> wrote:

> > if (c) {
> >     e[0] = 1 << 14;
> >     e[1] = 0 << 14;
> >     e[2] = v[1];
> >     e[3] = v[0];
> > } else {
> >     e[0] = v[0];
> >     e[1] = v[1];
> >     e[2] = 0 << 14;
> >     e[3] = 1 << 14;
> > }
> >
> > if (invert2x2(e, d)) {
> >     sum = UINT64_MAX;
> >     goto next;
> > }
>
> You can make use of the properties of e to simplify calculating the
> inverse. The determinant is always v[0]<<14, so you can just do
> if (!v[0]) continue; and skip the determinant check altogether.
>
> > if (d[i] != av_clip_intp2(d[i], 15)) {
>
> d[i] < INT16_MIN || d[i] > INT16_MAX is more clear and probably faster
>
> > + lt = ((lm * e[0]) >> 14) + ((rm * e[1]) >> 14);
> > + rt = ((lm * e[2]) >> 14) + ((rm * e[3]) >> 14);
>
> Result is implementation-defined. Use division by (1<<14). Also add
> then divide. The intermediate result is 49 bits so fits easily in 64
> bits.

Division by (1<<14) will give incorrect results. Been there, done that.
You can check the validity of all your "reviews" by testing the patches
and checking that the results are bitexact; otherwise I'm just wasting
time here.

Additions are done before, not later. Again, check the validity of your
comments before commenting more. Thanks.

> You could also simplify this calculation by again making use of the
> properties of e.
>
> > if (c)
> >     v += FFABS(rt);
> > else
> >     v += FFABS(lt);
> > sum += v;
> > if (sum > best_sum)
> >     goto next;
>
> Seems like this reduces to solving a linear program.
>
> > if ((((lt * d[0]) >> 14) + ((rt * d[1]) >> 14)) != lm) {
> >     sum = UINT64_MAX;
> >     goto next;
> > }
> >
> > if ((((lt * d[2]) >> 14) + ((rt * d[3]) >> 14)) != rm) {
> >     sum = UINT64_MAX;
> >     goto next;
> > }
>
> Looks like a massive hack. I'd prefer to formally verify that the
> arithmetic works out. Also again you can make use of the properties of
> e, or inv(e) as it were.

Arithmetic may not always work out.
On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git@haerdin.se> wrote:

> > if (c) {
> >     e[0] = 1 << 14;
> >     e[1] = 0 << 14;
> >     e[2] = v[1];
> >     e[3] = v[0];
> > } else {
> >     e[0] = v[0];
> >     e[1] = v[1];
> >     e[2] = 0 << 14;
> >     e[3] = 1 << 14;
> > }
> >
> > if (invert2x2(e, d)) {
> >     sum = UINT64_MAX;
> >     goto next;
> > }
>
> You can make use of the properties of e to simplify calculating the
> inverse. The determinant is always v[0]<<14, so you can just do
> if (!v[0]) continue; and skip the determinant check altogether.

Even for the real 2x2 matrix case (i.e. once neither row is 1, 0)?
I may add such cases later.

> > if (d[i] != av_clip_intp2(d[i], 15)) {
>
> d[i] < INT16_MIN || d[i] > INT16_MAX is more clear and probably faster
>
> > + lt = ((lm * e[0]) >> 14) + ((rm * e[1]) >> 14);
> > + rt = ((lm * e[2]) >> 14) + ((rm * e[3]) >> 14);
>
> Result is implementation-defined. Use division by (1<<14). Also add
> then divide. The intermediate result is 49 bits so fits easily in 64
> bits.
>
> You could also simplify this calculation by again making use of the
> properties of e.
>
> > if (c)
> >     v += FFABS(rt);
> > else
> >     v += FFABS(lt);
> > sum += v;
> > if (sum > best_sum)
> >     goto next;
>
> Seems like this reduces to solving a linear program.
>
> > if ((((lt * d[0]) >> 14) + ((rt * d[1]) >> 14)) != lm) {
> >     sum = UINT64_MAX;
> >     goto next;
> > }
> >
> > if ((((lt * d[2]) >> 14) + ((rt * d[3]) >> 14)) != rm) {
> >     sum = UINT64_MAX;
> >     goto next;
> > }
>
> Looks like a massive hack. I'd prefer to formally verify that the
> arithmetic works out. Also again you can make use of the properties of
> e, or inv(e) as it were.
>
> /Tomas
On Wed, 2023-10-25 at 21:00 +0200, Paul B Mahol wrote:
> On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git@haerdin.se> wrote:
>
> > > if (c) {
> > >     e[0] = 1 << 14;
> > >     e[1] = 0 << 14;
> > >     e[2] = v[1];
> > >     e[3] = v[0];
> > > } else {
> > >     e[0] = v[0];
> > >     e[1] = v[1];
> > >     e[2] = 0 << 14;
> > >     e[3] = 1 << 14;
> > > }
> > >
> > > if (invert2x2(e, d)) {
> > >     sum = UINT64_MAX;
> > >     goto next;
> > > }
> >
> > You can make use of the properties of e to simplify calculating the
> > inverse. The determinant is always v[0]<<14, so you can just do
> > if (!v[0]) continue; and skip the determinant check altogether.
>
> Even for the real 2x2 matrix case (i.e. once neither row is 1, 0)?
> I may add such cases later.

You can just work the math out on paper. Inverse of

  1     0
  v[1]  v[0]

is

  1           0
  -v[1]/v[0]  1/v[0]

not accounting for shifts.

Also RE: my other comments, you are right. I didn't take into account
that MLP is lossless and that there may be off-by-one errors.

And as I said on IRC you can formulate this as a least squares problem,
then solve it using a linear system solve. This patch seems to find a
solution that minimizes L1 rather than L2 though. Not sure what the
implications of that are compression-wise. What happens if you replace
FFABS() with a square for scoring?

/Tomas
On Wed, Oct 25, 2023 at 9:03 PM Tomas Härdin <git@haerdin.se> wrote:

> On Wed, 2023-10-25 at 21:00 +0200, Paul B Mahol wrote:
> > On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git@haerdin.se> wrote:
> >
> > > > if (c) {
> > > >     e[0] = 1 << 14;
> > > >     e[1] = 0 << 14;
> > > >     e[2] = v[1];
> > > >     e[3] = v[0];
> > > > } else {
> > > >     e[0] = v[0];
> > > >     e[1] = v[1];
> > > >     e[2] = 0 << 14;
> > > >     e[3] = 1 << 14;
> > > > }
> > > >
> > > > if (invert2x2(e, d)) {
> > > >     sum = UINT64_MAX;
> > > >     goto next;
> > > > }
> > >
> > > You can make use of the properties of e to simplify calculating the
> > > inverse. The determinant is always v[0]<<14, so you can just do
> > > if (!v[0]) continue; and skip the determinant check altogether.
> >
> > Even for the real 2x2 matrix case (i.e. once neither row is 1, 0)?
> > I may add such cases later.
>
> You can just work the math out on paper. Inverse of
>
>   1     0
>   v[1]  v[0]
>
> is
>
>   1           0
>   -v[1]/v[0]  1/v[0]
>
> not accounting for shifts.

But I want to add a real 2x2 matrix with no 0 cell, with:

  a, b
  c, d

later (even though the gains are small, as encoded files use it rarely).

> Also RE: my other comments, you are right. I didn't take into account
> that MLP is lossless and that there may be off-by-one errors.
>
> And as I said on IRC you can formulate this as a least squares problem,
> then solve it using a linear system solve. This patch seems to find a
> solution that minimizes L1 rather than L2 though. Not sure what the
> implications of that are compression-wise. What happens if you replace
> FFABS() with a square for scoring?

It usually reduces size by less than 0.002 %.

Does a linear system solver give vectors to create equations for both
channels at the same time?
On Wed, 2023-10-25 at 21:59 +0200, Paul B Mahol wrote:
> On Wed, Oct 25, 2023 at 9:03 PM Tomas Härdin <git@haerdin.se> wrote:
>
> > On Wed, 2023-10-25 at 21:00 +0200, Paul B Mahol wrote:
> > > On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git@haerdin.se>
> > > wrote:
> > >
> > > > > if (c) {
> > > > >     e[0] = 1 << 14;
> > > > >     e[1] = 0 << 14;
> > > > >     e[2] = v[1];
> > > > >     e[3] = v[0];
> > > > > } else {
> > > > >     e[0] = v[0];
> > > > >     e[1] = v[1];
> > > > >     e[2] = 0 << 14;
> > > > >     e[3] = 1 << 14;
> > > > > }
> > > > >
> > > > > if (invert2x2(e, d)) {
> > > > >     sum = UINT64_MAX;
> > > > >     goto next;
> > > > > }
> > > >
> > > > You can make use of the properties of e to simplify calculating
> > > > the inverse. The determinant is always v[0]<<14, so you can just
> > > > do if (!v[0]) continue; and skip the determinant check altogether.
> > >
> > > Even for the real 2x2 matrix case (i.e. once neither row is 1, 0)?
> > > I may add such cases later.
> >
> > You can just work the math out on paper. Inverse of
> >
> >   1     0
> >   v[1]  v[0]
> >
> > is
> >
> >   1           0
> >   -v[1]/v[0]  1/v[0]
> >
> > not accounting for shifts.
>
> But I want to add a real 2x2 matrix with no 0 cell, with:
>
>   a, b
>   c, d
>
> later (even though the gains are small, as encoded files use it rarely).

If this is possible within MLP then yes, do that. It is not clear from
what you've told me so far and from my brief reading of the code how
capable the format is.

> > Also RE: my other comments, you are right. I didn't take into
> > account that MLP is lossless and that there may be off-by-one errors.
> >
> > And as I said on IRC you can formulate this as a least squares
> > problem, then solve it using a linear system solve. This patch seems
> > to find a solution that minimizes L1 rather than L2 though. Not sure
> > what the implications of that are compression-wise. What happens if
> > you replace FFABS() with a square for scoring?
>
> It usually reduces size by less than 0.002 %.
>
> Does a linear system solver give vectors to create equations for both
> channels at the same time?

L2 minimization allows using ordinary least squares. As I said on IRC,
the rub lies in formulating the problem properly. Minimizing L1 is much
harder, since it involves solving a linear program. Of course for
practical purposes we don't need an exact solution.

Looking a bit more at the code, what is important is the decoding
coefficients, the d matrix. The encoder is free to choose d and the
encoded residuals so long as it decodes correctly. The decoder is
specified on d, not e.

Currently only one matrix is used (count=1 in estimate_coeff). With two
matrices something akin to a lifting scheme can be performed. This
means almost any 2x2 transform should be possible to perform (modulo
bitexactness concerns).

What I mean by lifting scheme here is that any 2x2 matrix A can be
decomposed into the product of two or more matrices of the form that e
has. I think.

We could potentially do something like alternating transforms of this
form:

  l += k1*r;
  r += k2*l;
  l += k3*r;
  r += k4*l;

This can always be inverted provided the intermediate results don't go
out of range, or in the event that they do go out of range, the decoder
is sufficiently well specified so that encoder and decoder don't go out
of sync. Compare how YCoCg-R is specified and fits in 3*8 bits. In fact
the WP article on YCoCg perhaps gets the point across better:
https://en.wikipedia.org/wiki/YCoCg
It in turn links this stackoverflow post which makes the same point:
https://stackoverflow.com/questions/10566668/lossless-rgb-to-ycbcr-transformation/12146329#12146329

I believe any transform found by PCA can be converted into an
equivalent lifting scheme, and it will always be lossless provided
modulo is specified correctly in the codec. I have no idea if it is.

/Tomas
On Mon, Oct 30, 2023 at 2:15 PM Tomas Härdin <git@haerdin.se> wrote:

> On Wed, 2023-10-25 at 21:59 +0200, Paul B Mahol wrote:
> > On Wed, Oct 25, 2023 at 9:03 PM Tomas Härdin <git@haerdin.se> wrote:
> >
> > > On Wed, 2023-10-25 at 21:00 +0200, Paul B Mahol wrote:
> > > > On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git@haerdin.se>
> > > > wrote:
> > > >
> > > > > > if (c) {
> > > > > >     e[0] = 1 << 14;
> > > > > >     e[1] = 0 << 14;
> > > > > >     e[2] = v[1];
> > > > > >     e[3] = v[0];
> > > > > > } else {
> > > > > >     e[0] = v[0];
> > > > > >     e[1] = v[1];
> > > > > >     e[2] = 0 << 14;
> > > > > >     e[3] = 1 << 14;
> > > > > > }
> > > > > >
> > > > > > if (invert2x2(e, d)) {
> > > > > >     sum = UINT64_MAX;
> > > > > >     goto next;
> > > > > > }
> > > > >
> > > > > You can make use of the properties of e to simplify calculating
> > > > > the inverse. The determinant is always v[0]<<14, so you can just
> > > > > do if (!v[0]) continue; and skip the determinant check
> > > > > altogether.
> > > >
> > > > Even for the real 2x2 matrix case (i.e. once neither row is 1, 0)?
> > > > I may add such cases later.
> > >
> > > You can just work the math out on paper. Inverse of
> > >
> > >   1     0
> > >   v[1]  v[0]
> > >
> > > is
> > >
> > >   1           0
> > >   -v[1]/v[0]  1/v[0]
> > >
> > > not accounting for shifts.
> >
> > But I want to add a real 2x2 matrix with no 0 cell, with:
> >
> >   a, b
> >   c, d
> >
> > later (even though the gains are small, as encoded files use it
> > rarely).
>
> If this is possible within MLP then yes, do that. It is not clear from
> what you've told me so far and from my brief reading of the code how
> capable the format is.
>
> > > Also RE: my other comments, you are right. I didn't take into
> > > account that MLP is lossless and that there may be off-by-one
> > > errors.
> > >
> > > And as I said on IRC you can formulate this as a least squares
> > > problem, then solve it using a linear system solve. This patch
> > > seems to find a solution that minimizes L1 rather than L2 though.
> > > Not sure what the implications of that are compression-wise. What
> > > happens if you replace FFABS() with a square for scoring?
> >
> > It usually reduces size by less than 0.002 %.
> >
> > Does a linear system solver give vectors to create equations for
> > both channels at the same time?
>
> L2 minimization allows using ordinary least squares. As I said on IRC,
> the rub lies in formulating the problem properly. Minimizing L1 is much
> harder, since it involves solving a linear program. Of course for
> practical purposes we don't need an exact solution.
>
> Looking a bit more at the code, what is important is the decoding
> coefficients, the d matrix. The encoder is free to choose d and the
> encoded residuals so long as it decodes correctly. The decoder is
> specified on d, not e.
>
> Currently only one matrix is used (count=1 in estimate_coeff). With two
> matrices something akin to a lifting scheme can be performed. This
> means almost any 2x2 transform should be possible to perform (modulo
> bitexactness concerns).
>
> What I mean by lifting scheme here is that any 2x2 matrix A can be
> decomposed into the product of two or more matrices of the form that e
> has. I think.
>
> We could potentially do something like alternating transforms of this
> form:
>
>   l += k1*r;
>   r += k2*l;
>   l += k3*r;
>   r += k4*l;
>
> This can always be inverted provided the intermediate results don't go
> out of range, or in the event that they do go out of range, the decoder
> is sufficiently well specified so that encoder and decoder don't go out
> of sync. Compare how YCoCg-R is specified and fits in 3*8 bits. In fact
> the WP article on YCoCg perhaps gets the point across better:
> https://en.wikipedia.org/wiki/YCoCg
> It in turn links this stackoverflow post which makes the same point:
> https://stackoverflow.com/questions/10566668/lossless-rgb-to-ycbcr-transformation/12146329#12146329
>
> I believe any transform found by PCA can be converted into an
> equivalent lifting scheme, and it will always be lossless provided
> modulo is specified correctly in the codec. I have no idea if it is.

  L = k1 * l + k2 * r
  R = L * k3 + r * k4

This is the affine transform for the 2x2 matrix case, and here typical
PCA or lifting fails.

> /Tomas
From 310979c0394ab8572b34754ae1436537512c5afd Mon Sep 17 00:00:00 2001
From: Paul B Mahol <onemda@gmail.com>
Date: Wed, 25 Oct 2023 11:05:35 +0200
Subject: [PATCH 2/4] avcodec/mlpenc: add 3.1 ch layout support for truehd

Signed-off-by: Paul B Mahol <onemda@gmail.com>
---
 libavcodec/mlpenc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavcodec/mlpenc.c b/libavcodec/mlpenc.c
index 27ef5f2c82..3a7893b3f0 100644
--- a/libavcodec/mlpenc.c
+++ b/libavcodec/mlpenc.c
@@ -600,6 +600,7 @@ static av_cold int mlp_encode_init(AVCodecContext *avctx)
             break;
         case AV_CH_LAYOUT_2POINT1:
         case AV_CH_LAYOUT_SURROUND:
+        case AV_CH_LAYOUT_3POINT1:
         case AV_CH_LAYOUT_5POINT0:
         case AV_CH_LAYOUT_5POINT1:
             ctx->ch2_presentation_mod= 0;
@@ -2399,12 +2400,13 @@ const FFCodec ff_truehd_encoder = {
     .p.priv_class = &mlp_class,
     .p.sample_fmts = (const enum AVSampleFormat[]) {AV_SAMPLE_FMT_S16P, AV_SAMPLE_FMT_S32P, AV_SAMPLE_FMT_NONE},
     .p.supported_samplerates = (const int[]) {44100, 48000, 88200, 96000, 176400, 192000, 0},
-    CODEC_OLD_CHANNEL_LAYOUTS(AV_CH_LAYOUT_MONO, AV_CH_LAYOUT_STEREO, AV_CH_LAYOUT_2POINT1, AV_CH_LAYOUT_SURROUND, AV_CH_LAYOUT_5POINT0, AV_CH_LAYOUT_5POINT1)
+    CODEC_OLD_CHANNEL_LAYOUTS(AV_CH_LAYOUT_MONO, AV_CH_LAYOUT_STEREO, AV_CH_LAYOUT_2POINT1, AV_CH_LAYOUT_SURROUND, AV_CH_LAYOUT_3POINT1, AV_CH_LAYOUT_5POINT0, AV_CH_LAYOUT_5POINT1)
     .p.ch_layouts = (const AVChannelLayout[]) {
                                   AV_CHANNEL_LAYOUT_MONO,
                                   AV_CHANNEL_LAYOUT_STEREO,
                                   AV_CHANNEL_LAYOUT_2POINT1,
                                   AV_CHANNEL_LAYOUT_SURROUND,
+                                  AV_CHANNEL_LAYOUT_3POINT1,
                                   AV_CHANNEL_LAYOUT_5POINT0,
                                   AV_CHANNEL_LAYOUT_5POINT1,
                                   { 0 }
-- 
2.42.0