diff mbox series

[FFmpeg-devel,2/2] libswcale/input: fix incorrect rgbf32 yuv conversions

Message ID 20200912090714.66582-3-mindmark@gmail.com
State New
Headers show
Series libswcale/input: fix incorrect rgbf32 yuv conversions | expand

Checks

Context Check Description
andriy/default pending
andriy/make success Make finished
andriy/make_fate success Make fate finished

Commit Message

Mark Reid Sept. 12, 2020, 9:07 a.m. UTC
From: Mark Reid <mindmark@gmail.com>

---
 libswscale/input.c                  | 12 +++++-------
 tests/ref/fate/filter-pixfmts-scale |  8 ++++----
 2 files changed, 9 insertions(+), 11 deletions(-)

Comments

Michael Niedermayer Sept. 13, 2020, 3:55 p.m. UTC | #1
On Sat, Sep 12, 2020 at 02:07:14AM -0700, mindmark@gmail.com wrote:
> From: Mark Reid <mindmark@gmail.com>
> 
> ---
>  libswscale/input.c                  | 12 +++++-------
>  tests/ref/fate/filter-pixfmts-scale |  8 ++++----
>  2 files changed, 9 insertions(+), 11 deletions(-)

Can you provide some tests that show that this is better ?
Iam asking that because some of the numbers in some of the code
(i dont remember which) where tuned to give more accurate overall results

an example for tests would be converting from A->B->A then compare to the orig

thx


[...]
Mark Reid Sept. 13, 2020, 11:04 p.m. UTC | #2
On Sun, Sep 13, 2020 at 8:55 AM Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Sat, Sep 12, 2020 at 02:07:14AM -0700, mindmark@gmail.com wrote:
> > From: Mark Reid <mindmark@gmail.com>
> >
> > ---
> >  libswscale/input.c                  | 12 +++++-------
> >  tests/ref/fate/filter-pixfmts-scale |  8 ++++----
> >  2 files changed, 9 insertions(+), 11 deletions(-)
>
> Can you provide some tests that show that this is better ?
> Iam asking that because some of the numbers in some of the code
> (i dont remember which) where tuned to give more accurate overall results
>
> an example for tests would be converting from A->B->A then compare to the
> orig
>
> thx
>
>
Hopefully i can explain this clearly!

It's easier to see the error if you run a black image through the old
conversion.
zero values don't get mapped to zero. (attached sample image)

ffmpeg -i 4x4_zero.exr -pix_fmt rgb48le -f rawvideo 4x4_zero.rawvideo
The image should be rgb 0, 0, 0  everywhere but instead it's 353, 0, 407


I think this is a error in fixed point rounding, the issue is basically
boils down to

128 << 8 != 257 << 7
and
16 << 8  != 33 << 7

https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
the 8 bit rgb to yuv formula is

Y = ry*r + gy*g + by*b  + 16
U = ru*r + gu*g + bu*b + 128
V = rv*r + gv*g + bv*b + 128

I think the studio swing offsets at the end are calculated wrong in the old
code.
(257 << (RGB2YUV_SHIFT + bpc - 9)))
257 is correct for 8 bit rounding but not for 16-bit.

the 257 i believe is from (128 << 1) + 1
the +1 is for rounding

for rounding 16-bit (128 << 9) + 1 = 0x10001

therefore I think the correct rounding any bit depth with the old formula
would be (untested)
(((128 << (bpc - 7)) + 1) << (RGB2YUV_SHIFT-1) )

I just simplified it to
(0x10001 << (RGB2YUV_SHIFT - 1))

The rgb48ToUV and rgb48ToY funcs in input.c use the formula I'm using.


>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Everything should be made as simple as possible, but not simpler.
> -- Albert Einstein
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Rémi Achard Sept. 14, 2020, 8:20 a.m. UTC | #3
> The rgb48ToUV and rgb48ToY funcs in input.c use the formula I'm using.

rgb64ToUV and rgb64ToY use it too, might be an opportunity to reduce code
duplication.

I ran into this while working on related precision issues (color range
conversion being precise only for 8bits) and just used the same formula as
the others templates to solve the issue like you.

Le lun. 14 sept. 2020 à 06:30, Mark Reid <mindmark@gmail.com> a écrit :

> On Sun., Sep. 13, 2020, 4:04 p.m. Mark Reid, <mindmark@gmail.com> wrote:
>
> >
> >
> > On Sun, Sep 13, 2020 at 8:55 AM Michael Niedermayer
> <michael@niedermayer.cc>
> > wrote:
> >
> >> On Sat, Sep 12, 2020 at 02:07:14AM -0700, mindmark@gmail.com wrote:
> >> > From: Mark Reid <mindmark@gmail.com>
> >> >
> >> > ---
> >> >  libswscale/input.c                  | 12 +++++-------
> >> >  tests/ref/fate/filter-pixfmts-scale |  8 ++++----
> >> >  2 files changed, 9 insertions(+), 11 deletions(-)
> >>
> >> Can you provide some tests that show that this is better ?
> >> Iam asking that because some of the numbers in some of the code
> >> (i dont remember which) where tuned to give more accurate overall
> results
> >>
> >> an example for tests would be converting from A->B->A then compare to
> the
> >> orig
> >>
> >> thx
> >>
> >>
> > Hopefully i can explain this clearly!
> >
> > It's easier to see the error if you run a black image through the old
> > conversion.
> > zero values don't get mapped to zero. (attached sample image)
> >
> > ffmpeg -i 4x4_zero.exr -pix_fmt rgb48le -f rawvideo 4x4_zero.rawvideo
> > The image should be rgb 0, 0, 0  everywhere but instead it's 353, 0, 407
> >
> >
> > I think this is a error in fixed point rounding, the issue is basically
> > boils down to
> >
> > 128 << 8 != 257 << 7
> > and
> > 16 << 8  != 33 << 7
> >
> > https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
> > the 8 bit rgb to yuv formula is
> >
> > Y = ry*r + gy*g + by*b  + 16
> > U = ru*r + gu*g + bu*b + 128
> > V = rv*r + gv*g + bv*b + 128
> >
> > I think the studio swing offsets at the end are calculated wrong in the
> > old code.
> > (257 << (RGB2YUV_SHIFT + bpc - 9)))
> > 257 is correct for 8 bit rounding but not for 16-bit.
> >
> > the 257 i believe is from (128 << 1) + 1
> > the +1 is for rounding
> >
> > for rounding 16-bit (128 << 9) + 1 = 0x10001
> >
> > therefore I think the correct rounding any bit depth with the old formula
> > would be (untested)
> > (((128 << (bpc - 7)) + 1) << (RGB2YUV_SHIFT-1) )
> >
> > I just simplified it to
> > (0x10001 << (RGB2YUV_SHIFT - 1))
> >
> > The rgb48ToUV and rgb48ToY funcs in input.c use the formula I'm using.
> >
>
> I'm still not sure if I'm correct about all this. The rounding stuff
> doesn't make 100% sense to me. But this is all I've gathered from reading
> the code.
>
>
>
> >
> >>
> >> [...]
> >> --
> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >>
> >> Everything should be made as simple as possible, but not simpler.
> >> -- Albert Einstein
> >> _______________________________________________
> >> ffmpeg-devel mailing list
> >> ffmpeg-devel@ffmpeg.org
> >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >>
> >> To unsubscribe, visit link above, or email
> >> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
> >
> >
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Michael Niedermayer Sept. 14, 2020, 9:44 p.m. UTC | #4
On Sun, Sep 13, 2020 at 04:04:42PM -0700, Mark Reid wrote:
> On Sun, Sep 13, 2020 at 8:55 AM Michael Niedermayer <michael@niedermayer.cc>
> wrote:
> 
> > On Sat, Sep 12, 2020 at 02:07:14AM -0700, mindmark@gmail.com wrote:
> > > From: Mark Reid <mindmark@gmail.com>
> > >
> > > ---
> > >  libswscale/input.c                  | 12 +++++-------
> > >  tests/ref/fate/filter-pixfmts-scale |  8 ++++----
> > >  2 files changed, 9 insertions(+), 11 deletions(-)
> >
> > Can you provide some tests that show that this is better ?
> > Iam asking that because some of the numbers in some of the code
> > (i dont remember which) where tuned to give more accurate overall results
> >
> > an example for tests would be converting from A->B->A then compare to the
> > orig
> >
> > thx
> >
> >
> Hopefully i can explain this clearly!
> 
> It's easier to see the error if you run a black image through the old
> conversion.
> zero values don't get mapped to zero. (attached sample image)
> 
> ffmpeg -i 4x4_zero.exr -pix_fmt rgb48le -f rawvideo 4x4_zero.rawvideo
> The image should be rgb 0, 0, 0  everywhere but instead it's 353, 0, 407
> 
> 
> I think this is a error in fixed point rounding, the issue is basically
> boils down to
> 
> 128 << 8 != 257 << 7
> and
> 16 << 8  != 33 << 7
> 
> https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
> the 8 bit rgb to yuv formula is
> 
> Y = ry*r + gy*g + by*b  + 16
> U = ru*r + gu*g + bu*b + 128
> V = rv*r + gv*g + bv*b + 128
> 
> I think the studio swing offsets at the end are calculated wrong in the old
> code.
> (257 << (RGB2YUV_SHIFT + bpc - 9)))
> 257 is correct for 8 bit rounding but not for 16-bit.
> 
> the 257 i believe is from (128 << 1) + 1
> the +1 is for rounding
> 
> for rounding 16-bit (128 << 9) + 1 = 0x10001
> 
> therefore I think the correct rounding any bit depth with the old formula
> would be (untested)
> (((128 << (bpc - 7)) + 1) << (RGB2YUV_SHIFT-1) )
> 
> I just simplified it to
> (0x10001 << (RGB2YUV_SHIFT - 1))
> 
> The rgb48ToUV and rgb48ToY funcs in input.c use the formula I'm using.

You quite possibly are correct, can you test that this actually works
out. The test sample only covers 1 color (black)
a testsample covering a wide range of the color cube would be more
convincing that this change is needed and sufficient to fix this.

thx

[...]
Mark Reid Sept. 15, 2020, 1:31 a.m. UTC | #5
On Mon, Sep 14, 2020 at 2:44 PM Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Sun, Sep 13, 2020 at 04:04:42PM -0700, Mark Reid wrote:
> > On Sun, Sep 13, 2020 at 8:55 AM Michael Niedermayer
> <michael@niedermayer.cc>
> > wrote:
> >
> > > On Sat, Sep 12, 2020 at 02:07:14AM -0700, mindmark@gmail.com wrote:
> > > > From: Mark Reid <mindmark@gmail.com>
> > > >
> > > > ---
> > > >  libswscale/input.c                  | 12 +++++-------
> > > >  tests/ref/fate/filter-pixfmts-scale |  8 ++++----
> > > >  2 files changed, 9 insertions(+), 11 deletions(-)
> > >
> > > Can you provide some tests that show that this is better ?
> > > Iam asking that because some of the numbers in some of the code
> > > (i dont remember which) where tuned to give more accurate overall
> results
> > >
> > > an example for tests would be converting from A->B->A then compare to
> the
> > > orig
> > >
> > > thx
> > >
> > >
> > Hopefully i can explain this clearly!
> >
> > It's easier to see the error if you run a black image through the old
> > conversion.
> > zero values don't get mapped to zero. (attached sample image)
> >
> > ffmpeg -i 4x4_zero.exr -pix_fmt rgb48le -f rawvideo 4x4_zero.rawvideo
> > The image should be rgb 0, 0, 0  everywhere but instead it's 353, 0, 407
> >
> >
> > I think this is a error in fixed point rounding, the issue is basically
> > boils down to
> >
> > 128 << 8 != 257 << 7
> > and
> > 16 << 8  != 33 << 7
> >
> > https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
> > the 8 bit rgb to yuv formula is
> >
> > Y = ry*r + gy*g + by*b  + 16
> > U = ru*r + gu*g + bu*b + 128
> > V = rv*r + gv*g + bv*b + 128
> >
> > I think the studio swing offsets at the end are calculated wrong in the
> old
> > code.
> > (257 << (RGB2YUV_SHIFT + bpc - 9)))
> > 257 is correct for 8 bit rounding but not for 16-bit.
> >
> > the 257 i believe is from (128 << 1) + 1
> > the +1 is for rounding
> >
> > for rounding 16-bit (128 << 9) + 1 = 0x10001
> >
> > therefore I think the correct rounding any bit depth with the old formula
> > would be (untested)
> > (((128 << (bpc - 7)) + 1) << (RGB2YUV_SHIFT-1) )
> >
> > I just simplified it to
> > (0x10001 << (RGB2YUV_SHIFT - 1))
> >
> > The rgb48ToUV and rgb48ToY funcs in input.c use the formula I'm using.
>
> You quite possibly are correct, can you test that this actually works
> out. The test sample only covers 1 color (black)
> a testsample covering a wide range of the color cube would be more
> convincing that this change is needed and sufficient to fix this.
>
> thx
>

I wrote a small python script to compare the raw gbrpf32le images if that
works? I attached it and also a more colorful test pattern.

it runs these two commands and compares the 2 raw float images
ffmpeg -y -i test_pattern.exr -f rawvideo original.gbrpf32le
ffmpeg -y -i test_pattern.exr -vf
format=pix_fmts=rgb48le,format=pix_fmts=gbrpf32le -f rawvideo
converted.gbrpf32le

python gbrpf32le_diff.py test_pattern.exr

without patch:
avg error: 237.445495855
min error: 0.0
max error: 468.399102688

with patch:
avg error: 15.9312244829
min error: 0.0
max error: 69.467689991


These are floating points scaled to 16-bit values.
Even with my patch, I'm kinda disturbed how much error there is.


>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> There will always be a question for which you do not know the correct
> answer.
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Mark Reid Sept. 27, 2020, 5:01 a.m. UTC | #6
On Mon, Sep 14, 2020 at 6:31 PM Mark Reid <mindmark@gmail.com> wrote:

>
>
> On Mon, Sep 14, 2020 at 2:44 PM Michael Niedermayer <michael@niedermayer.cc>
> wrote:
>
>> On Sun, Sep 13, 2020 at 04:04:42PM -0700, Mark Reid wrote:
>> > On Sun, Sep 13, 2020 at 8:55 AM Michael Niedermayer
>> <michael@niedermayer.cc>
>> > wrote:
>> >
>> > > On Sat, Sep 12, 2020 at 02:07:14AM -0700, mindmark@gmail.com wrote:
>> > > > From: Mark Reid <mindmark@gmail.com>
>> > > >
>> > > > ---
>> > > >  libswscale/input.c                  | 12 +++++-------
>> > > >  tests/ref/fate/filter-pixfmts-scale |  8 ++++----
>> > > >  2 files changed, 9 insertions(+), 11 deletions(-)
>> > >
>> > > Can you provide some tests that show that this is better ?
>> > > Iam asking that because some of the numbers in some of the code
>> > > (i dont remember which) where tuned to give more accurate overall
>> results
>> > >
>> > > an example for tests would be converting from A->B->A then compare to
>> the
>> > > orig
>> > >
>> > > thx
>> > >
>> > >
>> > Hopefully i can explain this clearly!
>> >
>> > It's easier to see the error if you run a black image through the old
>> > conversion.
>> > zero values don't get mapped to zero. (attached sample image)
>> >
>> > ffmpeg -i 4x4_zero.exr -pix_fmt rgb48le -f rawvideo 4x4_zero.rawvideo
>> > The image should be rgb 0, 0, 0  everywhere but instead it's 353, 0, 407
>> >
>> >
>> > I think this is a error in fixed point rounding, the issue is basically
>> > boils down to
>> >
>> > 128 << 8 != 257 << 7
>> > and
>> > 16 << 8  != 33 << 7
>> >
>> > https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
>> > the 8 bit rgb to yuv formula is
>> >
>> > Y = ry*r + gy*g + by*b  + 16
>> > U = ru*r + gu*g + bu*b + 128
>> > V = rv*r + gv*g + bv*b + 128
>> >
>> > I think the studio swing offsets at the end are calculated wrong in the
>> old
>> > code.
>> > (257 << (RGB2YUV_SHIFT + bpc - 9)))
>> > 257 is correct for 8 bit rounding but not for 16-bit.
>> >
>> > the 257 i believe is from (128 << 1) + 1
>> > the +1 is for rounding
>> >
>> > for rounding 16-bit (128 << 9) + 1 = 0x10001
>> >
>> > therefore I think the correct rounding any bit depth with the old
>> formula
>> > would be (untested)
>> > (((128 << (bpc - 7)) + 1) << (RGB2YUV_SHIFT-1) )
>> >
>> > I just simplified it to
>> > (0x10001 << (RGB2YUV_SHIFT - 1))
>> >
>> > The rgb48ToUV and rgb48ToY funcs in input.c use the formula I'm using.
>>
>> You quite possibly are correct, can you test that this actually works
>> out. The test sample only covers 1 color (black)
>> a testsample covering a wide range of the color cube would be more
>> convincing that this change is needed and sufficient to fix this.
>>
>> thx
>>
>
> I wrote a small python script to compare the raw gbrpf32le images if that
> works? I attached it and also a more colorful test pattern.
>
> it runs these two commands and compares the 2 raw float images
> ffmpeg -y -i test_pattern.exr -f rawvideo original.gbrpf32le
> ffmpeg -y -i test_pattern.exr -vf
> format=pix_fmts=rgb48le,format=pix_fmts=gbrpf32le -f rawvideo
> converted.gbrpf32le
>
> python gbrpf32le_diff.py test_pattern.exr
>
> without patch:
> avg error: 237.445495855
> min error: 0.0
> max error: 468.399102688
>
> with patch:
> avg error: 15.9312244829
> min error: 0.0
> max error: 69.467689991
>
>
> These are floating points scaled to 16-bit values.
> Even with my patch, I'm kinda disturbed how much error there is.
>

ping
I re-wrote the python script as a c swscale test, if that helps
replicate my results. attached is patch for that.
it generates an image of random float values and does the
conversion/comparison .

before patch:
gbrpf32le -> yuva444p16le -> gbrpf32le
avg diff: 0.003852
min diff: 0.000000
max diff: 0.006638

after patch:
gbrpf32le -> yuva444p16le -> gbrpf32le
avg diff: 0.000125
min diff: 0.000000
max diff: 0.000501


>
>
>>
>> [...]
>>
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>
>> There will always be a question for which you do not know the correct
>> answer.
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
>
Michael Niedermayer Sept. 28, 2020, 5:33 p.m. UTC | #7
On Sat, Sep 26, 2020 at 10:01:30PM -0700, Mark Reid wrote:
> On Mon, Sep 14, 2020 at 6:31 PM Mark Reid <mindmark@gmail.com> wrote:
> 
> >
> >
> > On Mon, Sep 14, 2020 at 2:44 PM Michael Niedermayer <michael@niedermayer.cc>
> > wrote:
> >
> >> On Sun, Sep 13, 2020 at 04:04:42PM -0700, Mark Reid wrote:
> >> > On Sun, Sep 13, 2020 at 8:55 AM Michael Niedermayer
> >> <michael@niedermayer.cc>
> >> > wrote:
> >> >
> >> > > On Sat, Sep 12, 2020 at 02:07:14AM -0700, mindmark@gmail.com wrote:
> >> > > > From: Mark Reid <mindmark@gmail.com>
> >> > > >
> >> > > > ---
> >> > > >  libswscale/input.c                  | 12 +++++-------
> >> > > >  tests/ref/fate/filter-pixfmts-scale |  8 ++++----
> >> > > >  2 files changed, 9 insertions(+), 11 deletions(-)
> >> > >
> >> > > Can you provide some tests that show that this is better ?
> >> > > Iam asking that because some of the numbers in some of the code
> >> > > (i dont remember which) where tuned to give more accurate overall
> >> results
> >> > >
> >> > > an example for tests would be converting from A->B->A then compare to
> >> the
> >> > > orig
> >> > >
> >> > > thx
> >> > >
> >> > >
> >> > Hopefully i can explain this clearly!
> >> >
> >> > It's easier to see the error if you run a black image through the old
> >> > conversion.
> >> > zero values don't get mapped to zero. (attached sample image)
> >> >
> >> > ffmpeg -i 4x4_zero.exr -pix_fmt rgb48le -f rawvideo 4x4_zero.rawvideo
> >> > The image should be rgb 0, 0, 0  everywhere but instead it's 353, 0, 407
> >> >
> >> >
> >> > I think this is a error in fixed point rounding, the issue is basically
> >> > boils down to
> >> >
> >> > 128 << 8 != 257 << 7
> >> > and
> >> > 16 << 8  != 33 << 7
> >> >
> >> > https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
> >> > the 8 bit rgb to yuv formula is
> >> >
> >> > Y = ry*r + gy*g + by*b  + 16
> >> > U = ru*r + gu*g + bu*b + 128
> >> > V = rv*r + gv*g + bv*b + 128
> >> >
> >> > I think the studio swing offsets at the end are calculated wrong in the
> >> old
> >> > code.
> >> > (257 << (RGB2YUV_SHIFT + bpc - 9)))
> >> > 257 is correct for 8 bit rounding but not for 16-bit.
> >> >
> >> > the 257 i believe is from (128 << 1) + 1
> >> > the +1 is for rounding
> >> >
> >> > for rounding 16-bit (128 << 9) + 1 = 0x10001
> >> >
> >> > therefore I think the correct rounding any bit depth with the old
> >> formula
> >> > would be (untested)
> >> > (((128 << (bpc - 7)) + 1) << (RGB2YUV_SHIFT-1) )
> >> >
> >> > I just simplified it to
> >> > (0x10001 << (RGB2YUV_SHIFT - 1))
> >> >
> >> > The rgb48ToUV and rgb48ToY funcs in input.c use the formula I'm using.
> >>
> >> You quite possibly are correct, can you test that this actually works
> >> out. The test sample only covers 1 color (black)
> >> a testsample covering a wide range of the color cube would be more
> >> convincing that this change is needed and sufficient to fix this.
> >>
> >> thx
> >>
> >
> > I wrote a small python script to compare the raw gbrpf32le images if that
> > works? I attached it and also a more colorful test pattern.
> >
> > it runs these two commands and compares the 2 raw float images
> > ffmpeg -y -i test_pattern.exr -f rawvideo original.gbrpf32le
> > ffmpeg -y -i test_pattern.exr -vf
> > format=pix_fmts=rgb48le,format=pix_fmts=gbrpf32le -f rawvideo
> > converted.gbrpf32le
> >
> > python gbrpf32le_diff.py test_pattern.exr
> >
> > without patch:
> > avg error: 237.445495855
> > min error: 0.0
> > max error: 468.399102688
> >
> > with patch:
> > avg error: 15.9312244829
> > min error: 0.0
> > max error: 69.467689991
> >
> >
> > These are floating points scaled to 16-bit values.
> > Even with my patch, I'm kinda disturbed how much error there is.
> >
> 
> ping
> I re-wrote the python script as a c swscale test, if that helps
> replicate my results. attached is patch for that.
> it generates an image of random float values and does the
> conversion/comparison .
> 
> before patch:
> gbrpf32le -> yuva444p16le -> gbrpf32le
> avg diff: 0.003852
> min diff: 0.000000
> max diff: 0.006638
> 
> after patch:
> gbrpf32le -> yuva444p16le -> gbrpf32le
> avg diff: 0.000125
> min diff: 0.000000
> max diff: 0.000501
> 
> 
> >
> >
> >>
> >> [...]
> >>
> >> --
> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >>
> >> There will always be a question for which you do not know the correct
> >> answer.
> >> _______________________________________________
> >> ffmpeg-devel mailing list
> >> ffmpeg-devel@ffmpeg.org
> >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >>
> >> To unsubscribe, visit link above, or email
> >> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
> >
> >

>  Makefile             |    1 
>  tests/.gitignore     |    1 
>  tests/floatimg_cmp.c |  267 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 269 insertions(+)
> a04783297b0a5d9be6186a150606724457e7b0c5  0001-libswscale-tests-add-floatimg_cmp-test.patch
> From 9c4bc86201037aec454a98b60301d9250fecc7ca Mon Sep 17 00:00:00 2001
> From: Mark Reid <mindmark@gmail.com>
> Date: Sun, 20 Sep 2020 20:45:08 -0700
> Subject: [PATCH] libswscale/tests: add floatimg_cmp test
> 
> ---
>  libswscale/Makefile             |   1 +
>  libswscale/tests/.gitignore     |   1 +
>  libswscale/tests/floatimg_cmp.c | 267 ++++++++++++++++++++++++++++++++
>  3 files changed, 269 insertions(+)
>  create mode 100644 libswscale/tests/floatimg_cmp.c
> 
> diff --git a/libswscale/Makefile b/libswscale/Makefile
> index 5e03e6fa0a..4b8f9de425 100644
> --- a/libswscale/Makefile
> +++ b/libswscale/Makefile
> @@ -25,5 +25,6 @@ OBJS-$(CONFIG_SHARED)        += log2_tab.o
>  SLIBOBJS-$(HAVE_GNU_WINDRES) += swscaleres.o
>  
>  TESTPROGS = colorspace                                                  \
> +            floatimg_cmp                                                \
>              pixdesc_query                                               \
>              swscale                                                     \
> diff --git a/libswscale/tests/.gitignore b/libswscale/tests/.gitignore
> index 1a26f038c4..c56abf0ee7 100644
> --- a/libswscale/tests/.gitignore
> +++ b/libswscale/tests/.gitignore
> @@ -1,3 +1,4 @@
>  /colorspace
> +/floatimg_cmp
>  /pixdesc_query
>  /swscale
> diff --git a/libswscale/tests/floatimg_cmp.c b/libswscale/tests/floatimg_cmp.c
> new file mode 100644
> index 0000000000..affb6fa492
> --- /dev/null
> +++ b/libswscale/tests/floatimg_cmp.c

This seems to produce some compiler warnings

also turning this into a fate test would be a good idea i think


CC	libswscale/tests/floatimg_cmp.o
libswscale/tests/floatimg_cmp.c: In function ‘main’:
libswscale/tests/floatimg_cmp.c:118:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(inFormat);
     ^~~~~
libswscale/tests/floatimg_cmp.c:162:13: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
             uint8_t *ptr = rgbIn[p];
             ^~~~~~~
libswscale/tests/floatimg_cmp.c:197:26: warning: passing argument 2 of ‘sws_scale’ from incompatible pointer type [-Wincompatible-pointer-types]
     res = sws_scale(sws, rgbIn, rgbStride, 0, h, (uint8_t * const *) dst, dstStride);
                          ^~~~~
In file included from libswscale/tests/floatimg_cmp.c:35:0:
./libswscale/swscale.h:217:5: note: expected ‘const uint8_t * const* {aka const unsigned char * const*}’ but argument is of type ‘uint8_t ** {aka unsigned char **}’
 int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[],
     ^~~~~~~~~
libswscale/tests/floatimg_cmp.c:212:26: warning: passing argument 2 of ‘sws_scale’ from incompatible pointer type [-Wincompatible-pointer-types]
     res = sws_scale(sws, dst, dstStride, 0, h, (uint8_t * const *) rgbOut, rgbStride);
                          ^~~
In file included from libswscale/tests/floatimg_cmp.c:35:0:
./libswscale/swscale.h:217:5: note: expected ‘const uint8_t * const* {aka const unsigned char * const*}’ but argument is of type ‘uint8_t ** {aka unsigned char **}’
 int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[],
     ^~~~~~~~~
libswscale/tests/floatimg_cmp.c:219:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     double sum = 0;
     ^~~~~~
libswscale/tests/floatimg_cmp.c:240:17: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
                 float diff = fabsf(v0.f - v1.f);
                 ^~~~~
LD	libswscale/tests/floatimg_cmp

thx

[...]
Michael Niedermayer Sept. 28, 2020, 5:37 p.m. UTC | #8
On Sat, Sep 26, 2020 at 10:01:30PM -0700, Mark Reid wrote:
> On Mon, Sep 14, 2020 at 6:31 PM Mark Reid <mindmark@gmail.com> wrote:
> 
> >
> >
> > On Mon, Sep 14, 2020 at 2:44 PM Michael Niedermayer <michael@niedermayer.cc>
> > wrote:
> >
> >> On Sun, Sep 13, 2020 at 04:04:42PM -0700, Mark Reid wrote:
> >> > On Sun, Sep 13, 2020 at 8:55 AM Michael Niedermayer
> >> <michael@niedermayer.cc>
> >> > wrote:
> >> >
> >> > > On Sat, Sep 12, 2020 at 02:07:14AM -0700, mindmark@gmail.com wrote:
> >> > > > From: Mark Reid <mindmark@gmail.com>
> >> > > >
> >> > > > ---
> >> > > >  libswscale/input.c                  | 12 +++++-------
> >> > > >  tests/ref/fate/filter-pixfmts-scale |  8 ++++----
> >> > > >  2 files changed, 9 insertions(+), 11 deletions(-)
> >> > >
> >> > > Can you provide some tests that show that this is better ?
> >> > > Iam asking that because some of the numbers in some of the code
> >> > > (i dont remember which) where tuned to give more accurate overall
> >> results
> >> > >
> >> > > an example for tests would be converting from A->B->A then compare to
> >> the
> >> > > orig
> >> > >
> >> > > thx
> >> > >
> >> > >
> >> > Hopefully i can explain this clearly!
> >> >
> >> > It's easier to see the error if you run a black image through the old
> >> > conversion.
> >> > zero values don't get mapped to zero. (attached sample image)
> >> >
> >> > ffmpeg -i 4x4_zero.exr -pix_fmt rgb48le -f rawvideo 4x4_zero.rawvideo
> >> > The image should be rgb 0, 0, 0  everywhere but instead it's 353, 0, 407
> >> >
> >> >
> >> > I think this is a error in fixed point rounding, the issue is basically
> >> > boils down to
> >> >
> >> > 128 << 8 != 257 << 7
> >> > and
> >> > 16 << 8  != 33 << 7
> >> >
> >> > https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
> >> > the 8 bit rgb to yuv formula is
> >> >
> >> > Y = ry*r + gy*g + by*b  + 16
> >> > U = ru*r + gu*g + bu*b + 128
> >> > V = rv*r + gv*g + bv*b + 128
> >> >
> >> > I think the studio swing offsets at the end are calculated wrong in the
> >> old
> >> > code.
> >> > (257 << (RGB2YUV_SHIFT + bpc - 9)))
> >> > 257 is correct for 8 bit rounding but not for 16-bit.
> >> >
> >> > the 257 i believe is from (128 << 1) + 1
> >> > the +1 is for rounding
> >> >
> >> > for rounding 16-bit (128 << 9) + 1 = 0x10001
> >> >
> >> > therefore I think the correct rounding any bit depth with the old
> >> formula
> >> > would be (untested)
> >> > (((128 << (bpc - 7)) + 1) << (RGB2YUV_SHIFT-1) )
> >> >
> >> > I just simplified it to
> >> > (0x10001 << (RGB2YUV_SHIFT - 1))
> >> >
> >> > The rgb48ToUV and rgb48ToY funcs in input.c use the formula I'm using.
> >>
> >> You quite possibly are correct, can you test that this actually works
> >> out. The test sample only covers 1 color (black)
> >> a testsample covering a wide range of the color cube would be more
> >> convincing that this change is needed and sufficient to fix this.
> >>
> >> thx
> >>
> >
> > I wrote a small python script to compare the raw gbrpf32le images if that
> > works? I attached it and also a more colorful test pattern.
> >
> > it runs these two commands and compares the 2 raw float images
> > ffmpeg -y -i test_pattern.exr -f rawvideo original.gbrpf32le
> > ffmpeg -y -i test_pattern.exr -vf
> > format=pix_fmts=rgb48le,format=pix_fmts=gbrpf32le -f rawvideo
> > converted.gbrpf32le
> >
> > python gbrpf32le_diff.py test_pattern.exr
> >
> > without patch:
> > avg error: 237.445495855
> > min error: 0.0
> > max error: 468.399102688
> >
> > with patch:
> > avg error: 15.9312244829
> > min error: 0.0
> > max error: 69.467689991
> >
> >
> > These are floating points scaled to 16-bit values.
> > Even with my patch, I'm kinda disturbed how much error there is.
> >
> 
> ping
> I re-wrote the python script as a c swscale test, if that helps
> replicate my results. attached is patch for that.
> it generates an image of random float values and does the
> conversion/comparison .
> 
> before patch:
> gbrpf32le -> yuva444p16le -> gbrpf32le
> avg diff: 0.003852
> min diff: 0.000000
> max diff: 0.006638
> 
> after patch:
> gbrpf32le -> yuva444p16le -> gbrpf32le
> avg diff: 0.000125
> min diff: 0.000000
> max diff: 0.000501

is it better for all middle formats ?
Iam asking as it seems this should be rather easy to test with your code

But from what i  see above, obviously this is an improvment and should be 
applied

thx

[...]
Mark Reid Sept. 29, 2020, 1:50 a.m. UTC | #9
On Mon, Sep 28, 2020 at 10:38 AM Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Sat, Sep 26, 2020 at 10:01:30PM -0700, Mark Reid wrote:
> > On Mon, Sep 14, 2020 at 6:31 PM Mark Reid <mindmark@gmail.com> wrote:
> >
> > >
> > >
> > > On Mon, Sep 14, 2020 at 2:44 PM Michael Niedermayer
> <michael@niedermayer.cc>
> > > wrote:
> > >
> > >> On Sun, Sep 13, 2020 at 04:04:42PM -0700, Mark Reid wrote:
> > >> > On Sun, Sep 13, 2020 at 8:55 AM Michael Niedermayer
> > >> <michael@niedermayer.cc>
> > >> > wrote:
> > >> >
> > >> > > On Sat, Sep 12, 2020 at 02:07:14AM -0700, mindmark@gmail.com
> wrote:
> > >> > > > From: Mark Reid <mindmark@gmail.com>
> > >> > > >
> > >> > > > ---
> > >> > > >  libswscale/input.c                  | 12 +++++-------
> > >> > > >  tests/ref/fate/filter-pixfmts-scale |  8 ++++----
> > >> > > >  2 files changed, 9 insertions(+), 11 deletions(-)
> > >> > >
> > >> > > Can you provide some tests that show that this is better ?
> > >> > > Iam asking that because some of the numbers in some of the code
> > >> > > (i dont remember which) where tuned to give more accurate overall
> > >> results
> > >> > >
> > >> > > an example for tests would be converting from A->B->A then
> compare to
> > >> the
> > >> > > orig
> > >> > >
> > >> > > thx
> > >> > >
> > >> > >
> > >> > Hopefully i can explain this clearly!
> > >> >
> > >> > It's easier to see the error if you run a black image through the
> old
> > >> > conversion.
> > >> > zero values don't get mapped to zero. (attached sample image)
> > >> >
> > >> > ffmpeg -i 4x4_zero.exr -pix_fmt rgb48le -f rawvideo
> 4x4_zero.rawvideo
> > >> > The image should be rgb 0, 0, 0  everywhere but instead it's 353,
> 0, 407
> > >> >
> > >> >
> > >> > I think this is a error in fixed point rounding, the issue is
> basically
> > >> > boils down to
> > >> >
> > >> > 128 << 8 != 257 << 7
> > >> > and
> > >> > 16 << 8  != 33 << 7
> > >> >
> > >> > https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
> > >> > the 8 bit rgb to yuv formula is
> > >> >
> > >> > Y = ry*r + gy*g + by*b  + 16
> > >> > U = ru*r + gu*g + bu*b + 128
> > >> > V = rv*r + gv*g + bv*b + 128
> > >> >
> > >> > I think the studio swing offsets at the end are calculated wrong in
> the
> > >> old
> > >> > code.
> > >> > (257 << (RGB2YUV_SHIFT + bpc - 9)))
> > >> > 257 is correct for 8 bit rounding but not for 16-bit.
> > >> >
> > >> > the 257 i believe is from (128 << 1) + 1
> > >> > the +1 is for rounding
> > >> >
> > >> > for rounding 16-bit (128 << 9) + 1 = 0x10001
> > >> >
> > >> > therefore I think the correct rounding any bit depth with the old
> > >> formula
> > >> > would be (untested)
> > >> > (((128 << (bpc - 7)) + 1) << (RGB2YUV_SHIFT-1) )
> > >> >
> > >> > I just simplified it to
> > >> > (0x10001 << (RGB2YUV_SHIFT - 1))
> > >> >
> > >> > The rgb48ToUV and rgb48ToY funcs in input.c use the formula I'm
> using.
> > >>
> > >> You quite possibly are correct, can you test that this actually works
> > >> out. The test sample only covers 1 color (black)
> > >> a testsample covering a wide range of the color cube would be more
> > >> convincing that this change is needed and sufficient to fix this.
> > >>
> > >> thx
> > >>
> > >
> > > I wrote a small python script to compare the raw gbrpf32le images if
> that
> > > works? I attached it and also a more colorful test pattern.
> > >
> > > it runs these two commands and compares the 2 raw float images
> > > ffmpeg -y -i test_pattern.exr -f rawvideo original.gbrpf32le
> > > ffmpeg -y -i test_pattern.exr -vf
> > > format=pix_fmts=rgb48le,format=pix_fmts=gbrpf32le -f rawvideo
> > > converted.gbrpf32le
> > >
> > > python gbrpf32le_diff.py test_pattern.exr
> > >
> > > without patch:
> > > avg error: 237.445495855
> > > min error: 0.0
> > > max error: 468.399102688
> > >
> > > with patch:
> > > avg error: 15.9312244829
> > > min error: 0.0
> > > max error: 69.467689991
> > >
> > >
> > > These are floating points scaled to 16-bit values.
> > > Even with my patch, I'm kinda disturbed how much error there is.
> > >
> >
> > ping
> > I re-wrote the python script as a c swscale test, if that helps
> > replicate my results. attached is patch for that.
> > it generates an image of random float values and does the
> > conversion/comparison .
> >
> > before patch:
> > gbrpf32le -> yuva444p16le -> gbrpf32le
> > avg diff: 0.003852
> > min diff: 0.000000
> > max diff: 0.006638
> >
> > after patch:
> > gbrpf32le -> yuva444p16le -> gbrpf32le
> > avg diff: 0.000125
> > min diff: 0.000000
> > max diff: 0.000501
>
> is it better for all middle formats ?
> Iam asking as it seems this should be rather easy to test with your code
>

good idea, yes it looks better for other intermediate formats too, see my
results below.
I'll submit a new version of patch that does this and with the warnings
fixed.

gbrpf32le -> yuv444p16le -> gbrpf32le
avg diff: 0.003852      avg diff: 0.000125
min diff: 0.000000      min diff: 0.000000
max diff: 0.006638     max diff: 0.000501

gbrpf32le -> yuv444p -> gbrpf32le
avg diff: 0.004316      avg diff: 0.001804
min diff: 0.000000      min diff: 0.000000
max diff: 0.012704     max diff: 0.006399

gbrpf32le -> yuv444p9le -> gbrpf32le
avg diff: 0.004053      avg diff: 0.000906
min diff: 0.000001      min diff: 0.000000
max diff: 0.009402     max diff: 0.003313

gbrpf32le -> yuv444p10le -> gbrpf32le
avg diff: 0.003960      avg diff: 0.000467
min diff: 0.000000      min diff: 0.000000
max diff: 0.008123     max diff: 0.001912

gbrpf32le -> yuv444p12le -> gbrpf32le
avg diff: 0.003878      avg diff: 0.000166
min diff: 0.000000      min diff: 0.000000
max diff: 0.007011     max diff: 0.000802

gbrpf32le -> yuv444p14le -> gbrpf32le
avg diff: 0.003868      avg diff: 0.000127
min diff: 0.000000      min diff: 0.000000
max diff: 0.006729     max diff: 0.000524

gbrpf32le -> rgb24 -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> bgr24 -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> rgba -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> bgra -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> argb -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> abgr -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> 0rgb -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> 0bgr -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> rgb0 -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> bgr0 -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> rgb48le -> gbrpf32le
avg diff: 0.003851      avg diff: 0.000249
min diff: 0.000000      min diff: 0.000000
max diff: 0.007076     max diff: 0.000990

gbrpf32le -> bgr48le -> gbrpf32le
avg diff: 0.003851      avg diff: 0.000249
min diff: 0.000000      min diff: 0.000000
max diff: 0.007076     max diff: 0.000990

gbrpf32le -> rgba64le -> gbrpf32le
avg diff: 0.003851      avg diff: 0.000249
min diff: 0.000000      min diff: 0.000000
max diff: 0.007076     max diff: 0.000990

gbrpf32le -> bgra64le -> gbrpf32le
avg diff: 0.003851      avg diff: 0.000249
min diff: 0.000000      min diff: 0.000000
max diff: 0.007076     max diff: 0.000990

gbrpf32le -> gbrp -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> gbrap -> gbrpf32le
avg diff: 0.004122      avg diff: 0.001011
min diff: 0.000000      min diff: 0.000000
max diff: 0.008975     max diff: 0.004229

gbrpf32le -> gbrp9le -> gbrpf32le
avg diff: 0.007737      avg diff: 0.003917
min diff: 0.000000      min diff: 0.000000
max diff: 0.014009     max diff: 0.007870

gbrpf32le -> gbrp10le -> gbrpf32le
avg diff: 0.007662      avg diff: 0.003841
min diff: 0.000000      min diff: 0.000000
max diff: 0.013605     max diff: 0.007456

gbrpf32le -> gbrap10le -> gbrpf32le
avg diff: 0.007662      avg diff: 0.003841
min diff: 0.000000      min diff: 0.000000
max diff: 0.013605     max diff: 0.007456

gbrpf32le -> gbrp12le -> gbrpf32le
avg diff: 0.007622      avg diff: 0.003796
min diff: 0.000000      min diff: 0.000000
max diff: 0.013335     max diff: 0.007140

gbrpf32le -> gbrap12le -> gbrpf32le
avg diff: 0.007622      avg diff: 0.003796
min diff: 0.000000      min diff: 0.000000
max diff: 0.013335     max diff: 0.007140

gbrpf32le -> gbrp14le -> gbrpf32le
avg diff: 0.007620      avg diff: 0.003792
min diff: 0.000000      min diff: 0.000000
max diff: 0.013232    max diff: 0.007034

gbrpf32le -> gbrp16le -> gbrpf32le
avg diff: 0.007680      avg diff: 0.003853
min diff: 0.000000      min diff: 0.000000
max diff: 0.013275     max diff: 0.007098

gbrpf32le -> gbrap16le -> gbrpf32le
avg diff: 0.007680      avg diff: 0.003853
min diff: 0.000000      min diff: 0.000000
max diff: 0.013275     max diff: 0.007098


>
> But from what i  see above, obviously this is an improvment and should be
> applied
>
> thx
>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Awnsering whenever a program halts or runs forever is
> On a turing machine, in general impossible (turings halting problem).
> On any real computer, always possible as a real computer has a finite
> number
> of states N, and will either halt in less than N cycles or never halt.
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
diff mbox series

Patch

diff --git a/libswscale/input.c b/libswscale/input.c
index 064ed5902f..67a85b0418 100644
--- a/libswscale/input.c
+++ b/libswscale/input.c
@@ -984,15 +984,14 @@  static av_always_inline void planar_rgbf32_to_uv(uint8_t *_dstU, uint8_t *_dstV,
     uint16_t *dstV       = (uint16_t *)_dstV;
     int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
     int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
-    int bpc = 16;
-    int shift = 14;
+
     for (i = 0; i < width; i++) {
         int g = av_clip_uint16(lrintf(65535.0f * rdpx(src[0] + i)));
         int b = av_clip_uint16(lrintf(65535.0f * rdpx(src[1] + i)));
         int r = av_clip_uint16(lrintf(65535.0f * rdpx(src[2] + i)));
 
-        dstU[i] = (ru*r + gu*g + bu*b + (257 << (RGB2YUV_SHIFT + bpc - 9))) >> (RGB2YUV_SHIFT + shift - 14);
-        dstV[i] = (rv*r + gv*g + bv*b + (257 << (RGB2YUV_SHIFT + bpc - 9))) >> (RGB2YUV_SHIFT + shift - 14);
+        dstU[i] = (ru*r + gu*g + bu*b + (0x10001 << (RGB2YUV_SHIFT - 1))) >> RGB2YUV_SHIFT;
+        dstV[i] = (rv*r + gv*g + bv*b + (0x10001 << (RGB2YUV_SHIFT - 1))) >> RGB2YUV_SHIFT;
     }
 }
 
@@ -1003,14 +1002,13 @@  static av_always_inline void planar_rgbf32_to_y(uint8_t *_dst, const uint8_t *_s
     uint16_t *dst    = (uint16_t *)_dst;
 
     int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
-    int bpc = 16;
-    int shift = 14;
+
     for (i = 0; i < width; i++) {
         int g = av_clip_uint16(lrintf(65535.0f * rdpx(src[0] + i)));
         int b = av_clip_uint16(lrintf(65535.0f * rdpx(src[1] + i)));
         int r = av_clip_uint16(lrintf(65535.0f * rdpx(src[2] + i)));
 
-        dst[i] = ((ry*r + gy*g + by*b + (33 << (RGB2YUV_SHIFT + bpc - 9))) >> (RGB2YUV_SHIFT + shift - 14));
+        dst[i] = (ry*r + gy*g + by*b + (0x2001 << (RGB2YUV_SHIFT - 1))) >> RGB2YUV_SHIFT;
     }
 }
 
diff --git a/tests/ref/fate/filter-pixfmts-scale b/tests/ref/fate/filter-pixfmts-scale
index d7020ad2c3..30e7cd5b06 100644
--- a/tests/ref/fate/filter-pixfmts-scale
+++ b/tests/ref/fate/filter-pixfmts-scale
@@ -25,8 +25,8 @@  gbrap12be           1d9b57766ba9c2192403f43967cb9af0
 gbrap12le           bb1ba1c157717db3dd612a76d38a018e
 gbrap16be           c72b935a6e57a8e1c37bff08c2db55b1
 gbrap16le           13eb0e62b1ac9c1c86c81521eaefab5f
-gbrapf32be          42e53d9edccbd9e09c4cd78780ba92f3
-gbrapf32le          eebf3973ef94c841f0a1ceb1ed61621d
+gbrapf32be          366b804d5697276e8c481c4bdf05a00b
+gbrapf32le          558a268e6d6b907449d1056afab78f29
 gbrp                dc3387f925f972c61aae7eb23cdc19f0
 gbrp10be            0277d4c3a8498d75e2783fb81379e481
 gbrp10le            f3d70f8ab845c3c9b8f7452e4a6e285a
@@ -38,8 +38,8 @@  gbrp16be            5fc826cfabebfc1442cb793c4b6303e2
 gbrp16le            1b3e0b63d47a3e1b6b20931316883bf2
 gbrp9be             d9c88968001e1452ff31fbc8d16b18a0
 gbrp9le             2ccfed0816bf6bd4bb3a5b7591d9603a
-gbrpf32be           4614d32e4417f80e0adcc1bdcf6cde42
-gbrpf32le           1366ee77e5559672260bbe51040e28b2
+gbrpf32be           f3d0cefdf11c861001880772d817aac8
+gbrpf32le           290468205c1c18a0667edfca45061aee
 gray                221201cc7cfc4964eacd8b3e426fd276
 gray10be            9452756d0b37f4f5c7cae7635e22d747
 gray10le            37fd2e1ec6b66410212d39a342e864df