diff mbox

[FFmpeg-devel,v2] swscale: add unscaled conversion from yuv420p to p010

Message ID 20160903132522.29006-1-timo@rothenpieler.org
State Superseded
Headers show

Commit Message

Timo Rothenpieler Sept. 3, 2016, 1:25 p.m. UTC
---
 libswscale/swscale_unscaled.c        | 57 ++++++++++++++++++++++++++++++++++++
 tests/ref/fate/filter-pixdesc-p010le |  2 +-
 tests/ref/fate/filter-pixfmts-copy   |  2 +-
 tests/ref/fate/filter-pixfmts-crop   |  2 +-
 tests/ref/fate/filter-pixfmts-field  |  2 +-
 tests/ref/fate/filter-pixfmts-hflip  |  2 +-
 tests/ref/fate/filter-pixfmts-il     |  2 +-
 tests/ref/fate/filter-pixfmts-null   |  2 +-
 tests/ref/fate/filter-pixfmts-scale  |  2 +-
 tests/ref/fate/filter-pixfmts-vflip  |  2 +-
 10 files changed, 66 insertions(+), 9 deletions(-)

Comments

Timo Rothenpieler Sept. 3, 2016, 1:27 p.m. UTC | #1
> @@ -236,6 +237,57 @@ static int planarToP010Wrapper(SwsContext *c, const uint8_t *src8[],
>      return srcSliceH;
>  }
>  
> +#if AV_HAVE_BIGENDIAN || 1

Nevermind the || 1, left over from testing speed differences and forgot
to remove it.
Carl Eugen Hoyos Sept. 4, 2016, 10:43 a.m. UTC | #2
2016-09-03 15:25 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:

> +            output_pixel(tdstY++, (t | (t << 8)) & 0xFFC0);

Please remove the "& 0xFFC0" here and below.

Carl Eugen
Paul B Mahol Sept. 4, 2016, 11:10 a.m. UTC | #3
On 9/4/16, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2016-09-03 15:25 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
>
>> +            output_pixel(tdstY++, (t | (t << 8)) & 0xFFC0);
>
> Please remove the "& 0xFFC0" here and below.

Please explain your reasoning here.
Carl Eugen Hoyos Sept. 4, 2016, 11:18 a.m. UTC | #4
2016-09-04 13:10 GMT+02:00 Paul B Mahol <onemda@gmail.com>:
> On 9/4/16, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2016-09-03 15:25 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
>>
>>> +            output_pixel(tdstY++, (t | (t << 8)) & 0xFFC0);
>>
>> Please remove the "& 0xFFC0" here and below.
>
> Please explain your reasoning here.

The linked specification clearly explains that the content
of P016 and P010 is identical and that it is not necessary
to set the lsb's to 0.
I believe we agree that the patch is slightly ugly because
of the defines but developers seem to agree that it has
to be accepted for performance reasons (nobody
objected). Even if "& 0xFFC0" is very cheap, I don't think
it comes for free. (Or does it?)

Finally, with the change, the function can also be used
for P016, note that I tried to object to P010: It does not
serve any real purpose, if I remember correctly, the
explanation for the commit was that there is a bug in
FFmpeg's pix_fmt decision routine that needed to
be worked-around ("hacked").

Carl Eugen
Hendrik Leppkes Sept. 4, 2016, 1:42 p.m. UTC | #5
On Sun, Sep 4, 2016 at 1:18 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2016-09-04 13:10 GMT+02:00 Paul B Mahol <onemda@gmail.com>:
>> On 9/4/16, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 2016-09-03 15:25 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
>>>
>>>> +            output_pixel(tdstY++, (t | (t << 8)) & 0xFFC0);
>>>
>>> Please remove the "& 0xFFC0" here and below.
>>
>> Please explain your reasoning here.
>
> The linked specification clearly explains that the content
> of P016 and P010 is identical and that it is not necessary
> to set the lsb's to 0.

While the specification does say that P010's memory layout was
particularly chosen to be able to simply alias to P016 if needed,
having random garbage in there is not good, so the zeroing of those
bits should remain.

- Hendrik
Carl Eugen Hoyos Sept. 4, 2016, 1:52 p.m. UTC | #6
2016-09-04 15:42 GMT+02:00 Hendrik Leppkes <h.leppkes@gmail.com>:
> On Sun, Sep 4, 2016 at 1:18 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> 2016-09-04 13:10 GMT+02:00 Paul B Mahol <onemda@gmail.com>:
>>> On 9/4/16, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>>> 2016-09-03 15:25 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
>>>>
>>>>> +            output_pixel(tdstY++, (t | (t << 8)) & 0xFFC0);
>>>>
>>>> Please remove the "& 0xFFC0" here and below.
>>>
>>> Please explain your reasoning here.
>>
>> The linked specification clearly explains that the content
>> of P016 and P010 is identical and that it is not necessary
>> to set the lsb's to 0.
>
> While the specification does say that P010's memory layout
> was particularly chosen to be able to simply alias to P016 if
> needed, having random garbage in there is not good, so
> the zeroing of those bits should remain.

Since it is not garbage (and not random), your argumentation
makes no sense.

Is the operation really free as you all seem to insinuate?

Carl Eugen
Timo Rothenpieler Sept. 4, 2016, 2:02 p.m. UTC | #7
> Finally, with the change, the function can also be used
> for P016, note that I tried to object to P010: It does not
> serve any real purpose, if I remember correctly, the
> explanation for the commit was that there is a bug in
> FFmpeg's pix_fmt decision routine that needed to
> be worked-around ("hacked").

It's the input format to nvenc in 10bit mode.
The purpose of this patch is to make conversion from yuv420p (8 bit) to
p010 (10 bit) fast.
Ali KIZIL Sept. 4, 2016, 2:06 p.m. UTC | #8
On Sep 4, 2016 5:02 PM, "Timo Rothenpieler" <timo@rothenpieler.org> wrote:
>
> > Finally, with the change, the function can also be used
> > for P016, note that I tried to object to P010: It does not
> > serve any real purpose, if I remember correctly, the
> > explanation for the commit was that there is a bug in
> > FFmpeg's pix_fmt decision routine that needed to
> > be worked-around ("hacked").
>
> It's the input format to nvenc in 10bit mode.
> The purpose of this patch is to make conversion from yuv420p (8 bit) to
> p010 (10 bit) fast.
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Especially for 4K and UHD frames.
You may not notice conversion speed change in HD or SD resolutions.
Carl Eugen Hoyos Sept. 4, 2016, 2:06 p.m. UTC | #9
2016-09-04 16:02 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
> The purpose of this patch is to make conversion from
> yuv420p (8 bit) to p010 (10 bit) fast.

Do I understand you correctly that your patch is
faster without the change I suggested?

Carl Eugen
Ronald S. Bultje Sept. 4, 2016, 2:27 p.m. UTC | #10
Hi,

On Sun, Sep 4, 2016 at 10:06 AM, Carl Eugen Hoyos <ceffmpeg@gmail.com>
wrote:

> 2016-09-04 16:02 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
> > The purpose of this patch is to make conversion from
> > yuv420p (8 bit) to p010 (10 bit) fast.
>
> Do I understand you correctly that your patch is
> faster without the change I suggested?


We should also consider whether not zeroing out the LSBs affects other
pieces of code (or hardware) which support p010 as input.

If they all ignore the LSBs as the spec seems to require, I agree with
Carl. If they break or the output changes, it may be more complicated.

Ronald
Ali KIZIL Sept. 4, 2016, 2:36 p.m. UTC | #11
Hi,

On Sep 4, 2016 5:33 PM, "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
>
> Hi,
>
> On Sun, Sep 4, 2016 at 10:06 AM, Carl Eugen Hoyos <ceffmpeg@gmail.com>
> wrote:
>
> > 2016-09-04 16:02 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
> > > The purpose of this patch is to make conversion from
> > > yuv420p (8 bit) to p010 (10 bit) fast.
> >
> > Do I understand you correctly that your patch is
> > faster without the change I suggested?
>
>
> We should also consider whether not zeroing out the LSBs affects other
> pieces of code (or hardware) which support p010 as input.
>
> If they all ignore the LSBs as the spec seems to require, I agree with
> Carl. If they break or the output changes, it may be more complicated.
>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

I can test the patch in Monday with and without zeros and report if it
break anything for Nvenc Hevc encoding.
Timo Rothenpieler Sept. 4, 2016, 3:42 p.m. UTC | #12
On 9/4/2016 4:06 PM, Carl Eugen Hoyos wrote:
> 2016-09-04 16:02 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
>> The purpose of this patch is to make conversion from
>> yuv420p (8 bit) to p010 (10 bit) fast.
> 
> Do I understand you correctly that your patch is
> faster without the change I suggested?

With the &:
[bench @ 0x600045b80] t:0.011178 avg:0.011172 max:0.018297 min:0.010505

Without it:
[bench @ 0x600045b80] t:0.008455 avg:0.008517 max:0.015815 min:0.007941

So it is quite a bit faster.

Tested with nvenc hevc10 encoding, and the output is visually identical,
and the file size is also exactly the same.
So it seems to cleanly ignore the unused bits.

Also, given that at least microsoft argues with upcasting to 16 bit, the
approach without zeroing the lsb would be more accurate, as

t << 8 | t

is how one would convert 8 bit to 16 bit.


So I'd say going with the faster approach here should be fine.
If at some point someone runs into something that chokes on the bits
being non-zero, which I think is highly unlikely, it can be changed back.
Ronald S. Bultje Sept. 4, 2016, 5:34 p.m. UTC | #13
Hi,

On Sep 4, 2016 5:42 PM, "Timo Rothenpieler" <timo@rothenpieler.org> wrote:
>
> On 9/4/2016 4:06 PM, Carl Eugen Hoyos wrote:
> > 2016-09-04 16:02 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
> >> The purpose of this patch is to make conversion from
> >> yuv420p (8 bit) to p010 (10 bit) fast.
> >
> > Do I understand you correctly that your patch is
> > faster without the change I suggested?
>
> With the &:
> [bench @ 0x600045b80] t:0.011178 avg:0.011172 max:0.018297 min:0.010505
>
> Without it:
> [bench @ 0x600045b80] t:0.008455 avg:0.008517 max:0.015815 min:0.007941
>
> So it is quite a bit faster.
>
> Tested with nvenc hevc10 encoding, and the output is visually identical,
> and the file size is also exactly the same.
> So it seems to cleanly ignore the unused bits.
>
> Also, given that at least microsoft argues with upcasting to 16 bit, the
> approach without zeroing the lsb would be more accurate, as
>
> t << 8 | t
>
> is how one would convert 8 bit to 16 bit.
>
>
> So I'd say going with the faster approach here should be fine.
> If at some point someone runs into something that chokes on the bits
> being non-zero, which I think is highly unlikely, it can be changed back.

Sounds good, thanks for testing and confirming.

Ronald
Ali KIZIL Sept. 5, 2016, 8:53 a.m. UTC | #14
2016-09-04 20:34 GMT+03:00 Ronald S. Bultje <rsbultje@gmail.com>:

> Hi,
>
> On Sep 4, 2016 5:42 PM, "Timo Rothenpieler" <timo@rothenpieler.org> wrote:
> >
> > On 9/4/2016 4:06 PM, Carl Eugen Hoyos wrote:
> > > 2016-09-04 16:02 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:
> > >> The purpose of this patch is to make conversion from
> > >> yuv420p (8 bit) to p010 (10 bit) fast.
> > >
> > > Do I understand you correctly that your patch is
> > > faster without the change I suggested?
> >
> > With the &:
> > [bench @ 0x600045b80] t:0.011178 avg:0.011172 max:0.018297 min:0.010505
> >
> > Without it:
> > [bench @ 0x600045b80] t:0.008455 avg:0.008517 max:0.015815 min:0.007941
> >
> > So it is quite a bit faster.
> >
> > Tested with nvenc hevc10 encoding, and the output is visually identical,
> > and the file size is also exactly the same.
> > So it seems to cleanly ignore the unused bits.
> >
> > Also, given that at least microsoft argues with upcasting to 16 bit, the
> > approach without zeroing the lsb would be more accurate, as
> >
> > t << 8 | t
> >
> > is how one would convert 8 bit to 16 bit.
> >
> >
> > So I'd say going with the faster approach here should be fine.
> > If at some point someone runs into something that chokes on the bits
> > being non-zero, which I think is highly unlikely, it can be changed back.
>
> Sounds good, thanks for testing and confirming.
>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

Hello,

Is there any obstacle to release this patch ?
Carl Eugen Hoyos Sept. 5, 2016, 9:04 a.m. UTC | #15
2016-09-05 10:53 GMT+02:00 Ali KIZIL <alikizil@gmail.com>:

> Is there any obstacle to release this patch ?

Which patch?

Carl Eugen
Ali KIZIL Sept. 5, 2016, 9:06 a.m. UTC | #16
2016-09-05 12:04 GMT+03:00 Carl Eugen Hoyos <ceffmpeg@gmail.com>:

> 2016-09-05 10:53 GMT+02:00 Ali KIZIL <alikizil@gmail.com>:
>
> > Is there any obstacle to release this patch ?
>
> Which patch?
>
> Carl Eugen
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

Sorry Carl, maybe I missed. I mean the patch for "add unscaled conversion
from yuv420p to p010".
I see the one "add unscaled conversion from yuv420p10 to p010" in github,
but I can not see the one "add unscaled conversion from yuv420p to p010".
Lastly it was confirmed that padding zeros were not needed.

Kind Regards,
Carl Eugen Hoyos Sept. 5, 2016, 9:07 a.m. UTC | #17
2016-09-03 15:25 GMT+02:00 Timo Rothenpieler <timo@rothenpieler.org>:

> +static int planar8ToP010leWrapper

If you want, you can rename this function to planar8ToP01xleWrapper
or planar8ToP016leWrapper after removing the binary and.

Carl Eugen
Carl Eugen Hoyos Sept. 5, 2016, 9:09 a.m. UTC | #18
2016-09-05 11:06 GMT+02:00 Ali KIZIL <alikizil@gmail.com>:
> 2016-09-05 12:04 GMT+03:00 Carl Eugen Hoyos <ceffmpeg@gmail.com>:
>
>> 2016-09-05 10:53 GMT+02:00 Ali KIZIL <alikizil@gmail.com>:
>>
>> > Is there any obstacle to release this patch ?
>>
>> Which patch?
>
> Sorry Carl, maybe I missed. I mean the patch for "add unscaled
> conversion from yuv420p to p010".

Everybody who commented since yesterday afternoon agrees that the
patch should be changed (slightly). So yes, there still is an obstacle.

Please cut your quotes and please give every OP more than a few
hours to resend.

Carl Eugen
diff mbox

Patch

diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index ca7374a..cca2302 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -33,6 +33,7 @@ 
 #include "libavutil/bswap.h"
 #include "libavutil/pixdesc.h"
 #include "libavutil/avassert.h"
+#include "libavutil/avconfig.h"
 
 DECLARE_ALIGNED(8, static const uint8_t, dithers)[8][8][8]={
 {
@@ -236,6 +237,57 @@  static int planarToP010Wrapper(SwsContext *c, const uint8_t *src8[],
     return srcSliceH;
 }
 
+#if AV_HAVE_BIGENDIAN || 1
+#define output_pixel(p, v) do { \
+        uint16_t *pp = (p); \
+        AV_WL16(pp, (v)); \
+    } while(0)
+#else
+#define output_pixel(p, v) (*p) = (v)
+#endif
+
+static int planar8ToP010leWrapper(SwsContext *c, const uint8_t *src[],
+                                  int srcStride[], int srcSliceY,
+                                  int srcSliceH, uint8_t *dstParam8[],
+                                  int dstStride[])
+{
+    uint16_t *dstY = (uint16_t*)(dstParam8[0] + dstStride[0] * srcSliceY);
+    uint16_t *dstUV = (uint16_t*)(dstParam8[1] + dstStride[1] * srcSliceY / 2);
+    int x, y, t;
+
+    av_assert0(!(dstStride[0] % 2 || dstStride[1] % 2));
+
+    for (y = 0; y < srcSliceH; y++) {
+        uint16_t *tdstY = dstY;
+        const uint8_t *tsrc0 = src[0];
+        for (x = c->srcW; x > 0; x--) {
+            t = *tsrc0++;
+            output_pixel(tdstY++, (t | (t << 8)) & 0xFFC0);
+        }
+        src[0] += srcStride[0];
+        dstY += dstStride[0] / 2;
+
+        if (!(y & 1)) {
+            uint16_t *tdstUV = dstUV;
+            const uint8_t *tsrc1 = src[1];
+            const uint8_t *tsrc2 = src[2];
+            for (x = c->srcW / 2; x > 0; x--) {
+                t = *tsrc1++;
+                output_pixel(tdstUV++, (t | (t << 8)) & 0xFFC0);
+                t = *tsrc2++;
+                output_pixel(tdstUV++, (t | (t << 8)) & 0xFFC0);
+            }
+            src[1] += srcStride[1];
+            src[2] += srcStride[2];
+            dstUV += dstStride[1] / 2;
+        }
+    }
+
+    return srcSliceH;
+}
+
+#undef output_pixel
+
 static int planarToYuy2Wrapper(SwsContext *c, const uint8_t *src[],
                                int srcStride[], int srcSliceY, int srcSliceH,
                                uint8_t *dstParam[], int dstStride[])
@@ -1645,6 +1697,11 @@  void ff_get_unscaled_swscale(SwsContext *c)
         dstFormat == AV_PIX_FMT_P010) {
         c->swscale = planarToP010Wrapper;
     }
+    /* yuv420p_to_p010le */
+    if ((srcFormat == AV_PIX_FMT_YUV420P || srcFormat == AV_PIX_FMT_YUVA420P) &&
+        dstFormat == AV_PIX_FMT_P010LE) {
+        c->swscale = planar8ToP010leWrapper;
+    }
 
     if (srcFormat == AV_PIX_FMT_YUV410P && !(dstH & 3) &&
         (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P) &&
diff --git a/tests/ref/fate/filter-pixdesc-p010le b/tests/ref/fate/filter-pixdesc-p010le
index cac2635..2500604 100644
--- a/tests/ref/fate/filter-pixdesc-p010le
+++ b/tests/ref/fate/filter-pixdesc-p010le
@@ -1 +1 @@ 
-pixdesc-p010le      0268fd44f63022e21ada69704534fc85
+pixdesc-p010le      7b4a503997eb4e14cba80ee52db85e39
diff --git a/tests/ref/fate/filter-pixfmts-copy b/tests/ref/fate/filter-pixfmts-copy
index ce957f7..bcc4475 100644
--- a/tests/ref/fate/filter-pixfmts-copy
+++ b/tests/ref/fate/filter-pixfmts-copy
@@ -36,7 +36,7 @@  monow               54d16d2c01abfd72ecdb5e51e283937c
 nv12                8e24feb2c544dc26a20047a71e4c27aa
 nv21                335d85c9af6110f26ae9e187a82ed2cf
 p010be              7f9842d6015026136bad60d03c035cc3
-p010le              1929db89609c4b8c6d9c9030a9e7843d
+p010le              9ba7bc4611e36b2435eb2dff353b8af5
 pal8                ff5929f5b42075793b2c34cb441bede5
 rgb0                0de71e5a1f97f81fb51397a0435bfa72
 rgb24               f4438057d046e6d98ade4e45294b21be
diff --git a/tests/ref/fate/filter-pixfmts-crop b/tests/ref/fate/filter-pixfmts-crop
index e2c77a8..51c6df9 100644
--- a/tests/ref/fate/filter-pixfmts-crop
+++ b/tests/ref/fate/filter-pixfmts-crop
@@ -34,7 +34,7 @@  gray16le            9ff7c866bd98def4e6c91542c1c45f80
 nv12                92cda427f794374731ec0321ee00caac
 nv21                1bcfc197f4fb95de85ba58182d8d2f69
 p010be              8b2de2eb6b099bbf355bfc55a0694ddc
-p010le              a1e4f713e145dfc465bfe0cc77096a03
+p010le              fa78436272020be0d2569139808429b6
 pal8                1f2cdc8e718f95c875dbc1034a688bfb
 rgb0                736646b70dd9a0be22b8da8041e35035
 rgb24               c5fbbf816bb2000f4d2914e335698ef5
diff --git a/tests/ref/fate/filter-pixfmts-field b/tests/ref/fate/filter-pixfmts-field
index 20c76a1..8232a5c 100644
--- a/tests/ref/fate/filter-pixfmts-field
+++ b/tests/ref/fate/filter-pixfmts-field
@@ -36,7 +36,7 @@  monow               03d783611d265cae78293f88ea126ea1
 nv12                16f7a46708ef25ebd0b72e47920cc11e
 nv21                7294574037cc7f9373ef5695d8ebe809
 p010be              a0311a09bba7383553267d2b3b9c075e
-p010le              f1cc90d292046109a626db2da9f0f9b6
+p010le              634c62ef33b362795339a03907a33137
 pal8                0658c18dcd8d052d59dfbe23f5b368d9
 rgb0                ca3fa6e865b91b3511c7f2bf62830059
 rgb24               25ab271e26a5785be169578d99da5dd0
diff --git a/tests/ref/fate/filter-pixfmts-hflip b/tests/ref/fate/filter-pixfmts-hflip
index 8e902fb..9417eb8 100644
--- a/tests/ref/fate/filter-pixfmts-hflip
+++ b/tests/ref/fate/filter-pixfmts-hflip
@@ -34,7 +34,7 @@  gray16le            d91ce41e304419bcf32ac792f01bd64f
 nv12                801e58f1be5fd0b5bc4bf007c604b0b4
 nv21                9f10dfff8963dc327d3395af21f0554f
 p010be              744b13e44d39e1ff7588983fa03e0101
-p010le              aeb31f50c66f376b0530c7bb6287212b
+p010le              447768b443c5cd8ff591ccb53463f220
 pal8                5b7c77d99817b4f52339742a47de7797
 rgb0                0092452f37d73da20193265ace0b7d57
 rgb24               21571104e6091a689feabb7867e513dd
diff --git a/tests/ref/fate/filter-pixfmts-il b/tests/ref/fate/filter-pixfmts-il
index e568843..2ca37a3 100644
--- a/tests/ref/fate/filter-pixfmts-il
+++ b/tests/ref/fate/filter-pixfmts-il
@@ -36,7 +36,7 @@  monow               6e9cfb8d3a344c5f0c3e1d5e1297e580
 nv12                3c3ba9b1b4c4dfff09c26f71b51dd146
 nv21                ab586d8781246b5a32d8760a61db9797
 p010be              3df51286ef66b53e3e283dbbab582263
-p010le              38945445b360fa737e9e37257393e823
+p010le              7c101300e86f25e5528583ed811f8d25
 rgb0                cfaf68671e43248267d8cd50cae8c13f
 rgb24               88894f608cf33ba310f21996748d77a7
 rgb444be            99d36d814988fb388aacdef575dacfcf
diff --git a/tests/ref/fate/filter-pixfmts-null b/tests/ref/fate/filter-pixfmts-null
index ce957f7..bcc4475 100644
--- a/tests/ref/fate/filter-pixfmts-null
+++ b/tests/ref/fate/filter-pixfmts-null
@@ -36,7 +36,7 @@  monow               54d16d2c01abfd72ecdb5e51e283937c
 nv12                8e24feb2c544dc26a20047a71e4c27aa
 nv21                335d85c9af6110f26ae9e187a82ed2cf
 p010be              7f9842d6015026136bad60d03c035cc3
-p010le              1929db89609c4b8c6d9c9030a9e7843d
+p010le              9ba7bc4611e36b2435eb2dff353b8af5
 pal8                ff5929f5b42075793b2c34cb441bede5
 rgb0                0de71e5a1f97f81fb51397a0435bfa72
 rgb24               f4438057d046e6d98ade4e45294b21be
diff --git a/tests/ref/fate/filter-pixfmts-scale b/tests/ref/fate/filter-pixfmts-scale
index 62ea6fa..6320557 100644
--- a/tests/ref/fate/filter-pixfmts-scale
+++ b/tests/ref/fate/filter-pixfmts-scale
@@ -36,7 +36,7 @@  monow               35c68b86c226d6990b2dcb573a05ff6b
 nv12                b118d24a3653fe66e5d9e079033aef79
 nv21                c74bb1c10dbbdee8a1f682b194486c4d
 p010be              1d6726d94bf1385996a9a9840dd0e878
-p010le              5d436e6b35292a0e356d81f37f989b66
+p010le              4b316f2b9e18972299beb73511278fa8
 pal8                29e10892009b2cfe431815ec3052ed3b
 rgb0                fbd27e98154efb7535826afed41e9bb0
 rgb24               e022e741451e81f2ecce1c7240b93e87
diff --git a/tests/ref/fate/filter-pixfmts-vflip b/tests/ref/fate/filter-pixfmts-vflip
index 9651724..667a1b6 100644
--- a/tests/ref/fate/filter-pixfmts-vflip
+++ b/tests/ref/fate/filter-pixfmts-vflip
@@ -36,7 +36,7 @@  monow               90a947bfcd5f2261e83b577f48ec57b1
 nv12                261ebe585ae2aa4e70d39a10c1679294
 nv21                2909feacd27bebb080c8e0fa41795269
 p010be              06e9354b6e0e38ba41736352cedc0bd5
-p010le              cdf6a3c38d9d4e3f079fa369e1dda662
+p010le              9686439b0d21cae1949d6aebe98b5e88
 pal8                450b0155d0f2d5628bf95a442db5f817
 rgb0                56a7ea69541bcd27bef6a5615784722b
 rgb24               195e6dae1c3a488b9d3ceb7560d25d85