diff mbox

[FFmpeg-devel] dstdec: big-endian compatiblity

Message ID ecca03d49a94432344f07118e6bef02cb06bc736.1546772896.git.pross@xvid.org
State Accepted
Headers show

Commit Message

Peter Ross Jan. 6, 2019, 11:12 a.m. UTC
for the '127-bit shift left' algorithm to work as intended, little-endian
reads and writes must be used.

 libavcodec/dstdec.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Carl Eugen Hoyos Jan. 6, 2019, 11:57 a.m. UTC | #1
2019-01-06 12:12 GMT+01:00, Peter Ross <pross@xvid.org>:
> for the '127-bit shift left' algorithm to work as intended, little-endian
> reads and writes must be used.
>
>  libavcodec/dstdec.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/dstdec.c b/libavcodec/dstdec.c
> index 511861f4d2..e9653edc9f 100644
> --- a/libavcodec/dstdec.c
> +++ b/libavcodec/dstdec.c
> @@ -343,8 +343,15 @@ static int decode_frame(AVCodecContext *avctx, void
> *data,
>              v = ((predict >> 15) ^ residual) & 1;
>              dsd[((i >> 3) * channels + ch) << 2] |= v << (7 - (i & 0x7 ));
>
> -            AV_WN64A(status + 8, (AV_RN64A(status + 8) << 1) |
> ((AV_RN64A(status) >> 63) & 1));
> -            AV_WN64A(status, (AV_RN64A(status) << 1) | v);
> +#if HAVE_BIGENDIAN
> +#define RL64A AV_RL64
> +#define WL64A AV_WL64
> +#else
> +#define RL64A AV_RN64A
> +#define WL64A AV_WN64A
> +#endif
> +            WL64A(status + 8, (RL64A(status + 8) << 1) | ((RL64A(status) >>
> 63) & 1));
> +            WL64A(status, (RL64A(status) << 1) | v);

Why not using AV_WL64() and and AV_RL64()?
Is there a measurable speed difference?

Carl Eugen
Peter Ross Jan. 6, 2019, 10:55 p.m. UTC | #2
On Sun, Jan 06, 2019 at 12:57:37PM +0100, Carl Eugen Hoyos wrote:
> 2019-01-06 12:12 GMT+01:00, Peter Ross <pross@xvid.org>:
> > for the '127-bit shift left' algorithm to work as intended, little-endian
> > reads and writes must be used.
> >
> Why not using AV_WL64() and and AV_RL64()?

good question.

> Is there a measurable speed difference?

x86: no difference, compiler output is identical.

other cpus that do not support unaligned reads: big difference, due to the
compiler inserting additional instructions to check the alignment of the data.

mipsel:
RN64A: bench: utime=105.902s stime=0.040s rtime=105.933s
RN64:  bench: utime=230.055s stime=0.004s rtime=230.082s

why so much difference? the 127-bit shift left operation must happen for
each 1-bit DSD sample. in a single channel of DSD audio, there are at
least 2.8 millions 1-bit samples per second.

-- Peter
(A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B)
Carl Eugen Hoyos Jan. 6, 2019, 11:12 p.m. UTC | #3
2019-01-06 23:55 GMT+01:00, Peter Ross <pross@xvid.org>:
> On Sun, Jan 06, 2019 at 12:57:37PM +0100, Carl Eugen Hoyos wrote:
>> 2019-01-06 12:12 GMT+01:00, Peter Ross <pross@xvid.org>:
>> > for the '127-bit shift left' algorithm to work as intended,
>> > little-endian
>> > reads and writes must be used.
>> >
>> Why not using AV_WL64() and and AV_RL64()?
>
> good question.
>
>> Is there a measurable speed difference?
>
> x86: no difference, compiler output is identical.
>
> other cpus that do not support unaligned reads: big difference, due to the
> compiler inserting additional instructions to check the alignment of the
> data.
>
> mipsel:
> RN64A: bench: utime=105.902s stime=0.040s rtime=105.933s
> RN64:  bench: utime=230.055s stime=0.004s rtime=230.082s
>
> why so much difference? the 127-bit shift left operation must happen for
> each 1-bit DSD sample. in a single channel of DSD audio, there are at
> least 2.8 millions 1-bit samples per second.

Sounds to me as if we would need AV_R[BL]xxA which is aligned
for native endian (only).

But this may be unrelated to your patch, you could add a comment
tough that this makes a difference.

Thank you, Carl Eugen
diff mbox

Patch

diff --git a/libavcodec/dstdec.c b/libavcodec/dstdec.c
index 511861f4d2..e9653edc9f 100644
--- a/libavcodec/dstdec.c
+++ b/libavcodec/dstdec.c
@@ -343,8 +343,15 @@  static int decode_frame(AVCodecContext *avctx, void *data,
             v = ((predict >> 15) ^ residual) & 1;
             dsd[((i >> 3) * channels + ch) << 2] |= v << (7 - (i & 0x7 ));
 
-            AV_WN64A(status + 8, (AV_RN64A(status + 8) << 1) | ((AV_RN64A(status) >> 63) & 1));
-            AV_WN64A(status, (AV_RN64A(status) << 1) | v);
+#if HAVE_BIGENDIAN
+#define RL64A AV_RL64
+#define WL64A AV_WL64
+#else
+#define RL64A AV_RN64A
+#define WL64A AV_WN64A
+#endif
+            WL64A(status + 8, (RL64A(status + 8) << 1) | ((RL64A(status) >> 63) & 1));
+            WL64A(status, (RL64A(status) << 1) | v);
         }
     }