Message ID | ecca03d49a94432344f07118e6bef02cb06bc736.1546772896.git.pross@xvid.org |
---|---|
State | Accepted |
Headers | show |
2019-01-06 12:12 GMT+01:00, Peter Ross <pross@xvid.org>: > for the '127-bit shift left' algorithm to work as intended, little-endian > reads and writes must be used. > > libavcodec/dstdec.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/dstdec.c b/libavcodec/dstdec.c > index 511861f4d2..e9653edc9f 100644 > --- a/libavcodec/dstdec.c > +++ b/libavcodec/dstdec.c > @@ -343,8 +343,15 @@ static int decode_frame(AVCodecContext *avctx, void > *data, > v = ((predict >> 15) ^ residual) & 1; > dsd[((i >> 3) * channels + ch) << 2] |= v << (7 - (i & 0x7 )); > > - AV_WN64A(status + 8, (AV_RN64A(status + 8) << 1) | > ((AV_RN64A(status) >> 63) & 1)); > - AV_WN64A(status, (AV_RN64A(status) << 1) | v); > +#if HAVE_BIGENDIAN > +#define RL64A AV_RL64 > +#define WL64A AV_WL64 > +#else > +#define RL64A AV_RN64A > +#define WL64A AV_WN64A > +#endif > + WL64A(status + 8, (RL64A(status + 8) << 1) | ((RL64A(status) >> > 63) & 1)); > + WL64A(status, (RL64A(status) << 1) | v); Why not using AV_WL64() and and AV_RL64()? Is there a measurable speed difference? Carl Eugen
On Sun, Jan 06, 2019 at 12:57:37PM +0100, Carl Eugen Hoyos wrote: > 2019-01-06 12:12 GMT+01:00, Peter Ross <pross@xvid.org>: > > for the '127-bit shift left' algorithm to work as intended, little-endian > > reads and writes must be used. > > > Why not using AV_WL64() and and AV_RL64()? good question. > Is there a measurable speed difference? x86: no difference, compiler output is identical. other cpus that do not support unaligned reads: big difference, due to the compiler inserting additional instructions to check the alignment of the data. mipsel: RN64A: bench: utime=105.902s stime=0.040s rtime=105.933s RN64: bench: utime=230.055s stime=0.004s rtime=230.082s why so much difference? the 127-bit shift left operation must happen for each 1-bit DSD sample. in a single channel of DSD audio, there are at least 2.8 millions 1-bit samples per second. -- Peter (A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B)
2019-01-06 23:55 GMT+01:00, Peter Ross <pross@xvid.org>: > On Sun, Jan 06, 2019 at 12:57:37PM +0100, Carl Eugen Hoyos wrote: >> 2019-01-06 12:12 GMT+01:00, Peter Ross <pross@xvid.org>: >> > for the '127-bit shift left' algorithm to work as intended, >> > little-endian >> > reads and writes must be used. >> > >> Why not using AV_WL64() and and AV_RL64()? > > good question. > >> Is there a measurable speed difference? > > x86: no difference, compiler output is identical. > > other cpus that do not support unaligned reads: big difference, due to the > compiler inserting additional instructions to check the alignment of the > data. > > mipsel: > RN64A: bench: utime=105.902s stime=0.040s rtime=105.933s > RN64: bench: utime=230.055s stime=0.004s rtime=230.082s > > why so much difference? the 127-bit shift left operation must happen for > each 1-bit DSD sample. in a single channel of DSD audio, there are at > least 2.8 millions 1-bit samples per second. Sounds to me as if we would need AV_R[BL]xxA which is aligned for native endian (only). But this may be unrelated to your patch, you could add a comment tough that this makes a difference. Thank you, Carl Eugen
diff --git a/libavcodec/dstdec.c b/libavcodec/dstdec.c index 511861f4d2..e9653edc9f 100644 --- a/libavcodec/dstdec.c +++ b/libavcodec/dstdec.c @@ -343,8 +343,15 @@ static int decode_frame(AVCodecContext *avctx, void *data, v = ((predict >> 15) ^ residual) & 1; dsd[((i >> 3) * channels + ch) << 2] |= v << (7 - (i & 0x7 )); - AV_WN64A(status + 8, (AV_RN64A(status + 8) << 1) | ((AV_RN64A(status) >> 63) & 1)); - AV_WN64A(status, (AV_RN64A(status) << 1) | v); +#if HAVE_BIGENDIAN +#define RL64A AV_RL64 +#define WL64A AV_WL64 +#else +#define RL64A AV_RN64A +#define WL64A AV_WN64A +#endif + WL64A(status + 8, (RL64A(status + 8) << 1) | ((RL64A(status) >> 63) & 1)); + WL64A(status, (RL64A(status) << 1) | v); } }