Message ID | 20181204170258.GE3501@michaelspb |
---|---|
State | Accepted |
Headers | show |
2018-12-04 18:02 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>: > On Tue, Dec 04, 2018 at 04:33:03PM +0100, Carl Eugen Hoyos wrote: >> 2018-12-04 16:29 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>: >> > The affected functions could also be changed into macros, this is the >> > smaller change to fix it though. And avoids (probably) less readable >> > macros >> >> > The extra code should be optimized out when optimizations are done as >> > all >> > values are known at build after inlining. >> >> Shouldn't this be verified? > > ive verified it with the patch below Thank you! Carl Eugen
On Wed, Dec 05, 2018 at 12:15:14AM +0100, Carl Eugen Hoyos wrote: > 2018-12-04 18:02 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>: > > On Tue, Dec 04, 2018 at 04:33:03PM +0100, Carl Eugen Hoyos wrote: > >> 2018-12-04 16:29 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>: > >> > The affected functions could also be changed into macros, this is the > >> > smaller change to fix it though. And avoids (probably) less readable > >> > macros > >> > >> > The extra code should be optimized out when optimizations are done as > >> > all > >> > values are known at build after inlining. > >> > >> Shouldn't this be verified? > > > > ive verified it with the patch below > > Thank you! will apply thx [...]
diff --git a/libavcodec/ppc/hevcdsp.c b/libavcodec/ppc/hevcdsp.c index c1d562a409..47246ed42d 100644 --- a/libavcodec/ppc/hevcdsp.c +++ b/libavcodec/ppc/hevcdsp.c @@ -57,13 +57,14 @@ static av_always_inline void transform4x4(vec_s16 src_01, vec_s16 src_23, o0 = vec_msums(src_13, trans4[1], zero); e1 = vec_msums(src_02, trans4[2], zero); o1 = vec_msums(src_13, trans4[3], zero); - +__asm volatile ("MARK\n\t"); switch(shift) { case 7: add = vec_sl(vec_splat_s32(1), vec_splat_u32( 7 - 1)); break; case 10: add = vec_sl(vec_splat_s32(1), vec_splat_u32(10 - 1)); break; case 12: add = vec_sl(vec_splat_s32(1), vec_splat_u32(12 - 1)); break; default: abort(); } +__asm volatile ("MARK-E\n\t"); e0 = vec_add(e0, add); e1 = vec_add(e1, add); @@ -79,13 +80,14 @@ static av_always_inline void scale(vec_s32 res[4], vec_s16 res_packed[2], { int i; vec_u32 v_shift; - +__asm volatile ("MARK\n\t"); switch(shift) { case 7: v_shift = vec_splat_u32(7) ; break; case 10: v_shift = vec_splat_u32(10); break; case 12: v_shift = vec_splat_u32(12); break; default: abort(); } +__asm volatile ("MARK-E2\n\t"); for (i = 0; i < 4; i++) res[i] = vec_sra(res[i], v_shift);