diff mbox

[FFmpeg-devel] avcodec/ppc/hevcdsp: Fix build failures with powerpc-linux-gnu-gcc-4.8 with --disable-optimizations

Message ID 20181204170258.GE3501@michaelspb
State Accepted
Headers show

Commit Message

Michael Niedermayer Dec. 4, 2018, 5:02 p.m. UTC
On Tue, Dec 04, 2018 at 04:33:03PM +0100, Carl Eugen Hoyos wrote:
> 2018-12-04 16:29 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>:
> > The affected functions could also be changed into macros, this is the
> > smaller change to fix it though. And avoids (probably) less readable macros
> 
> > The extra code should be optimized out when optimizations are done as all
> > values are known at build after inlining.
> 
> Shouldn't this be verified?

ive verified it with the patch below
only SIMD instructions are between the markers, so 
powerpc-linux-gnu-gcc-4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
seems to optimize all conditional code out.
All bets are off with a different compiler though. That could be
worse or even better after the patch

But i can do more tests if you want me to test something specific ?


> This is speed-critical code, no?

Yes, and the way this code is written depends on the compiler, if the compiler
makes a mistake the function can be alot slower.
Thats one of the reasons we use nasm/yasm on x86, that always produces
the same result.

thx





[...]

Comments

Carl Eugen Hoyos Dec. 4, 2018, 11:15 p.m. UTC | #1
2018-12-04 18:02 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>:
> On Tue, Dec 04, 2018 at 04:33:03PM +0100, Carl Eugen Hoyos wrote:
>> 2018-12-04 16:29 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>:
>> > The affected functions could also be changed into macros, this is the
>> > smaller change to fix it though. And avoids (probably) less readable
>> > macros
>>
>> > The extra code should be optimized out when optimizations are done as
>> > all
>> > values are known at build after inlining.
>>
>> Shouldn't this be verified?
>
> ive verified it with the patch below

Thank you!

Carl Eugen
Michael Niedermayer Dec. 5, 2018, 9:52 p.m. UTC | #2
On Wed, Dec 05, 2018 at 12:15:14AM +0100, Carl Eugen Hoyos wrote:
> 2018-12-04 18:02 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>:
> > On Tue, Dec 04, 2018 at 04:33:03PM +0100, Carl Eugen Hoyos wrote:
> >> 2018-12-04 16:29 GMT+01:00, Michael Niedermayer <michael@niedermayer.cc>:
> >> > The affected functions could also be changed into macros, this is the
> >> > smaller change to fix it though. And avoids (probably) less readable
> >> > macros
> >>
> >> > The extra code should be optimized out when optimizations are done as
> >> > all
> >> > values are known at build after inlining.
> >>
> >> Shouldn't this be verified?
> >
> > ive verified it with the patch below
> 
> Thank you!

will apply

thx

[...]
diff mbox

Patch

diff --git a/libavcodec/ppc/hevcdsp.c b/libavcodec/ppc/hevcdsp.c
index c1d562a409..47246ed42d 100644
--- a/libavcodec/ppc/hevcdsp.c
+++ b/libavcodec/ppc/hevcdsp.c
@@ -57,13 +57,14 @@  static av_always_inline void transform4x4(vec_s16 src_01, vec_s16 src_23,
     o0 = vec_msums(src_13, trans4[1], zero);
     e1 = vec_msums(src_02, trans4[2], zero);
     o1 = vec_msums(src_13, trans4[3], zero);
-
+__asm volatile ("MARK\n\t");
     switch(shift) {
     case  7: add = vec_sl(vec_splat_s32(1), vec_splat_u32( 7 - 1)); break;
     case 10: add = vec_sl(vec_splat_s32(1), vec_splat_u32(10 - 1)); break;
     case 12: add = vec_sl(vec_splat_s32(1), vec_splat_u32(12 - 1)); break;
     default: abort();
     }
+__asm volatile ("MARK-E\n\t");
 
     e0 = vec_add(e0, add);
     e1 = vec_add(e1, add);
@@ -79,13 +80,14 @@  static av_always_inline void scale(vec_s32 res[4], vec_s16 res_packed[2],
 {
     int i;
     vec_u32 v_shift;
-
+__asm volatile ("MARK\n\t");
     switch(shift) {
     case  7: v_shift = vec_splat_u32(7) ; break;
     case 10: v_shift = vec_splat_u32(10); break;
     case 12: v_shift = vec_splat_u32(12); break;
     default: abort();
     }
+__asm volatile ("MARK-E2\n\t");
 
     for (i = 0; i < 4; i++)
         res[i] = vec_sra(res[i], v_shift);