From patchwork Tue Dec 4 17:02:58 2018
X-Patchwork-Submitter: Michael Niedermayer
X-Patchwork-Id: 11276
Date: Tue, 4 Dec 2018 18:02:58 +0100
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20181204170258.GE3501@michaelspb>
References: <20181204152940.25829-1-michael@niedermayer.cc>
Subject: Re: [FFmpeg-devel] [PATCH] avcodec/ppc/hevcdsp: Fix build failures with powerpc-linux-gnu-gcc-4.8 with --disable-optimizations

On Tue, Dec 04, 2018 at 04:33:03PM +0100, Carl Eugen Hoyos wrote:
> 2018-12-04 16:29 GMT+01:00,
> Michael Niedermayer:
> > The affected functions could also be changed into macros, this is the
> > smaller change to fix it though. And avoids (probably) less readable macros
>
> > The extra code should be optimized out when optimizations are done as all
> > values are known at build after inlining.
>
> Shouldn't this be verified?

I've verified it with the patch below.
Only SIMD instructions are between the markers, so
powerpc-linux-gnu-gcc-4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
seems to optimize all conditional code out. All bets are off with a different
compiler though. That could be worse or even better after the patch.
But I can do more tests if you want me to test something specific?

> This is speed-critical code, no?

Yes, and the way this code is written depends on the compiler. If the
compiler makes a mistake, the function can be a lot slower.
That's one of the reasons we use nasm/yasm on x86: that always produces the
same result.

thx

[...]

diff --git a/libavcodec/ppc/hevcdsp.c b/libavcodec/ppc/hevcdsp.c
index c1d562a409..47246ed42d 100644
--- a/libavcodec/ppc/hevcdsp.c
+++ b/libavcodec/ppc/hevcdsp.c
@@ -57,13 +57,14 @@ static av_always_inline void transform4x4(vec_s16 src_01, vec_s16 src_23,
     o0 = vec_msums(src_13, trans4[1], zero);
     e1 = vec_msums(src_02, trans4[2], zero);
     o1 = vec_msums(src_13, trans4[3], zero);
-
+__asm volatile ("MARK\n\t");
     switch(shift) {
     case  7: add = vec_sl(vec_splat_s32(1), vec_splat_u32( 7 - 1)); break;
     case 10: add = vec_sl(vec_splat_s32(1), vec_splat_u32(10 - 1)); break;
     case 12: add = vec_sl(vec_splat_s32(1), vec_splat_u32(12 - 1)); break;
     default: abort();
     }
+__asm volatile ("MARK-E\n\t");
 
     e0 = vec_add(e0, add);
     e1 = vec_add(e1, add);
@@ -79,13 +80,14 @@ static av_always_inline void scale(vec_s32 res[4], vec_s16 res_packed[2],
 {
     int i;
     vec_u32 v_shift;
-
+__asm volatile ("MARK\n\t");
     switch(shift) {
     case  7: v_shift = vec_splat_u32(7) ; break;
     case 10: v_shift = vec_splat_u32(10); break;
     case 12: v_shift = vec_splat_u32(12); break;
     default: abort();
     }
+__asm volatile ("MARK-E2\n\t");
 
     for (i = 0; i < 4; i++)
         res[i] = vec_sra(res[i], v_shift);
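
For anyone wanting to repeat this kind of check without a PowerPC toolchain, here is a minimal scalar sketch of the same marker technique. It is not the FFmpeg code: `MARK`, `rounding_add` and the `round_shift_*` wrappers are hypothetical names, plain `int` arithmetic stands in for the AltiVec vectors, and the markers are emitted as `#` assembler comments (an assumption that holds for GNU as on x86 and PowerPC, possibly not elsewhere) so that the file still assembles. Markers are only emitted when `MARKERS` is defined.

```c
#include <stdlib.h>

/* Hypothetical marker macro. With -DMARKERS it emits an assembler comment
 * that can be searched for in the compiler's -S output; without it, it
 * expands to nothing so the code stays portable. */
#ifdef MARKERS
#define MARK(name) __asm__ volatile ("# " name "\n\t")
#else
#define MARK(name) ((void)0)
#endif

/* Scalar stand-in for the rounding constant selected in the patch:
 * returns 1 << (shift - 1) for the three supported shifts. */
static inline __attribute__((always_inline)) int rounding_add(int shift)
{
    int add;

    MARK("MARK");
    switch (shift) {
    case  7: add = 1 << ( 7 - 1); break;
    case 10: add = 1 << (10 - 1); break;
    case 12: add = 1 << (12 - 1); break;
    default: abort();
    }
    MARK("MARK-E");

    return add;
}

/* Each caller passes a literal shift, so after inlining the switch
 * operates on a compile-time constant. */
int round_shift_7(int v)  { return (v + rounding_add(7))  >> 7;  }
int round_shift_10(int v) { return (v + rounding_add(10)) >> 10; }
int round_shift_12(int v) { return (v + rounding_add(12)) >> 12; }
```

Building with something like `gcc -O2 -DMARKERS -S` and reading the assembly between `# MARK` and `# MARK-E` shows whether any compare or branch instructions from the switch survived. As with the real patch, the result is only valid for the compiler and flags actually tested.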