From patchwork Sat Nov 17 08:12:14 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lauri Kasanen
X-Patchwork-Id: 11049
Date: Sat, 17 Nov 2018 10:12:14 +0200
From: Lauri Kasanen
To: ffmpeg-devel@ffmpeg.org
Message-Id: <20181117101214.19917fdecfc9fc4d26854a9c@gmx.com>
Subject: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8
List-Id: FFmpeg development discussions and patches

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
-f null -vframes 100 -v error -nostats -

1158 UNITS in planar1,   65528 runs,      8 skips

-cpuflags 0

19082 UNITS in planar1,   65533 runs,      3 skips

16.48 speedup ratio. For comparison, the SSE2 version on x86 gives a speedup of ~7.

Curiously, the Power C version takes as many cycles as the x86 SSE2 version;
yikes, it's fast.

Note that this function uses VSX instructions, but is not marked as such.
This is because several existing functions also make that mistake.
I'll submit a patch moving them once this is reviewed.

v2: Remove !BE check

Signed-off-by: Lauri Kasanen
---
 libswscale/ppc/swscale_altivec.c | 53 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/libswscale/ppc/swscale_altivec.c b/libswscale/ppc/swscale_altivec.c
index 2fb2337..8c6056d 100644
--- a/libswscale/ppc/swscale_altivec.c
+++ b/libswscale/ppc/swscale_altivec.c
@@ -324,6 +324,53 @@ static void hScale_altivec_real(SwsContext *c, int16_t *dst, int dstW,
         }
     }
 }
+
+static void yuv2plane1_8_u(const int16_t *src, uint8_t *dest, int dstW,
+                           const uint8_t *dither, int offset, int start)
+{
+    int i;
+    for (i = start; i < dstW; i++) {
+        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
+        dest[i] = av_clip_uint8(val);
+    }
+}
+
+static void yuv2plane1_8_altivec(const int16_t *src, uint8_t *dest, int dstW,
+                                 const uint8_t *dither, int offset)
+{
+    const int dst_u = -(uintptr_t)dest & 15;
+    int i, j;
+    LOCAL_ALIGNED(16, int16_t, val, [16]);
+    const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7, 7, 7};
+    vector int16_t vi, vileft, ditherleft, ditherright;
+    vector uint8_t vd;
+
+    for (j = 0; j < 16; j++) {
+        val[j] = dither[(dst_u + offset + j) & 7];
+    }
+
+    ditherleft = vec_ld(0, val);
+    ditherright = vec_ld(0, &val[8]);
+
+    yuv2plane1_8_u(src, dest, dst_u, dither, offset, 0);
+
+    for (i = dst_u; i < dstW - 15; i += 16) {
+
+        vi = vec_vsx_ld(0, &src[i]);
+        vi = vec_adds(ditherleft, vi);
+        vileft = vec_sra(vi, shifts);
+
+        vi = vec_vsx_ld(0, &src[i + 8]);
+        vi = vec_adds(ditherright, vi);
+        vi = vec_sra(vi, shifts);
+
+        vd = vec_packsu(vileft, vi);
+        vec_st(vd, 0, &dest[i]);
+    }
+
+    yuv2plane1_8_u(src, dest, dstW, dither, offset, i);
+}
+
 #endif /* HAVE_ALTIVEC */
 
 av_cold void ff_sws_init_swscale_ppc(SwsContext *c)
@@ -367,6 +414,12 @@ av_cold void ff_sws_init_swscale_ppc(SwsContext *c)
             c->yuv2packedX = ff_yuv2rgb24_X_altivec;
             break;
         }
+
+        switch (c->dstBpc) {
+        case 8:
+            c->yuv2plane1 = yuv2plane1_8_altivec;
+            break;
+        }
     }
 #endif /* HAVE_ALTIVEC */
 }
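
For readers without PowerPC hardware, the structure of yuv2plane1_8_altivec can be tried out with a portable C sketch: a scalar head up to the first 16-byte boundary of dest, a body that handles 16 pixels per iteration with a pre-rotated dither table, and a scalar tail. This is an illustration only, not part of the patch. The names clip_uint8, plane1_8_ref, plane1_8_split and plane1_8_selfcheck are hypothetical, the body loop is plain C standing in for the AltiVec/VSX intrinsics, and the clamp of the head length to dstW is an added assumption that the patch itself does not make.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FFmpeg's av_clip_uint8(). */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar reference: same arithmetic as yuv2plane1_8_u() in the patch. */
static void plane1_8_ref(const int16_t *src, uint8_t *dest, int dstW,
                         const uint8_t *dither, int offset, int start)
{
    for (int i = start; i < dstW; i++) {
        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
        dest[i] = clip_uint8(val);
    }
}

/* Head / 16-pixel body / tail split, mirroring the shape of the patch. */
static void plane1_8_split(const int16_t *src, uint8_t *dest, int dstW,
                           const uint8_t *dither, int offset)
{
    /* Pixels needed to reach the next 16-byte boundary of dest. */
    int head = (int)(-(uintptr_t)dest & 15);
    int16_t d[16];
    int i, j;

    assert(dstW >= 0);
    if (head > dstW)        /* assumption: guard short rows (not in the patch) */
        head = dstW;

    /* Rotate the 8-entry dither pattern so d[j] belongs to pixel head + j.
     * The body advances by 16, a multiple of 8, so d[] stays valid. */
    for (j = 0; j < 16; j++)
        d[j] = dither[(head + offset + j) & 7];

    plane1_8_ref(src, dest, head, dither, offset, 0);        /* head */

    for (i = head; i < dstW - 15; i += 16)                   /* body */
        for (j = 0; j < 16; j++)
            dest[i + j] = clip_uint8((src[i + j] + d[j]) >> 7);

    plane1_8_ref(src, dest, dstW, dither, offset, i);        /* tail */
}

/* Compares both versions over all 16 alignments; returns the mismatch count. */
int plane1_8_selfcheck(void)
{
    static const uint8_t dither[8] = { 0, 64, 16, 80, 4, 68, 20, 84 }; /* arbitrary */
    static const int widths[] = { 0, 5, 16, 37, 96 };
    int16_t src[96];
    uint8_t a[96 + 16], b[96 + 16];
    int mismatches = 0;

    for (int i = 0; i < 96; i++)
        src[i] = (int16_t)(i * 600 - 25000);   /* spans negative and large values */

    for (int shift = 0; shift < 16; shift++)
        for (int w = 0; w < 5; w++)
            for (int offset = 0; offset < 8; offset += 3) {
                memset(a, 0xAA, sizeof(a));
                memset(b, 0xAA, sizeof(b));
                plane1_8_ref(src, a + shift, widths[w], dither, offset, 0);
                plane1_8_split(src, b + shift, widths[w], dither, offset);
                /* Whole-buffer compare also catches writes past dstW. */
                mismatches += memcmp(a, b, sizeof(a)) != 0;
            }
    return mismatches;
}
```

Calling plane1_8_selfcheck() should return 0 if the split version matches the scalar reference for every tested alignment, width and dither offset. The saturating vec_adds and vec_packsu steps of the real code are not modelled separately here: for int16_t input and 8-bit dither values, clipping the shifted sum to 0..255 yields the same bytes.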