Date: Sun, 24 Mar 2019 15:10:52 +0200
From: Lauri Kasanen
To: ffmpeg-devel@ffmpeg.org
Message-Id: <20190324151052.0c7e6b9a27606552e050d9aa@gmx.com>
Subject: [FFmpeg-devel] [PATCH 2/3] swscale/ppc: VSX-optimize yuv2422_2

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
        -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

5.1x speedup:

yuyv422
  19339 UNITS in yuv2packed2, 16384 runs, 0 skips
   3718 UNITS in yuv2packed2, 16383 runs, 1 skips
yvyu422
  19438 UNITS in yuv2packed2, 16384 runs, 0 skips
   3800 UNITS in yuv2packed2, 16380 runs, 4 skips
uyvy422
  19128 UNITS in yuv2packed2, 16384 runs, 0 skips
   3721 UNITS in yuv2packed2, 16380 runs, 4 skips

Signed-off-by: Lauri Kasanen
---
 libswscale/ppc/swscale_vsx.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c
index 0bb82ac..1c4051b 100644
--- a/libswscale/ppc/swscale_vsx.c
+++ b/libswscale/ppc/swscale_vsx.c
@@ -726,6 +726,61 @@ write422(const vector int16_t vy1, const vector int16_t vy2,
     }
 }
 
+#define SETUP(x, buf0, alpha1, buf1, alpha) { \
+    x = vec_ld(0, buf0); \
+    tmp = vec_mule(x, alpha1); \
+    tmp2 = vec_mulo(x, alpha1); \
+    tmp3 = vec_mergeh(tmp, tmp2); \
+    tmp4 = vec_mergel(tmp, tmp2); \
+\
+    x = vec_ld(0, buf1); \
+    tmp = vec_mule(x, alpha); \
+    tmp2 = vec_mulo(x, alpha); \
+    tmp5 = vec_mergeh(tmp, tmp2); \
+    tmp6 = vec_mergel(tmp, tmp2); \
+\
+    tmp3 = vec_add(tmp3, tmp5); \
+    tmp4 = vec_add(tmp4, tmp6); \
+\
+    tmp3 = vec_sra(tmp3, shift19); \
+    tmp4 = vec_sra(tmp4, shift19); \
+    x = vec_packs(tmp3, tmp4); \
+}
+
+static av_always_inline void
+yuv2422_2_vsx_template(SwsContext *c, const int16_t *buf[2],
+                       const int16_t *ubuf[2], const int16_t *vbuf[2],
+                       const int16_t *abuf[2], uint8_t *dest, int dstW,
+                       int yalpha, int uvalpha, int y,
+                       enum AVPixelFormat target)
+{
+    const int16_t *buf0  = buf[0],  *buf1  = buf[1],
+                  *ubuf0 = ubuf[0], *ubuf1 = ubuf[1],
+                  *vbuf0 = vbuf[0], *vbuf1 = vbuf[1];
+    const int16_t  yalpha1 = 4096 - yalpha;
+    const int16_t uvalpha1 = 4096 - uvalpha;
+    vector int16_t vy1, vy2, vu, vv;
+    vector int32_t tmp, tmp2, tmp3, tmp4, tmp5, tmp6;
+    const vector int16_t vyalpha1 = vec_splats(yalpha1);
+    const vector int16_t vuvalpha1 = vec_splats(uvalpha1);
+    const vector int16_t vyalpha = vec_splats((int16_t) yalpha);
+    const vector int16_t vuvalpha = vec_splats((int16_t) uvalpha);
+    const vector uint32_t shift19 = vec_splats(19U);
+    int i;
+    av_assert2(yalpha  <= 4096U);
+    av_assert2(uvalpha <= 4096U);
+
+    for (i = 0; i < ((dstW + 1) >> 1); i += 8) {
+        SETUP(vy1, &buf0[i * 2], vyalpha1, &buf1[i * 2], vyalpha)
+        SETUP(vy2, &buf0[(i + 4) * 2], vyalpha1, &buf1[(i + 4) * 2], vyalpha)
+        SETUP(vu, &ubuf0[i], vuvalpha1, &ubuf1[i], vuvalpha)
+        SETUP(vv, &vbuf0[i], vuvalpha1, &vbuf1[i], vuvalpha)
+        write422(vy1, vy2, vu, vv, &dest[i * 4], target);
+    }
+}
+
+#undef SETUP
+
 static av_always_inline void
 yuv2422_1_vsx_template(SwsContext *c, const int16_t *buf0,
                        const int16_t *ubuf[2], const int16_t *vbuf[2],
@@ -786,7 +841,18 @@ yuv2422_1_vsx_template(SwsContext *c, const int16_t *buf0,
     }
 }
 
+#define YUV2PACKEDWRAPPER2(name, base, ext, fmt) \
+static void name ## ext ## _2_vsx(SwsContext *c, const int16_t *buf[2], \
+                            const int16_t *ubuf[2], const int16_t *vbuf[2], \
+                            const int16_t *abuf[2], uint8_t *dest, int dstW, \
+                            int yalpha, int uvalpha, int y) \
+{ \
+    name ## base ## _2_vsx_template(c, buf, ubuf, vbuf, abuf, \
+                                    dest, dstW, yalpha, uvalpha, y, fmt); \
+}
+
 #define YUV2PACKEDWRAPPER(name, base, ext, fmt) \
+YUV2PACKEDWRAPPER2(name, base, ext, fmt) \
 static void name ## ext ## _1_vsx(SwsContext *c, const int16_t *buf0, \
                             const int16_t *ubuf[2], const int16_t *vbuf[2], \
                             const int16_t *abuf0, uint8_t *dest, int dstW, \
@@ -909,12 +975,15 @@ av_cold void ff_sws_init_swscale_vsx(SwsContext *c)
         switch (dstFormat) {
         case AV_PIX_FMT_YUYV422:
             c->yuv2packed1 = yuv2yuyv422_1_vsx;
+            c->yuv2packed2 = yuv2yuyv422_2_vsx;
             break;
         case AV_PIX_FMT_YVYU422:
             c->yuv2packed1 = yuv2yvyu422_1_vsx;
+            c->yuv2packed2 = yuv2yvyu422_2_vsx;
             break;
         case AV_PIX_FMT_UYVY422:
             c->yuv2packed1 = yuv2uyvy422_1_vsx;
+            c->yuv2packed2 = yuv2uyvy422_2_vsx;
             break;
         }
     }
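For reviewing the vector code, it can help to have the scalar computation that yuv2422_2 performs written out. The sketch below is modeled on our understanding of the generic C path (yuv2422_2_c_template in libswscale/output.c); it is not code from this patch or from libswscale, and the names yuv2422_2_ref, blend2, clip_u8 and packed422_order are hypothetical. Each output sample blends two source rows with 12-bit weights (4096 represents 1.0): row 0 is weighted by 4096 - alpha and row 1 by alpha. The intermediate rows hold 15-bit samples (8-bit values scaled by 128), so the sum is shifted right by 12 + 7 = 19 bits. Clipping to 0..255 is an assumption added here for safety.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte orders for the three packed 4:2:2 targets. */
enum packed422_order { ORDER_YUYV, ORDER_YVYU, ORDER_UYVY };

/* Assumption of this sketch: saturate to the 8-bit output range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Blend one 15-bit sample from each row with a 12-bit weight:
 * alpha = 0 selects row 0, alpha = 4096 selects row 1. */
static int blend2(int16_t s0, int16_t s1, int alpha)
{
    return (s0 * (4096 - alpha) + s1 * alpha) >> 19;
}

/* Scalar sketch of a two-row packed 4:2:2 writer (hypothetical name). */
static void yuv2422_2_ref(const int16_t *buf0, const int16_t *buf1,
                          const int16_t *ubuf0, const int16_t *ubuf1,
                          const int16_t *vbuf0, const int16_t *vbuf1,
                          uint8_t *dest, int dstW, int yalpha, int uvalpha,
                          enum packed422_order order)
{
    assert(yalpha >= 0 && yalpha <= 4096);
    assert(uvalpha >= 0 && uvalpha <= 4096);

    /* One chroma pair covers two luma samples, i.e. four output bytes. */
    for (int i = 0; i < ((dstW + 1) >> 1); i++) {
        uint8_t y1 = clip_u8(blend2(buf0[2 * i],     buf1[2 * i],     yalpha));
        uint8_t y2 = clip_u8(blend2(buf0[2 * i + 1], buf1[2 * i + 1], yalpha));
        uint8_t u  = clip_u8(blend2(ubuf0[i], ubuf1[i], uvalpha));
        uint8_t v  = clip_u8(blend2(vbuf0[i], vbuf1[i], uvalpha));
        uint8_t *p = &dest[4 * i];

        switch (order) {
        case ORDER_YUYV: p[0] = y1; p[1] = u;  p[2] = y2; p[3] = v;  break;
        case ORDER_YVYU: p[0] = y1; p[1] = v;  p[2] = y2; p[3] = u;  break;
        case ORDER_UYVY: p[0] = u;  p[1] = y1; p[2] = v;  p[3] = y2; break;
        }
    }
}
```

A scalar reference like this is convenient for spot-checking vector output: with alpha = 0 the result comes from row 0 alone, with alpha = 4096 from row 1 alone, and intermediate weights interpolate between them.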