From patchwork Tue Nov 27 00:11:03 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Carl Eugen Hoyos
X-Patchwork-Id: 11177
References: <20181117101214.19917fdecfc9fc4d26854a9c@gmx.com>
From: Carl Eugen Hoyos
Date: Tue, 27 Nov 2018 01:11:03 +0100
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Sender: "ffmpeg-devel"

2018-11-27 0:17 GMT+01:00, Carl Eugen Hoyos:
> 2018-11-17 9:12 GMT+01:00, Lauri Kasanen:
>> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
>> -f null -vframes 100 -v error -nostats -
>>
>> 1158 UNITS in planar1, 65528 runs, 8 skips
>>
>> -cpuflags 0
>>
>> 19082 UNITS in planar1, 65533 runs, 3 skips
>>
>> 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
>> takes as many cycles as the x86 SSE2 version, yikes it's fast.
>>
>> Note that this function uses VSX instructions, but is not marked so.
>> This is because several existing functions also make that mistake.
>> I'll submit a patch moving them once this is reviewed.
>>
>> v2: Remove !BE check
>> Signed-off-by: Lauri Kasanen
>> ---
>>  libswscale/ppc/swscale_altivec.c | 53 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 53 insertions(+)
>>
>> diff --git a/libswscale/ppc/swscale_altivec.c b/libswscale/ppc/swscale_altivec.c
>> index 2fb2337..8c6056d 100644
>> --- a/libswscale/ppc/swscale_altivec.c
>> +++ b/libswscale/ppc/swscale_altivec.c
>> @@ -324,6 +324,53 @@ static void hScale_altivec_real(SwsContext *c, int16_t *dst, int dstW,
>>          }
>>      }
>>  }
>> +
>> +static void yuv2plane1_8_u(const int16_t *src, uint8_t *dest, int dstW,
>> +                           const uint8_t *dither, int offset, int start)
>> +{
>> +    int i;
>> +    for (i = start; i < dstW; i++) {
>> +        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
>> +        dest[i] = av_clip_uint8(val);
>> +    }
>> +}
>> +
>> +static void yuv2plane1_8_altivec(const int16_t *src, uint8_t *dest, int dstW,
>> +                                 const uint8_t *dither, int offset)
>> +{
>> +    const int dst_u = -(uintptr_t)dest & 15;
>> +    int i, j;
>> +    LOCAL_ALIGNED(16, int16_t, val, [16]);
>
>> +    const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7, 7, 7};
>
> The patch breaks compilation with xlc, sorry for not testing earlier:
> libswscale/ppc/swscale_altivec.c:344:11: error: unknown type name 'vector'
>     const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7, 7, 7};

In case this error does not make much sense to you, don't worry
too much, the following change was necessary to make xlc pass
rv20-1239: ;-)
(As expected, other tests also fail.)

Carl Eugen

diff --git a/fftools/ffmpeg_filter.c b/fftools/ffmpeg_filter.c
index 6518d50..fb749c5 100644
--- a/fftools/ffmpeg_filter.c
+++ b/fftools/ffmpeg_filter.c
@@ -744,6 +744,7 @@ static int configure_input_video_filter
     InputFile *f = input_files[ist->file_index];
     AVRational tb = ist->framerate.num ? av_inv_q(ist->framerate) :
                                          ist->st->time_base;
+if(!ist->framerate.num)tb = ist->st->time_base;
     AVRational fr = ist->framerate;
     AVRational sar;
     AVBPrint args;
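
For anyone who wants to check the arithmetic of the quoted scalar fallback (yuv2plane1_8_u) without an FFmpeg tree or a PowerPC compiler, here is a minimal standalone sketch. It is not part of either patch. The names clip_uint8 and yuv2plane1_8_ref are hypothetical: clip_uint8 is a local stand-in assumed to behave like av_clip_uint8 (clamp to 0..255), and the assert is an added precondition check that the quoted code does not have.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local stand-in for FFmpeg's av_clip_uint8():
 * clamps an int to the range 0..255. */
static uint8_t clip_uint8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Same per-pixel arithmetic as the quoted yuv2plane1_8_u():
 * add an 8-entry dither value, shift right by 7, clamp to 8 bits.
 * Note: right-shifting a negative int is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
static void yuv2plane1_8_ref(const int16_t *src, uint8_t *dest, int dstW,
                             const uint8_t *dither, int offset, int start)
{
    int i;

    assert(src && dest && dither); /* added precondition, not in the patch */

    for (i = start; i < dstW; i++) {
        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
        dest[i] = clip_uint8(val);
    }
}
```

A vectorized version could be compared against this reference by running both over the same src and dither buffers and checking that dest matches byte for byte.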