From patchwork Thu Jul 19 14:52:48 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 9757
From: James Darnley
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 19 Jul 2018 16:52:48 +0200
Message-Id: <20180719145252.30613-3-jdarnley@obe.tv>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180719145252.30613-1-jdarnley@obe.tv>
References: <20180719145252.30613-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 2/6] diracdec: add 10-bit Legall 5, 3 (5_3) SIMD functions

Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the relevant
transform.
C: 94fps
SSE2: 118fps
AVX2: 121fps
---
 libavcodec/x86/dirac_dwt_10bit.asm    | 55 +++++++++++++++++++++++++++
 libavcodec/x86/dirac_dwt_init_10bit.c | 23 +++++++++++
 2 files changed, 78 insertions(+)

diff --git a/libavcodec/x86/dirac_dwt_10bit.asm b/libavcodec/x86/dirac_dwt_10bit.asm
index dc3830615e..c00de32bfe 100644
--- a/libavcodec/x86/dirac_dwt_10bit.asm
+++ b/libavcodec/x86/dirac_dwt_10bit.asm
@@ -24,6 +24,7 @@
 
 SECTION_RODATA
 
 cextern pd_1
+pd_2: times 4 dd 2
 
 SECTION .text
@@ -100,9 +101,63 @@
     REP_RET
 %endmacro
 
+%macro LEGALL53_VERTICAL_LO 0
+
+cglobal legall53_vertical_lo, 4, 4, 4, b0, b1, b2, w
+    mova m3, [pd_2]
+    shl wd, 2
+    add b0q, wq
+    add b1q, wq
+    add b2q, wq
+    neg wq
+
+    ALIGN 16
+    .loop:
+        mova m0, [b0q + wq]
+        mova m1, [b1q + wq]
+        mova m2, [b2q + wq]
+        paddd m0, m2
+        paddd m0, m3
+        psrad m0, 2
+        psubd m1, m0
+        mova [b1q + wq], m1
+        add wq, mmsize
+    jl .loop
+RET
+
+%endmacro
+
+%macro LEGALL53_VERTICAL_HI 0
+
+cglobal legall53_vertical_hi, 4, 4, 4, b0, b1, b2, w
+    mova m3, [pd_1]
+    shl wd, 2
+    add b0q, wq
+    add b1q, wq
+    add b2q, wq
+    neg wq
+
+    ALIGN 16
+    .loop:
+        mova m0, [b0q + wq]
+        mova m1, [b1q + wq]
+        mova m2, [b2q + wq]
+        paddd m0, m2
+        paddd m0, m3
+        psrad m0, 1
+        paddd m1, m0
+        mova [b1q + wq], m1
+        add wq, mmsize
+    jl .loop
+RET
+
+%endmacro
+
 INIT_XMM sse2
 HAAR_HORIZONTAL
 HAAR_VERTICAL
+LEGALL53_VERTICAL_HI
+LEGALL53_VERTICAL_LO
 
 INIT_XMM avx
 HAAR_HORIZONTAL
diff --git a/libavcodec/x86/dirac_dwt_init_10bit.c b/libavcodec/x86/dirac_dwt_init_10bit.c
index 939950e3ff..88cf267d14 100644
--- a/libavcodec/x86/dirac_dwt_init_10bit.c
+++ b/libavcodec/x86/dirac_dwt_init_10bit.c
@@ -23,6 +23,9 @@
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/dirac_dwt.h"
 
+void ff_legall53_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2, int width);
+void ff_legall53_vertical_lo_sse2(int32_t *b0, int32_t *b1, int32_t *b2, int width);
+
 void ff_horizontal_compose_haar_10bit_sse2(int32_t *b0, int32_t *b1, int width_align);
 void ff_horizontal_compose_haar_10bit_avx(int32_t *b0, int32_t *b1, int width_align);
 void ff_horizontal_compose_haar_10bit_avx2(int32_t *b0, int32_t *b1, int width_align);
@@ -91,6 +94,22 @@ static void horizontal_compose_haar_avx2(int32_t *b, int32_t *tmp, int width)
     }
 }
 
+static void legall53_vertical_lo_sse2(int32_t *b0, int32_t *b1, int32_t *b2, int width)
+{
+    int i = width & ~3;
+    ff_legall53_vertical_lo_sse2(b0, b1, b2, i);
+    for(; i [...]
[...]
+            d->vertical_compose_h0 = (void*)legall53_vertical_hi_sse2;
+            d->vertical_compose_l0 = (void*)legall53_vertical_lo_sse2;
+            break;
         case DWT_DIRAC_HAAR0:
             d->vertical_compose = (void*)vertical_compose_haar_sse2;
             break;
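
For reference while reviewing, the arithmetic of the two new loops can be
written out in scalar C. This is only a sketch read off the asm above, not
code from the patch: every ref_* name is hypothetical, and the tail loop in
ref_vertical_lo is an assumption about what the C wrapper does with the
remaining width & 3 elements. It also assumes that >> on a negative int32_t
is an arithmetic shift (as psrad is) and that the sums do not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* One element of legall53_vertical_lo:
 * paddd m0, m2; paddd m0, [pd_2]; psrad m0, 2; psubd m1, m0 */
static inline int32_t ref_lo(int32_t b0, int32_t b1, int32_t b2)
{
    return b1 - ((b0 + b2 + 2) >> 2);
}

/* One element of legall53_vertical_hi:
 * paddd m0, m2; paddd m0, [pd_1]; psrad m0, 1; paddd m1, m0 */
static inline int32_t ref_hi(int32_t b0, int32_t b1, int32_t b2)
{
    return b1 + ((b0 + b2 + 1) >> 1);
}

/* Hypothetical stand-in for the SIMD routine: like the 4-lane SSE2 loop,
 * it only accepts a width that is a multiple of 4. */
static void ref_vertical_lo_x4(const int32_t *b0, int32_t *b1,
                               const int32_t *b2, int width)
{
    assert(width >= 0 && width % 4 == 0);
    for (int i = 0; i < width; i++)
        b1[i] = ref_lo(b0[i], b1[i], b2[i]);
}

/* Same shape as the wrapper in the patch: round the width down to a
 * multiple of 4 for the bulk call, then (assumed) finish the remaining
 * elements one at a time. */
static void ref_vertical_lo(const int32_t *b0, int32_t *b1,
                            const int32_t *b2, int width)
{
    int i = width & ~3;
    ref_vertical_lo_x4(b0, b1, b2, i);
    for (; i < width; i++)
        b1[i] = ref_lo(b0[i], b1[i], b2[i]);
}
```

Comparing the output of such a scalar version against the SIMD functions on
widths that are not a multiple of 4 is one way to check both the bulk loop
and the tail handling.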