From patchwork Sat Jun 10 11:46:44 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 3903
From: James Darnley
To: FFmpeg development discussions and patches
Date: Sat, 10 Jun 2017 13:46:44 +0200
Message-Id: <20170610114644.3138-6-jdarnley@obe.tv>
X-Mailer: git-send-email 2.13.0
In-Reply-To: <20170610114644.3138-1-jdarnley@obe.tv>
References: <20170610114644.3138-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 5/5] avcodec/x86: add x86-64 8-bit simple_idct add function

---
 libavcodec/x86/idctdsp_init.c    |  2 ++
 libavcodec/x86/simple_idct.h     |  3 ++
 libavcodec/x86/simple_idct10.asm | 61 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 1826d01e0e..ca4dd01775 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -103,6 +103,7 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
                  avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
                 c->idct = ff_simple_idct8_sse2;
                 c->idct_put = ff_simple_idct8_put_sse2;
+                c->idct_add = ff_simple_idct8_add_sse2;
                 c->perm_type = FF_IDCT_PERM_TRANSPOSE;
             }
         }
@@ -115,6 +116,7 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
                  avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
                 c->idct = ff_simple_idct8_avx;
                 c->idct_put = ff_simple_idct8_put_avx;
+                c->idct_add = ff_simple_idct8_add_avx;
                 c->perm_type = FF_IDCT_PERM_TRANSPOSE;
             }
 
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index b559f8527c..9b64cfe9bc 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -35,6 +35,9 @@ void ff_simple_idct8_avx(int16_t *block);
 
 void ff_simple_idct8_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct8_put_avx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 
+void ff_simple_idct8_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+void ff_simple_idct8_add_avx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index a247072121..d233e780b9 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -82,6 +82,31 @@ SECTION .text
     movhps %8, %12
 %endmacro
 
+%macro LOAD_ZXBW_8 16
+    pmovzxbw %1, %9
+    pmovzxbw %2, %10
+    pmovzxbw %3, %11
+    pmovzxbw %4, %12
+    pmovzxbw %5, %13
+    pmovzxbw %6, %14
+    pmovzxbw %7, %15
+    pmovzxbw %8, %16
+%endmacro
+
+%macro LOAD_ZXBW_4 9
+    movh %1, %5
+    movh %2, %6
+    movh %3, %7
+    movh %4, %8
+    punpcklbw %1, %9
+    punpcklbw %2, %9
+    punpcklbw %3, %9
+    punpcklbw %4, %9
+%endmacro
+
+%define PASS4ROWS(base, stride, stride3) \
+    [base], [base + stride], [base + 2*stride], [base + stride3]
+
 %macro idct_fn 0
 cglobal simple_idct8, 1, 1, 16, block
     IDCT_FN "", 11, pw_round_20_div_w4, 20
@@ -99,6 +124,42 @@ cglobal simple_idct8_put, 3, 4, 16, pixels, lsize, block
     STORE_HI_LO PASS8ROWS(pixelsq, r2, lsizeq, r3), m8, m1, m4, m9
     RET
 
+; TODO: optimise by not writing the final data to the block.
+cglobal simple_idct8_add, 3, 4, 16, pixels, lsize, block
+    IDCT_FN "", 11, pw_round_20_div_w4, 20
+    lea r2, [3*lsizeq]
+    lea r3, [pixelsq + r2]
+    %if cpuflag(sse4)
+        LOAD_ZXBW_8 m3, m5, m6, m7, m12, m13, m14, m15, PASS8ROWS(pixelsq, r3, lsizeq, r2)
+        paddsw m8, m3
+        paddsw m0, m5
+        paddsw m1, m6
+        paddsw m2, m7
+        paddsw m4, m12
+        paddsw m11, m13
+        paddsw m9, m14
+        paddsw m10, m15
+    %else
+        pxor m12, m12
+        LOAD_ZXBW_4 m3, m5, m6, m7, PASS4ROWS(pixelsq, lsizeq, r2), m12
+        paddsw m8, m3
+        paddsw m0, m5
+        paddsw m1, m6
+        paddsw m2, m7
+        ; Address rows 4-7 from r3 (row 3) so pixelsq stays valid for the store.
+        LOAD_ZXBW_4 m3, m5, m6, m7, [r3 + lsizeq], [r3 + 2*lsizeq], [r3 + r2], [r3 + 4*lsizeq], m12
+        paddsw m4, m3
+        paddsw m11, m5
+        paddsw m9, m6
+        paddsw m10, m7
+    %endif
+    packuswb m8, m0
+    packuswb m1, m2
+    packuswb m4, m11
+    packuswb m9, m10
+    STORE_HI_LO PASS8ROWS(pixelsq, r3, lsizeq, r2), m8, m1, m4, m9
+RET
+
 cglobal simple_idct10, 1, 1, 16, block
     IDCT_FN "", 12, "", 19
     RET
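
For reference when reviewing the add step, here is a minimal scalar sketch of what the tail of simple_idct8_add appears to compute: zero-extend each destination byte, add it to the 16-bit IDCT output with signed saturation (paddsw), then pack back to bytes with unsigned saturation (packuswb). The names idct8_add_ref, clip_s16 and clip_u8 are hypothetical and are not part of FFmpeg. The sketch also assumes the block already holds the inverse-transformed coefficients in row-major order, which is a simplification of how the assembly keeps them in registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the int16_t range, modelling one lane of paddsw. */
static int clip_s16(int v)
{
    return v < -32768 ? -32768 : v > 32767 ? 32767 : v;
}

/* Clamp to 0..255, modelling one lane of packuswb. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model of the add step for one 8x8 block.
 * Assumption: block[] already contains the IDCT output, row-major.
 */
static void idct8_add_ref(uint8_t *dest, ptrdiff_t line_size,
                          const int16_t *block)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            /* Zero-extended pixel plus residual, saturated to 16 bits. */
            int sum = clip_s16(dest[x] + block[y * 8 + x]);
            /* Pack back to an unsigned byte with saturation. */
            dest[x] = clip_u8(sum);
        }
        dest += line_size;
    }
}
```

A model like this could serve as a comparison point when checking that both the SSE2 and AVX code paths write all eight rows of the destination.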