From patchwork Sat Apr 15 01:46:17 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 3421
Delivered-To: ffmpegpatchwork@gmail.com
Received: by 10.103.3.129 with SMTP id 123csp449903vsd;
 Fri, 14 Apr 2017 18:55:01 -0700 (PDT)
X-Received: by 10.28.223.84 with SMTP id w81mr873217wmg.32.1492221301448;
 Fri, 14 Apr 2017 18:55:01 -0700 (PDT)
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
 by mx.google.com with ESMTP id w198si843033wmf.114.2017.04.14.18.55.00;
 Fri, 14 Apr 2017 18:55:01 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org
 designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
 dkim=neutral (body hash did not verify)
 header.i=@ob-encoder-com.20150623.gappssmtp.com;
 spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates
 79.124.17.100 as permitted sender)
 smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E65016883B9;
 Sat, 15 Apr 2017 04:54:51 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-wr0-f193.google.com (mail-wr0-f193.google.com
 [209.85.128.193]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id
 901786883B9 for ; Sat, 15 Apr 2017 04:54:44 +0300 (EEST)
Received: by mail-wr0-f193.google.com with SMTP id u18so14123621wrc.1 for ;
 Fri, 14 Apr 2017 18:54:51 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=ob-encoder-com.20150623.gappssmtp.com; s=20150623;
 h=sender:from:to:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=dRhuiDqm3QKqCk4xa1n9HuPD3C3SjPZyn5FBYNNQBLs=;
 b=EK5AW3ax4VIArEbRprye39vaGnvFdFhys8429DV5uL3wvPkFjM78xNooazaxNpIhPZ
 PiBZlD0Rxr5v6bKQPh5PoiKCUlQU2KrvYad/ps+Ankq/01UvVM7nt4vwhQoSSSsPTvCD
 TVYJvUXbIHq0UGHnCzkvmJIWqDJmKMoiuAL99auwdbvYL7Q7lE5LtKGQXjP+KrSmuORL
 lZs3+uTgIiDzRcVUhvEg4CedBtH9qJgTyfWp7wryT7fe+RifGXnz8UNsayOjYVz1jmrI
 Sn2KJIr7+RpnvXe8966W4uqrqnP/oATr3fFqV1fIyeDwmYKHLmOLosx6jbU0Z/1G05HH
 jR2w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:sender:from:to:subject:date:message-id
 :in-reply-to:references:mime-version:content-transfer-encoding;
 bh=dRhuiDqm3QKqCk4xa1n9HuPD3C3SjPZyn5FBYNNQBLs=;
 b=mLkp0xsqWFWEcFu4txKhHG1w2PYuZ9X+5vpi78UG7ZBUwdx50T4xkOawnKeHAVeAUb
 lCNxirGv9Uaa6gUMQfkYa0RUaoVnt33U1Ckcmsv4XQi0mA3Ag2T/EjoDMf9edgUQSqSk
 DvNre/eqarc+Ha/PLMkvH5hxO1DU5crKk8a5Fxgx4N9czrZe6WTqbGA6vlJoz5vZhqNm
 07ctrtsq+jmirUfirmfRwRCxS7Ib7f0C4UBDV4RpWy8gDPvXbeJlcfDZDmS0iUERdC5k
 J/cewdyjRpV3+nywdYidEgXxRZEzoKPDo0e5Nwf0gMwMQSLASk4Wfb2mtTzrEiEJoJfx
 H4aA==
X-Gm-Message-State: AN3rC/4BroT4L8QOyge6j/1FY1ZE7A10y6h2KwGd8DNGgwkWSYqBKTuy
 +haHYPe63jc5ermf
X-Received: by 10.223.139.91 with SMTP id v27mr4722099wra.9.1492220810904;
 Fri, 14 Apr 2017 18:46:50 -0700 (PDT)
Received: from localhost.localdomain (d51a44418.access.telenet.be.
 [81.164.68.24]) by smtp.gmail.com with ESMTPSA id
 3sm4459091wrv.33.2017.04.14.18.46.50 for
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 14 Apr 2017 18:46:50 -0700 (PDT)
From: James Darnley
To: FFmpeg development discussions and patches
Date: Sat, 15 Apr 2017 03:46:17 +0200
Message-Id: <20170415014618.1592-6-jdarnley@obe.tv>
X-Mailer: git-send-email 2.8.3
In-Reply-To: <20170415014618.1592-1-jdarnley@obe.tv>
References: <20170415014618.1592-1-jdarnley@obe.tv>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 5/6] avcodec/h264: add avx 8-bit
 h264_idct_dc_add
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

Haswell:
 - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext

Skylake-U:
 - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext
---
 libavcodec/x86/h264_idct.asm  | 20 ++++++++++++++++++++
 libavcodec/x86/h264dsp_init.c |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index 24fb4d2..43f7791 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -1158,7 +1158,27 @@ INIT_XMM avx
     movd [%7+%8], %4
 %endmacro
 
+%macro DC_ADD_INIT 1
+    add %1d, 32
+    sar %1d, 6
+    movd m0, %1d
+    pshuflw m0, m0, 0
+    lea %1, [3*stride_q]
+    pxor m1, m1
+    psubw m1, m0
+    packuswb m0, m0
+    packuswb m1, m1
+%endmacro
+
 cglobal h264_idct_add_8, 3, 3, 8, dst_, block_, stride_
     movsxdifnidn stride_q, stride_d
     IDCT4_ADD dst_q, block_q, stride_q
 RET
+
+cglobal h264_idct_dc_add_8, 3, 4, 6, dst_, block_, stride_
+    movsxdifnidn stride_q, stride_d
+    movsx r3d, word [block_q]
+    mov dword [block_q], 0
+    DC_ADD_INIT r3
+    DC_ADD_MMXEXT_OP movd, dst_q, stride_q, r3
+RET
diff --git a/libavcodec/x86/h264dsp_init.c b/libavcodec/x86/h264dsp_init.c
index 8ba085f..bf74937 100644
--- a/libavcodec/x86/h264dsp_init.c
+++ b/libavcodec/x86/h264dsp_init.c
@@ -35,6 +35,7 @@ IDCT_ADD_FUNC(, 8, mmx)
 IDCT_ADD_FUNC(, 8, avx)
 IDCT_ADD_FUNC(, 10, sse2)
 IDCT_ADD_FUNC(_dc, 8, mmxext)
+IDCT_ADD_FUNC(_dc, 8, avx)
 IDCT_ADD_FUNC(_dc, 10, mmxext)
 IDCT_ADD_FUNC(8_dc, 8, mmxext)
 IDCT_ADD_FUNC(8_dc, 10, sse2)
@@ -340,6 +341,7 @@ av_cold void ff_h264dsp_init_x86(H264DSPContext *c, const int bit_depth,
             }
 
             c->h264_idct_add = ff_h264_idct_add_8_avx;
+            c->h264_idct_dc_add = ff_h264_idct_dc_add_8_avx;
         }
     } else if (bit_depth == 10) {
         if (EXTERNAL_MMXEXT(cpu_flags)) {
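
For readers following the arithmetic in DC_ADD_INIT, here is a rough scalar
model in C. It is only a sketch, not FFmpeg's reference code: the function
names are made up for illustration, and it assumes that DC_ADD_MMXEXT_OP
(not shown in this patch) applies an unsigned saturating add of m0 followed
by an unsigned saturating subtract of m1 to four rows of four pixels. It
also assumes >> on a negative int is an arithmetic shift, as sar is.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range, like packuswb does per word. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical direct form: dst = clip(dst + dc) over a 4x4 block. */
static void dc_add_4x4_direct(uint8_t *dst, int16_t *block, ptrdiff_t stride)
{
    int dc = (block[0] + 32) >> 6;      /* add %1d, 32 ; sar %1d, 6 */
    block[0] = block[1] = 0;            /* the asm clears a whole dword */
    for (int y = 0; y < 4; y++, dst += stride)
        for (int x = 0; x < 4; x++)
            dst[x] = clip_uint8(dst[x] + dc);
}

/* Hypothetical split form, mirroring the m0/m1 pair built by DC_ADD_INIT. */
static void dc_add_4x4_split(uint8_t *dst, int16_t *block, ptrdiff_t stride)
{
    int dc = (block[0] + 32) >> 6;
    uint8_t pos = clip_uint8(dc);       /* packuswb m0: dc if positive, else 0 */
    uint8_t neg = clip_uint8(-dc);      /* packuswb m1: -dc if negative, else 0 */
    block[0] = block[1] = 0;
    for (int y = 0; y < 4; y++, dst += stride) {
        for (int x = 0; x < 4; x++) {
            int v = dst[x] + pos;       /* unsigned saturating add */
            if (v > 255)
                v = 255;
            v -= neg;                   /* unsigned saturating subtract */
            if (v < 0)
                v = 0;
            dst[x] = (uint8_t)v;
        }
    }
}
```

Only one of pos and neg is ever nonzero, so the saturating add and subtract
together behave like a single clipped signed add. That is presumably why the
macro prepares both dc and -dc before packing them to bytes.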