From patchwork Tue Nov 29 11:52:34 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 1600
From: James Darnley
To: FFmpeg development discussions and patches
Cc: James Darnley
Date: Tue, 29 Nov 2016 12:52:34 +0100
Message-Id: <20161129115235.8937-3-jdarnley@obe.tv>
In-Reply-To: <20161129115235.8937-1-jdarnley@obe.tv>
References: <20161129115235.8937-1-jdarnley@obe.tv>
X-Mailer: git-send-email 2.10.2
Subject: [FFmpeg-devel] [PATCH 2/3] avcodec/h264: mmx 4:2:2 idct add8 function

2.87 times faster (1830 vs. 638 cycles)
---
 libavcodec/x86/h264_idct.asm  | 32 ++++++++++++++++++++++++++++++++
 libavcodec/x86/h264dsp_init.c |  7 ++++++-
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index 27222cb..c36fea5 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -697,6 +697,38 @@ cglobal h264_idct_add8_8, 5, 8 + npicregs, 0, dst1, block_offset, block, stride,
     call h264_idct_add8_mmx_plane
     RET
 
+cglobal h264_idct_add8_422_8, 5, 8 + npicregs, 0, dst1, block_offset, block, stride, nnzc, cntr, coeff, dst2, picreg
+; dst1, block_offset, block, stride, nnzc, cntr, coeff, dst2, picreg
+    movsxdifnidn r3, r3d
+%ifdef PIC
+    lea picregq, [scan8_mem]
+%endif
+%if ARCH_X86_64
+    mov dst2q, r0
+%endif
+
+    mov r5, 16  ; i
+    add r2, 512 ; i * 16 * sizeof(dctcoef) ; #define dctcoef int16_t
+
+    call h264_idct_add8_mmx_plane
+    add r5, 4
+    call h264_idct_add8_mmx_plane
+
+%if ARCH_X86_64
+    add dst2q, gprsize ; dest[1]
+%else
+    add r0mp, gprsize
+%endif
+
+    add r5, 4   ; set to 32
+    add r2, 256 ; set to i * 16 * sizeof(dctcoef)
+
+    call h264_idct_add8_mmx_plane
+    add r5, 4
+    call h264_idct_add8_mmx_plane
+
+    RET
+
 h264_idct_add8_mmxext_plane:
     movsxdifnidn r3, r3d
 .nextblock:
diff --git a/libavcodec/x86/h264dsp_init.c b/libavcodec/x86/h264dsp_init.c
index 027c1ae..ed52c4d 100644
--- a/libavcodec/x86/h264dsp_init.c
+++ b/libavcodec/x86/h264dsp_init.c
@@ -78,6 +78,8 @@ IDCT_ADD_REP_FUNC2(, 8, 8, sse2)
 IDCT_ADD_REP_FUNC2(, 8, 10, sse2)
 IDCT_ADD_REP_FUNC2(, 8, 10, avx)
 
+IDCT_ADD_REP_FUNC2(, 8_422, 8, mmx)
+
 void ff_h264_luma_dc_dequant_idct_mmx(int16_t *output, int16_t *input, int qmul);
 void ff_h264_luma_dc_dequant_idct_sse2(int16_t *output, int16_t *input, int qmul);
 
@@ -228,8 +230,11 @@ av_cold void ff_h264dsp_init_x86(H264DSPContext *c, const int bit_depth,
 
             c->h264_idct_add16 = ff_h264_idct_add16_8_mmx;
             c->h264_idct8_add4 = ff_h264_idct8_add4_8_mmx;
-            if (chroma_format_idc <= 1)
+            if (chroma_format_idc <= 1) {
                 c->h264_idct_add8 = ff_h264_idct_add8_8_mmx;
+            } else {
+                c->h264_idct_add8 = ff_h264_idct_add8_422_8_mmx;
+            }
             c->h264_idct_add16intra = ff_h264_idct_add16intra_8_mmx;
             if (cpu_flags & AV_CPU_FLAG_CMOV)
                 c->h264_luma_dc_dequant_idct = ff_h264_luma_dc_dequant_idct_mmx;
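
Note on the counter arithmetic in h264_idct_add8_422_8: the comments "set to 32" and "set to i * 16 * sizeof(dctcoef)" only hold if each call to h264_idct_add8_mmx_plane advances r5 by 4 blocks and r2 by 4 * 16 * sizeof(dctcoef) = 128 bytes. That behaviour is not shown in this patch, so it is treated as an assumption here. The C sketch below is not FFmpeg code; struct cursor, coeff_offset, plane_call and second_plane_start are hypothetical names that merely replay the register updates to check that the comments are self-consistent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t dctcoef;        /* per the patch comment: #define dctcoef int16_t */

enum {
    COEFFS_PER_BLOCK = 16,      /* coefficients in one 4x4 block */
    BLOCKS_PER_CALL  = 4        /* ASSUMPTION: blocks consumed per helper call */
};

/* Hypothetical model of the two registers the asm keeps updating. */
struct cursor {
    int    i;                   /* block index, r5 in the asm */
    size_t offset;              /* byte offset applied to block, r2 in the asm */
};

/* Byte offset the patch comments expect for block index i. */
static size_t coeff_offset(int i)
{
    return (size_t)i * COEFFS_PER_BLOCK * sizeof(dctcoef);
}

/* Hypothetical stand-in for h264_idct_add8_mmx_plane. It models only the
 * assumed counter updates, not the IDCT or the nnzc test. */
static void plane_call(struct cursor *c)
{
    c->i      += BLOCKS_PER_CALL;
    c->offset += BLOCKS_PER_CALL * COEFFS_PER_BLOCK * sizeof(dctcoef);
}

/* Replays the updates up to the point where the second chroma plane starts. */
static struct cursor second_plane_start(void)
{
    struct cursor c = { 16, coeff_offset(16) };  /* mov r5, 16 ; add r2, 512 */

    plane_call(&c);                              /* i: 16 -> 20, offset: 512 -> 640 */
    c.i += 4;                                    /* add r5, 4 -> 24 */
    plane_call(&c);                              /* i: 24 -> 28, offset: 640 -> 768 */

    c.i      += 4;                               /* add r5, 4   ; set to 32 */
    c.offset += 256;                             /* add r2, 256 ; set to i * 16 * sizeof(dctcoef) */

    /* Under the assumption above, index and offset agree again here. */
    assert(c.offset == coeff_offset(c.i));
    return c;
}
```

Calling second_plane_start() yields i = 32 and offset = 1024 under the stated assumption, matching both comments in the second half of the function.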