From patchwork Fri Mar 17 13:18:43 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 2983
From: James Darnley
To: FFmpeg development discussions and patches
Date: Fri, 17 Mar 2017 14:18:43 +0100
Message-Id: <20170317131845.7760-9-jdarnley@obe.tv>
X-Mailer: git-send-email 2.8.3
In-Reply-To: <20170317131845.7760-1-jdarnley@obe.tv>
References: <20170317131845.7760-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 08/10] h264_idct_add16

1.01x faster (2150±46.1 vs. 2118±29.0 decicycles) compared with sse2
---
 libavcodec/x86/h264_idct.asm  | 40 +++++++++++++++++++++++++++++++++++++++-
 libavcodec/x86/h264dsp_init.c |  2 ++
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index a74e095..f1f2ce7 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -858,7 +858,7 @@ h264_add8x4_idct_sse2:
 %else
     add r0, r0m
 %endif
-    call h264_add8x4_idct_sse2
+    call h264_add8x4_idct_ %+ cpuname
 %%skip:
 %if %1 < 7
     add r2, 64
@@ -1142,6 +1142,29 @@ IDCT_DC_DEQUANT 7
 
 INIT_XMM avx
 
+ALIGN 16
+; r0 = uint8_t *dst (clobbered), r2 = int16_t *block, r3 = int stride
+h264_add8x4_idct_avx:
+    movu m0, [r2 + 0]
+    movu m1, [r2 + 32]
+    movu m2, [r2 + 16]
+    movu m3, [r2 + 48]
+    SBUTTERFLY qdq, 0, 1, 4
+    SBUTTERFLY qdq, 2, 3, 5
+    IDCT4_1D w,0,1,2,3,4,5
+    TRANSPOSE2x4x4W 0,1,2,3,4
+    paddw m0, [pw_32]
+    IDCT4_1D w,0,1,2,3,4,5
+    pxor m7, m7
+    mova [r2+ 0], m7
+    mova [r2+16], m7
+    mova [r2+32], m7
+    mova [r2+48], m7
+    STORE_DIFFx2 m0, m1, m4, m5, m7, 6, r0, r3
+    lea r0, [r0+r3*2]
+    STORE_DIFFx2 m2, m3, m4, m5, m7, 6, r0, r3
+ret
+
 ; %unmacro STORE_DIFFx2 8 ; remove macro from x86util.asm but yasm doesn't have this yet
 %macro STORE_DIFFx2 8 ; add1, add2, reg1, reg2, zero, shift, source, stride
     movd %3, [%7]
@@ -1199,3 +1222,18 @@ cglobal h264_idct8_dc_add_8, 3, 4, 0
     lea dst_q, [dst_q + stride_q*4]
     DC_ADD_MMXEXT_OP movq, dst_q, stride_q, r3
     RET
+
+cglobal h264_idct_add16_8, 5, 5 + ARCH_X86_64, 8, dst_, block_offset_, block_, stride_, nnzc_
+    movsxdifnidn stride_q, stride_d
+    %if ARCH_X86_64
+    mov r5, r0
+    %endif
+    add16_sse2_cycle 0, 0xc
+    add16_sse2_cycle 1, 0x14
+    add16_sse2_cycle 2, 0xe
+    add16_sse2_cycle 3, 0x16
+    add16_sse2_cycle 4, 0x1c
+    add16_sse2_cycle 5, 0x24
+    add16_sse2_cycle 6, 0x1e
+    add16_sse2_cycle 7, 0x26
+RET
diff --git a/libavcodec/x86/h264dsp_init.c b/libavcodec/x86/h264dsp_init.c
index de7becf..3396fd8 100644
--- a/libavcodec/x86/h264dsp_init.c
+++ b/libavcodec/x86/h264dsp_init.c
@@ -62,6 +62,7 @@ IDCT_ADD_REP_FUNC(8, 4, 10, avx)
 IDCT_ADD_REP_FUNC(, 16, 8, mmx)
 IDCT_ADD_REP_FUNC(, 16, 8, mmxext)
 IDCT_ADD_REP_FUNC(, 16, 8, sse2)
+IDCT_ADD_REP_FUNC(, 16, 8, avx)
 IDCT_ADD_REP_FUNC(, 16, 10, sse2)
 IDCT_ADD_REP_FUNC(, 16intra, 8, mmx)
 IDCT_ADD_REP_FUNC(, 16intra, 8, mmxext)
@@ -346,6 +347,7 @@ av_cold void ff_h264dsp_init_x86(H264DSPContext *c, const int bit_depth,
             c->h264_idct8_add = ff_h264_idct8_add_8_avx;
             c->h264_idct_dc_add = ff_h264_idct_dc_add_8_avx;
             c->h264_idct8_dc_add = ff_h264_idct8_dc_add_8_avx;
+            c->h264_idct_add16 = ff_h264_idct_add16_8_avx;
         }
     } else if (bit_depth == 10) {
         if (EXTERNAL_MMXEXT(cpu_flags)) {
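
For reviewers who want a scalar picture of what h264_add8x4_idct_avx does to each 4x4 half of the 8x4 region, here is a rough C sketch of the standard H.264 4x4 inverse transform with add-to-destination. It is not taken from the patch. The name idct4x4_add_ref and the helper clip_u8 are hypothetical, and the coefficient layout (which index is the row and which the column) is an assumption, so it may be transposed relative to what the assembly expects. It mirrors the visible steps: a 1-D pass, the +32 rounding bias (pw_32), a second 1-D pass, a shift by 6, a saturating add to dst, and clearing the coefficient block. It uses int intermediates and relies on arithmetic right shift of negative values, so it is not guaranteed bit-exact with the 16-bit SIMD code on out-of-range input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clamp an intermediate sum to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference; NOT an FFmpeg function.
 * dst:    top-left pixel of a 4x4 region
 * block:  16 coefficients (layout is an assumption of this sketch)
 * stride: distance in bytes between rows of dst */
static void idct4x4_add_ref(uint8_t *dst, int16_t *block, int stride)
{
    int tmp[16];

    assert(stride >= 4);

    for (int i = 0; i < 16; i++)
        tmp[i] = block[i];

    /* Rounding bias for the final >> 6. Adding it to the DC term before
     * the first pass spreads it to every output sample. */
    tmp[0] += 32;

    /* First 1-D pass. */
    for (int i = 0; i < 4; i++) {
        const int z0 =  tmp[i + 4 * 0]       +  tmp[i + 4 * 2];
        const int z1 =  tmp[i + 4 * 0]       -  tmp[i + 4 * 2];
        const int z2 = (tmp[i + 4 * 1] >> 1) -  tmp[i + 4 * 3];
        const int z3 =  tmp[i + 4 * 1]       + (tmp[i + 4 * 3] >> 1);

        tmp[i + 4 * 0] = z0 + z3;
        tmp[i + 4 * 1] = z1 + z2;
        tmp[i + 4 * 2] = z1 - z2;
        tmp[i + 4 * 3] = z0 - z3;
    }

    /* Second 1-D pass, then scale and add to the destination pixels. */
    for (int i = 0; i < 4; i++) {
        const int z0 =  tmp[0 + 4 * i]       +  tmp[2 + 4 * i];
        const int z1 =  tmp[0 + 4 * i]       -  tmp[2 + 4 * i];
        const int z2 = (tmp[1 + 4 * i] >> 1) -  tmp[3 + 4 * i];
        const int z3 =  tmp[1 + 4 * i]       + (tmp[3 + 4 * i] >> 1);

        dst[i + 0 * stride] = clip_u8(dst[i + 0 * stride] + ((z0 + z3) >> 6));
        dst[i + 1 * stride] = clip_u8(dst[i + 1 * stride] + ((z1 + z2) >> 6));
        dst[i + 2 * stride] = clip_u8(dst[i + 2 * stride] + ((z1 - z2) >> 6));
        dst[i + 3 * stride] = clip_u8(dst[i + 3 * stride] + ((z0 - z3) >> 6));
    }

    /* The assembly also zeroes the coefficients (the mova [r2+N], m7 stores). */
    memset(block, 0, 16 * sizeof(*block));
}
```

As a sanity check, a block whose only non-zero coefficient is a DC value of 64 should raise every pixel of the 4x4 region by one, since (64 + 32) >> 6 = 1.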