From patchwork Wed Apr 5 18:53:17 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Ronald S. Bultje"
X-Patchwork-Id: 3306
From: "Ronald S. Bultje"
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 5 Apr 2017 14:53:17 -0400
Message-Id: <1491418397-47797-1-git-send-email-rsbultje@gmail.com>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <20170405173944.GU4714@nb4>
References: <20170405173944.GU4714@nb4>
Subject: [FFmpeg-devel] [PATCH] cavs: add a sse2 idct implementation.
List-Id: FFmpeg development discussions and patches
Cc: "Ronald S.
 Bultje"

This makes using the function pointer ff_add_pixels_clamped() unnecessary,
since we always know what the best implementation is at compile-time.
---
 libavcodec/x86/cavsdsp.c    | 15 +++++++++++++-
 libavcodec/x86/cavsidct.asm | 48 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/libavcodec/x86/cavsdsp.c b/libavcodec/x86/cavsdsp.c
index add4536..a8a198b 100644
--- a/libavcodec/x86/cavsdsp.c
+++ b/libavcodec/x86/cavsdsp.c
@@ -29,6 +29,7 @@
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/cavsdsp.h"
 #include "libavcodec/idctdsp.h"
+#include "libavcodec/x86/idctdsp.h"
 #include "constants.h"
 #include "fpel.h"
 #include "idctdsp.h"
@@ -43,7 +44,16 @@ static void cavs_idct8_add_mmx(uint8_t *dst, int16_t *block, ptrdiff_t stride)
 {
     LOCAL_ALIGNED(16, int16_t, b2, [64]);
     ff_cavs_idct8_mmx(b2, block);
-    ff_add_pixels_clamped(b2, dst, stride);
+    ff_add_pixels_clamped_mmx(b2, dst, stride);
+}
+
+void ff_cavs_idct8_sse2(int16_t *out, const int16_t *in);
+
+static void cavs_idct8_add_sse2(uint8_t *dst, int16_t *block, ptrdiff_t stride)
+{
+    LOCAL_ALIGNED(16, int16_t, b2, [64]);
+    ff_cavs_idct8_sse2(b2, block);
+    ff_add_pixels_clamped_sse2(b2, dst, stride);
 }
 
 #endif /* HAVE_MMX_EXTERNAL */
@@ -446,6 +456,9 @@ av_cold void ff_cavsdsp_init_x86(CAVSDSPContext *c, AVCodecContext *avctx)
     if (EXTERNAL_SSE2(cpu_flags)) {
         c->put_cavs_qpel_pixels_tab[0][0] = put_cavs_qpel16_mc00_sse2;
         c->avg_cavs_qpel_pixels_tab[0][0] = avg_cavs_qpel16_mc00_sse2;
+
+        c->cavs_idct8_add = cavs_idct8_add_sse2;
+        c->idct_perm = FF_IDCT_PERM_TRANSPOSE;
     }
 #endif
 }
diff --git a/libavcodec/x86/cavsidct.asm b/libavcodec/x86/cavsidct.asm
index 5421196..6c768c2 100644
--- a/libavcodec/x86/cavsidct.asm
+++ b/libavcodec/x86/cavsidct.asm
@@ -29,11 +29,16 @@ cextern pw_64
 
 SECTION .text
 
-%macro CAVS_IDCT8_1D 2 ; source, round
+%macro CAVS_IDCT8_1D 2-3 1 ; source, round, init_load
+%if %3 == 1
     mova            m4, [%1+7*16]   ; m4 = src7
     mova            m5, [%1+1*16]   ; m5 = src1
     mova            m2, [%1+5*16]   ; m2 = src5
     mova            m7, [%1+3*16]   ; m7 = src3
+%else
+    SWAP             1, 7
+    SWAP             4, 6
+%endif
     mova            m0, m4
     mova            m3, m5
     mova            m6, m2
@@ -163,3 +168,44 @@ cglobal cavs_idct8, 2, 4, 8, 8 * 16, out, in, cnt, tmp
     jg .loop_2
 
     RET
+
+INIT_XMM sse2
+cglobal cavs_idct8, 2, 2, 8 + ARCH_X86_64, 0 - 8 * 16, out, in
+    CAVS_IDCT8_1D  inq, [pw_4]
+    psraw           m7, 3
+    psraw           m6, 3
+    psraw           m5, 3
+    psraw           m4, 3
+    psraw           m3, 3
+    psraw           m2, 3
+    psraw           m1, 3
+    psraw           m0, 3
+%if ARCH_X86_64
+    TRANSPOSE8x8W    7, 5, 3, 1, 0, 2, 4, 6, 8
+    mova    [rsp+4*16], m0
+%else
+    mova    [rsp+0*16], m4
+    TRANSPOSE8x8W    7, 5, 3, 1, 0, 2, 4, 6, [rsp+0*16], [rsp+4*16], 1
+%endif
+    mova    [rsp+0*16], m7
+    mova    [rsp+2*16], m3
+    mova    [rsp+6*16], m4
+    CAVS_IDCT8_1D  rsp, [pw_64], 0
+    psraw           m7, 7
+    psraw           m6, 7
+    psraw           m5, 7
+    psraw           m4, 7
+    psraw           m3, 7
+    psraw           m2, 7
+    psraw           m1, 7
+    psraw           m0, 7
+
+    mova   [outq+0*16], m7
+    mova   [outq+1*16], m5
+    mova   [outq+2*16], m3
+    mova   [outq+3*16], m1
+    mova   [outq+4*16], m0
+    mova   [outq+5*16], m2
+    mova   [outq+6*16], m4
+    mova   [outq+7*16], m6
+    RET
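
Note for readers: the wrappers in cavsdsp.c run the inverse transform into a
temporary 8x8 block and then add that block to the destination pixels with
saturation. The following is a minimal scalar sketch in C of that second step,
plus the index mapping a transposed coefficient layout implies. It is not
FFmpeg's code. The names clip_uint8_sketch, add_pixels_clamped_sketch and
transpose_perm_sketch are hypothetical, and reading FF_IDCT_PERM_TRANSPOSE as
"swap row and column within the 8x8 block" is an assumption drawn from general
knowledge of libavcodec, not from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8_sketch(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Scalar model of an "add pixels clamped" step (assumed semantics):
 * add an 8x8 block of 16-bit residuals to 8-bit pixels, saturating
 * each result to 0..255. "stride" is the distance in bytes between
 * consecutive rows of dst.
 */
static void add_pixels_clamped_sketch(const int16_t *block, uint8_t *dst,
                                      ptrdiff_t stride)
{
    assert(block && dst);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = clip_uint8_sketch(dst[x] + block[x]);
        block += 8;      /* residuals are stored densely, 8 per row */
        dst   += stride; /* pixels follow the picture stride */
    }
}

/*
 * Index mapping for a transposed 8x8 layout (assumed meaning of a
 * "transpose" permutation): element (row, col) moves to (col, row).
 */
static int transpose_perm_sketch(int i)
{
    return ((i & 7) << 3) | (i >> 3);
}
```

Calling add_pixels_clamped_sketch(b2, dst, stride) after an 8x8 inverse
transform into b2 mirrors the shape of cavs_idct8_add_mmx() and
cavs_idct8_add_sse2() in the patch; the SIMD versions are expected to produce
the same pixel values.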