From: "Ronald S. Bultje"
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 4 Apr 2017 12:48:16 -0400
Subject: [FFmpeg-devel] [PATCH 4/6] cavs: add a sse2 idct implementation.

This makes using the function pointer ff_add_pixels_clamped() unnecessary,
since we always know what the best implementation is at compile-time.
---
 libavcodec/x86/cavsdsp.c    | 15 +++++++++++++-
 libavcodec/x86/cavsidct.asm | 48 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/libavcodec/x86/cavsdsp.c b/libavcodec/x86/cavsdsp.c
index add4536..a8a198b 100644
--- a/libavcodec/x86/cavsdsp.c
+++ b/libavcodec/x86/cavsdsp.c
@@ -29,6 +29,7 @@
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/cavsdsp.h"
 #include "libavcodec/idctdsp.h"
+#include "libavcodec/x86/idctdsp.h"
 #include "constants.h"
 #include "fpel.h"
 #include "idctdsp.h"
@@ -43,7 +44,16 @@ static void cavs_idct8_add_mmx(uint8_t *dst, int16_t *block, ptrdiff_t stride)
 {
     LOCAL_ALIGNED(16, int16_t, b2, [64]);
     ff_cavs_idct8_mmx(b2, block);
-    ff_add_pixels_clamped(b2, dst, stride);
+    ff_add_pixels_clamped_mmx(b2, dst, stride);
+}
+
+void ff_cavs_idct8_sse2(int16_t *out, const int16_t *in);
+
+static void cavs_idct8_add_sse2(uint8_t *dst, int16_t *block, ptrdiff_t stride)
+{
+    LOCAL_ALIGNED(16, int16_t, b2, [64]);
+    ff_cavs_idct8_sse2(b2, block);
+    ff_add_pixels_clamped_sse2(b2, dst, stride);
 }
 
 #endif /* HAVE_MMX_EXTERNAL */
@@ -446,6 +456,9 @@ av_cold void ff_cavsdsp_init_x86(CAVSDSPContext *c, AVCodecContext *avctx)
     if (EXTERNAL_SSE2(cpu_flags)) {
         c->put_cavs_qpel_pixels_tab[0][0] = put_cavs_qpel16_mc00_sse2;
         c->avg_cavs_qpel_pixels_tab[0][0] = avg_cavs_qpel16_mc00_sse2;
+
+        c->cavs_idct8_add = cavs_idct8_add_sse2;
+        c->idct_perm = FF_IDCT_PERM_TRANSPOSE;
     }
 #endif
 }
diff --git a/libavcodec/x86/cavsidct.asm b/libavcodec/x86/cavsidct.asm
index 5421196..99b505d 100644
--- a/libavcodec/x86/cavsidct.asm
+++ b/libavcodec/x86/cavsidct.asm
@@ -29,11 +29,16 @@ cextern pw_64
 
 SECTION .text
 
-%macro CAVS_IDCT8_1D 2 ; source, round
+%macro CAVS_IDCT8_1D 2-3 1 ; source, round, init_load
+%if %3 == 1
     mova m4, [%1+7*16] ; m4 = src7
     mova m5, [%1+1*16] ; m5 = src1
     mova m2, [%1+5*16] ; m2 = src5
     mova m7, [%1+3*16] ; m7 = src3
+%else
+    SWAP 1, 7
+    SWAP 4, 6
+%endif
     mova m0, m4
     mova m3, m5
     mova m6, m2
@@ -163,3 +168,44 @@ cglobal cavs_idct8, 2, 4, 8, 8 * 16, out, in, cnt, tmp
     jg .loop_2
 
     RET
+
+INIT_XMM sse2
+cglobal cavs_idct8, 2, 2, 8, 0 - 8 * 16, out, in
+    CAVS_IDCT8_1D inq, [pw_4]
+    psraw m7, 3
+    psraw m6, 3
+    psraw m5, 3
+    psraw m4, 3
+    psraw m3, 3
+    psraw m2, 3
+    psraw m1, 3
+    psraw m0, 3
+%if ARCH_X86_64
+    TRANSPOSE8x8W 7, 5, 3, 1, 0, 2, 4, 6, 8
+    mova [rsp+4*16], m0
+%else
+    mova [rsp+0*16], m4
+    TRANSPOSE8x8W 7, 5, 3, 1, 0, 2, 4, 6, [rsp+0*16], [rsp+4*16], 1
+%endif
+    mova [rsp+0*16], m7
+    mova [rsp+2*16], m3
+    mova [rsp+6*16], m4
+    CAVS_IDCT8_1D rsp, [pw_64], 0
+    psraw m7, 7
+    psraw m6, 7
+    psraw m5, 7
+    psraw m4, 7
+    psraw m3, 7
+    psraw m2, 7
+    psraw m1, 7
+    psraw m0, 7
+
+    mova [outq+0*16], m7
+    mova [outq+1*16], m5
+    mova [outq+2*16], m3
+    mova [outq+3*16], m1
+    mova [outq+4*16], m0
+    mova [outq+5*16], m2
+    mova [outq+6*16], m4
+    mova [outq+7*16], m6
+    RET
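
The cavsdsp.c wrappers now call ff_add_pixels_clamped_mmx() or ff_add_pixels_clamped_sse2() directly instead of going through the ff_add_pixels_clamped() pointer, so the variant is fixed when each wrapper is compiled. Whichever variant runs, the add step should compute the same result. A minimal scalar sketch of that arithmetic follows, assuming the IDCT output is an 8x8 int16 block stored 8 coefficients per row (which is what the 16-byte stores to [outq+k*16] produce). add_pixels_clamped_ref() and clip_uint8() are hypothetical names, not FFmpeg API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model of the add-pixels-clamped step: add an 8x8
 * block of int16 residuals (row-major, 8 per row) to an 8x8 region of
 * 8-bit pixels whose rows are `stride` bytes apart, saturating each sum. */
static void add_pixels_clamped_ref(const int16_t *block, uint8_t *pixels,
                                   ptrdiff_t stride)
{
    assert(stride >= 8); /* rows of the destination must not overlap */
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            pixels[x] = clip_uint8(pixels[x] + block[x]);
        pixels += stride;
        block  += 8;
    }
}
```

The SIMD variants are expected to produce the same bytes; the scalar form only pins down the arithmetic, not how the MMX or SSE2 code achieves it.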
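
The init hunk pairs cavs_idct8_add_sse2 with c->idct_perm = FF_IDCT_PERM_TRANSPOSE, i.e. the decoder is asked to place coefficients in transposed order within the 8x8 block before ff_cavs_idct8_sse2() reads them. A sketch of such a table, assuming the straightforward definition in which index 8*row + col maps to 8*col + row; transpose_index() and init_transpose_perm() are hypothetical helpers, not FFmpeg code.

```c
#include <assert.h>
#include <stdint.h>

/* Coefficient i = 8 * row + col moves to 8 * col + row (assumed mapping). */
static uint8_t transpose_index(int i)
{
    assert(i >= 0 && i < 64);
    return (uint8_t)(((i & 7) << 3) | (i >> 3));
}

/* Hypothetical helper: fill a 64-entry transpose permutation table. */
static void init_transpose_perm(uint8_t perm[64])
{
    for (int i = 0; i < 64; i++)
        perm[i] = transpose_index(i);
}
```

The mapping is its own inverse, so applying it twice restores row-major order. One plausible reading of why the SSE2 path wants this layout: each xmm register is loaded with one stored row, so the first CAVS_IDCT8_1D pass, which works across registers, then runs along the logical rows of the block.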
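
ff_cavs_idct8_sse2() runs CAVS_IDCT8_1D twice: first with [pw_4] followed by psraw 3, then, after TRANSPOSE8x8W, with [pw_64] followed by psraw 7. The scalar sketch below models that two-pass structure. It assumes the AVS 8x8 integer basis listed in the table (an assumption, since the butterfly that implements it in the macro is mostly outside this diff), computes in int rather than 16-bit lanes so it does not model any wraparound the SIMD code might hit, and relies on arithmetic right shifts, which C leaves implementation-defined for negative values. idct1d() and cavs_idct8_ref() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed AVS 8x8 inverse-transform basis: row k is basis vector k,
 * column n is its weight in output sample n. */
static const int avs_basis[8][8] = {
    {  8,   8,   8,   8,   8,   8,   8,   8 },
    { 10,   9,   6,   2,  -2,  -6,  -9, -10 },
    { 10,   4,  -4, -10, -10,  -4,   4,  10 },
    {  9,  -2, -10,  -6,   6,  10,   2,  -9 },
    {  8,  -8,  -8,   8,   8,  -8,  -8,   8 },
    {  6, -10,   2,   9,  -9,  -2,  10,  -6 },
    {  4, -10,  10,  -4,  -4,  10, -10,   4 },
    {  2,  -6,   9, -10,  10,  -9,   6,  -2 },
};

/* One in-place 1-D pass over 8 values spaced `step` apart:
 * v[n] = (sum_k basis[k][n] * v[k] + round) >> shift. */
static void idct1d(int *v, ptrdiff_t step, int round, int shift)
{
    int in[8];
    assert(step == 1 || step == 8); /* a row or a column of an 8x8 block */
    for (int k = 0; k < 8; k++)
        in[k] = v[k * step];
    for (int n = 0; n < 8; n++) {
        int acc = round;
        for (int k = 0; k < 8; k++)
            acc += avs_basis[k][n] * in[k];
        v[n * step] = acc >> shift; /* arithmetic shift assumed */
    }
}

/* Hypothetical scalar model of ff_cavs_idct8_sse2() on a row-major block:
 * a row pass with rounding 4 and shift 3, then a column pass with rounding
 * 64 and shift 7. The SSE2 code moves between passes with TRANSPOSE8x8W;
 * here the stride argument does that job. */
static void cavs_idct8_ref(int16_t out[64], const int16_t in[64])
{
    int v[64];
    for (int i = 0; i < 64; i++)
        v[i] = in[i];
    for (int r = 0; r < 8; r++)
        idct1d(v + 8 * r, 1, 4, 3);  /* like [pw_4] then psraw 3 */
    for (int c = 0; c < 8; c++)
        idct1d(v + c, 8, 64, 7);     /* like [pw_64] then psraw 7 */
    for (int i = 0; i < 64; i++)
        out[i] = (int16_t)v[i];
}
```

With only a DC coefficient of 64, the row pass yields 64 across row 0 ((8*64 + 4) >> 3) and the column pass yields 4 everywhere ((8*64 + 64) >> 7), which makes a convenient sanity check for any faster implementation.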