From patchwork Sat Jun 3 00:18:06 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 3809
From: James Darnley
To: FFmpeg development discussions and patches
Date: Sat, 3 Jun 2017 02:18:06 +0200
Message-Id: <20170603001809.13960-4-jdarnley@obe.tv>
X-Mailer: git-send-email 2.12.2
In-Reply-To: <20170603001809.13960-1-jdarnley@obe.tv>
References: <20170603001809.13960-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 3/6] add and fix xmm version of simple_idct

---
 libavcodec/tests/x86/dct.c     |  3 +++
 libavcodec/x86/idctdsp_init.c  |  1 +
 libavcodec/x86/simple_idct.asm | 45 ++++++++++++++++++++++++++++++++++++++++++
 libavcodec/x86/simple_idct.h   |  1 +
 4 files changed, 50 insertions(+)

diff --git a/libavcodec/tests/x86/dct.c b/libavcodec/tests/x86/dct.c
index 34f5b8767b..97116570f4 100644
--- a/libavcodec/tests/x86/dct.c
+++ b/libavcodec/tests/x86/dct.c
@@ -97,6 +97,9 @@ static const struct algo idct_tab_arch[] = {
 #endif
 #endif
 #endif
+#if HAVE_SSE2_EXTERNAL
+    { "SIMPLE-SSE2", ff_simple_idct_sse2, FF_IDCT_PERM_SIMPLE, AV_CPU_FLAG_SSE2 },
+#endif
     { 0 }
 };

diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index f1c915aa00..82530a5cc4 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -92,6 +92,7 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
                 avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
                 c->idct_put = ff_simple_idct_put_sse2;
                 c->idct_add = ff_simple_idct_add_sse2;
+                c->idct = ff_simple_idct_sse2;
                 c->perm_type = FF_IDCT_PERM_SIMPLE;
         }
     }
diff --git a/libavcodec/x86/simple_idct.asm b/libavcodec/x86/simple_idct.asm
index 3b62a4f9d3..a6eb42464b 100644
--- a/libavcodec/x86/simple_idct.asm
+++ b/libavcodec/x86/simple_idct.asm
@@ -151,6 +151,10 @@ SECTION .text
     psrad m2, %7
     packssdw m7, m1 ; A1+B1 a1+b1 A0+B0 a0+b0
     packssdw m2, m4 ; A0-B0 a0-b0 A1-B1 a1-b1
+%if mmsize == 16
+    pshufd m7, m7, 8
+    pshufd m2, m2, 8
+%endif
     movq [%5], m7
     movq m1, [blockq + %3] ; R3 R1 r3 r1
     movq m4, [coeffs + 80] ; -C1 C5 -C1 C5
@@ -172,9 +176,15 @@ SECTION .text
     psubd m4, m3 ; a3-B3 a3-b3
     psrad m6, %7
     packssdw m2, m6 ; A3+B3 a3+b3 A2+B2 a2+b2
+%if mmsize == 16
+    pshufd m2, m2, 8
+%endif
     movq [8 + %5], m2
     psrad m4, %7
     packssdw m4, m0 ; A2-B2 a2-b2 A3-B3 a3-b3
+%if mmsize == 16
+    pshufd m4, m4, 8
+%endif
     movq [16 + %5], m4
     jmp %%2
 %%1:
@@ -182,6 +192,9 @@ SECTION .text
     paddd m0, [d40000]
     psrad m0, 13
     packssdw m0, m0
+%if mmsize == 16
+    pshufd m0, m0, 8
+%endif
     movq [%5], m0
     movq [8 + %5], m0
     movq [16 + %5], m0
@@ -239,6 +252,10 @@ SECTION .text
     psrad m2, %7
     packssdw m7, m1 ; A1+B1 a1+b1 A0+B0 a0+b0
     packssdw m2, m4 ; A0-B0 a0-b0 A1-B1 a1-b1
+%if mmsize == 16
+    pshufd m7, m7, 8
+    pshufd m2, m2, 8
+%endif
     movq [%5], m7
     movq m1, [blockq + %3] ; R3 R1 r3 r1
     movq m4, [coeffs + 80] ; -C1 C5 -C1 C5
@@ -260,9 +277,15 @@ SECTION .text
     psubd m4, m3 ; a3-B3 a3-b3
     psrad m6, %7
     packssdw m2, m6 ; A3+B3 a3+b3 A2+B2 a2+b2
+%if mmsize == 16
+    pshufd m2, m2, 8
+%endif
     movq [8 + %5], m2
     psrad m4, %7
     packssdw m4, m0 ; A2-B2 a2-b2 A3-B3 a3-b3
+%if mmsize == 16
+    pshufd m4, m4, 8
+%endif
     movq [16 + %5], m4
 %endmacro

@@ -614,9 +637,15 @@ SECTION .text
     psrad m7, %6
     psrad m3, %6
     packssdw m4, m7 ; A0 a0
+%if mmsize == 16
+    pshufd m4, m4, q0020
+%endif
     movq [%5], m4
     psrad m0, %6
     packssdw m0, m3 ; A1 a1
+%if mmsize == 16
+    pshufd m0, m0, q0020
+%endif
     movq [16 + %5], m0
     movq [96 + %5], m0
     movq [112 + %5], m4
@@ -624,9 +653,15 @@ SECTION .text
     psrad m6, %6
     psrad m2, %6
     packssdw m5, m2 ; A2-B2 a2-b2
+%if mmsize == 16
+    pshufd m5, m5, q0020
+%endif
     movq [32 + %5], m5
     psrad m1, %6
     packssdw m6, m1 ; A3+B3 a3+b3
+%if mmsize == 16
+    pshufd m6, m6, q0020
+%endif
     movq [48 + %5], m6
     movq [64 + %5], m6
     movq [80 + %5], m5
@@ -711,9 +746,15 @@ SECTION .text
     movq m7, [coeffs + 32] ; C6 C2 C6 C2
     psrad m1, %6
     packssdw m4, m1 ; A0 a0
+%if mmsize == 16
+    pshufd m4, m4, 8
+%endif
     movq [%5], m4
     psrad m2, %6
     packssdw m0, m2 ; A1 a1
+%if mmsize == 16
+    pshufd m0, m0, 8
+%endif
     movq [16 + %5], m0
     movq [96 + %5], m0
     movq [112 + %5], m4
@@ -889,6 +930,10 @@ RET

 INIT_XMM sse2

+cglobal simple_idct, 1, 2, 8, 128, block, t0
+    IDCT
+RET
+
 cglobal simple_idct_put, 3, 5, 8, 128, pixels, lsize, block, lsize3, t0
     IDCT
     lea lsize3q, [lsizeq*3]
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index d17ef6a462..b19e910372 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -26,6 +26,7 @@
 void ff_simple_idct_mmx(int16_t *block);
 void ff_simple_idct_add_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+void ff_simple_idct_sse2(int16_t *block);
 void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
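
Note on the added pshufd lines: the patch carries no commit message, so the following is only a reading of the diff, not the author's stated rationale. It appears that with 128-bit registers packssdw leaves the two useful word pairs in dwords 0 and 2, while the following movq stores only the low 64 bits. A pshufd with immediate 8 (the same value as the q0020 spelling used in some hunks) moves dword 2 down into dword 1, so the low quadword holds the same four words the MMX code produced. The scalar C model below sketches that under those assumptions. The types and the function low_qword_matches_mmx() are hypothetical helpers for illustration and are not part of FFmpeg.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit register, viewed as dwords or as words. */
typedef struct { int32_t d[4]; } xmm_dwords;
typedef struct { int16_t w[8]; } xmm_words;

/* Signed saturation of a 32-bit value to 16 bits. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* packssdw dst, src: dst dwords go to words 0..3, src dwords to words 4..7. */
static xmm_words packssdw(xmm_dwords dst, xmm_dwords src)
{
    xmm_words r;
    for (int i = 0; i < 4; i++) {
        r.w[i]     = sat16(dst.d[i]);
        r.w[i + 4] = sat16(src.d[i]);
    }
    return r;
}

/* pshufd dst, src, imm8: result dword i = src dword ((imm8 >> 2*i) & 3). */
static xmm_words pshufd(xmm_words src, unsigned imm8)
{
    xmm_words r;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm8 >> (2 * i)) & 3;
        r.w[2 * i]     = src.w[2 * sel];
        r.w[2 * i + 1] = src.w[2 * sel + 1];
    }
    return r;
}

/* Returns 1 if the low four words after pack + shuffle equal what a
 * 64-bit (MMX) packssdw of the same two inputs would have produced. */
int low_qword_matches_mmx(void)
{
    /* Assumption: only the low two dwords of each input are meaningful;
     * 0x7777 is a placeholder for whatever sits in the upper lanes. */
    xmm_dwords a = {{ 100, -200, 0x7777, 0x7777 }};
    xmm_dwords b = {{ 300, 40000, 0x7777, 0x7777 }};

    xmm_words packed = packssdw(a, b);   /* useful words at 0,1 and 4,5 */
    xmm_words fixed  = pshufd(packed, 8);

    /* 40000 saturates to INT16_MAX, as packssdw does on real hardware. */
    const int16_t expected[4] = { 100, -200, 300, INT16_MAX };
    return memcmp(fixed.w, expected, sizeof expected) == 0;
}
```

Calling low_qword_matches_mmx() returns 1. Without the pshufd step, words 2 and 3 of the low quadword would still hold the placeholder values.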