From patchwork Mon Jun 12 13:36:07 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 3926
From: James Darnley
To: FFmpeg development discussions and patches
Date: Mon, 12 Jun 2017 15:36:07 +0200
Message-Id: <20170612133609.24172-5-jdarnley@obe.tv>
X-Mailer: git-send-email 2.13.0
In-Reply-To: <20170612133609.24172-1-jdarnley@obe.tv>
References: <20170612133609.24172-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 4/6] avcodec/x86: add x86-64 8-bit simple_idct put function
List-Id: FFmpeg development discussions and patches

---
 libavcodec/x86/idctdsp_init.c    |  2 ++
 libavcodec/x86/simple_idct.h     |  3 +++
 libavcodec/x86/simple_idct10.asm | 23 +++++++++++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 4b2145e478..1826d01e0e 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -102,6 +102,7 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
                 avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
                 avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
                 c->idct = ff_simple_idct8_sse2;
+                c->idct_put = ff_simple_idct8_put_sse2;
                 c->perm_type = FF_IDCT_PERM_TRANSPOSE;
         }
     }
@@ -113,6 +114,7 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
                 avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
                 avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
                 c->idct = ff_simple_idct8_avx;
+                c->idct_put = ff_simple_idct8_put_avx;
                 c->perm_type = FF_IDCT_PERM_TRANSPOSE;
         }
 
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index d17a855312..b559f8527c 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -32,6 +32,9 @@ void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
 void ff_simple_idct8_sse2(int16_t *block);
 void ff_simple_idct8_avx(int16_t *block);
 
+void ff_simple_idct8_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+void ff_simple_idct8_put_avx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);
 
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index 168b6a08e0..f31fb5cfa5 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -71,11 +71,34 @@ CONST_DEC w7_min_w5, W7sh2, -W5sh2
 
 SECTION .text
 
+%macro STORE_HI_LO 12
+    movq %1, %9
+    movq %3, %10
+    movq %5, %11
+    movq %7, %12
+    movhps %2, %9
+    movhps %4, %10
+    movhps %6, %11
+    movhps %8, %12
+%endmacro
+
 %macro idct_fn 0
 cglobal simple_idct8, 1, 1, 16, block
     IDCT_FN "", 11, pw_round_20_div_w4, 20, "store"
     RET
 
+; TODO: optimise by not writing the final data to the block.
+cglobal simple_idct8_put, 3, 4, 16, pixels, lsize, block
+    IDCT_FN "", 11, pw_round_20_div_w4, 20
+    lea r3, [3*lsizeq]
+    lea r2, [pixelsq + r3]
+    packuswb m8, m0
+    packuswb m1, m2
+    packuswb m4, m11
+    packuswb m9, m10
+    STORE_HI_LO PASS8ROWS(pixelsq, r2, lsizeq, r3), m8, m1, m4, m9
+    RET
+
 cglobal simple_idct10, 1, 1, 16, block
     IDCT_FN "", 12, "", 19, "store"
     RET
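
For readers checking the store half of simple_idct8_put, here is a rough scalar sketch in C of what the packuswb plus STORE_HI_LO sequence appears to do: saturate each 16-bit result to 0..255 and write eight rows of eight bytes, stepping by the line size. It is not part of the patch. The names clip_uint8_sketch and put_pixels_clamped_sketch are hypothetical, and the sketch assumes the block already holds the IDCT output in plain row-major order, which ignores the register layout and the FF_IDCT_PERM_TRANSPOSE permutation used by the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar equivalent of one packuswb lane: signed 16-bit in,
 * unsigned 8-bit out, saturating at 0 and 255. */
static uint8_t clip_uint8_sketch(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical reference for the "put" store step only.
 * Assumption: block[] holds the finished 8x8 IDCT output in
 * row-major order; the IDCT itself is not performed here. */
static void put_pixels_clamped_sketch(uint8_t *dest, ptrdiff_t line_size,
                                      const int16_t *block)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dest[x] = clip_uint8_sketch(block[y * 8 + x]);
        dest += line_size; /* advance to the next output row */
    }
}
```

A reference like this could serve as a comparison point when testing the SSE2 and AVX versions, provided the input is permuted the way the assembly expects.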