From patchwork Sat Jun 3 00:18:08 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 3811
From: James Darnley
To: FFmpeg development discussions and patches
Date: Sat, 3 Jun 2017 02:18:08 +0200
Message-Id: <20170603001809.13960-6-jdarnley@obe.tv>
X-Mailer: git-send-email 2.12.2
In-Reply-To: <20170603001809.13960-1-jdarnley@obe.tv>
References: <20170603001809.13960-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 5/6] add x86_64 8-bit simple_idct function

---
 libavcodec/tests/x86/dct.c       |  2 ++
 libavcodec/x86/idctdsp_init.c    | 10 ++++++++++
 libavcodec/x86/simple_idct.h     |  3 +++
 libavcodec/x86/simple_idct10.asm |  6 ++++++
 4 files changed, 21 insertions(+)

diff --git a/libavcodec/tests/x86/dct.c b/libavcodec/tests/x86/dct.c
index 97116570f4..a9b949f2b1 100644
--- a/libavcodec/tests/x86/dct.c
+++ b/libavcodec/tests/x86/dct.c
@@ -88,10 +88,12 @@ static const struct algo idct_tab_arch[] = {
 #if HAVE_YASM
 #if ARCH_X86_64
 #if HAVE_SSE2_EXTERNAL
+    { "SIMPLE8-SSE2", ff_simple_idct8_sse2, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_SSE2},
     { "SIMPLE10-SSE2", ff_simple_idct10_sse2, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_SSE2},
     { "SIMPLE12-SSE2", ff_simple_idct12_sse2, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_SSE2, 1 },
 #endif
 #if HAVE_AVX_EXTERNAL
+    { "SIMPLE8-AVX", ff_simple_idct8_avx, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_AVX},
     { "SIMPLE10-AVX", ff_simple_idct10_avx, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_AVX},
     { "SIMPLE12-AVX", ff_simple_idct12_avx, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_AVX, 1 },
 #endif
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 82530a5cc4..1e30496da0 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -95,6 +95,16 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
             c->idct = ff_simple_idct_sse2;
             c->perm_type = FF_IDCT_PERM_SIMPLE;
         }
+
+        if (ARCH_X86_64 &&
+            !high_bit_depth &&
+            avctx->lowres == 0 &&
+            (avctx->idct_algo == FF_IDCT_AUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLE)) {
+            c->idct = ff_simple_idct8_sse2;
+            c->perm_type = FF_IDCT_PERM_TRANSPOSE;
+        }
     }

     if (ARCH_X86_64 && avctx->lowres == 0) {
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index b19e910372..7a26e96b60 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -30,6 +30,9 @@ void ff_simple_idct_sse2(int16_t *block);
 void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);

+void ff_simple_idct8_sse2(int16_t *block);
+void ff_simple_idct8_avx(int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);

diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index 7cfd33eaa3..b4b47afcee 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -33,9 +33,11 @@ cextern pw_2
 cextern pw_16
 cextern pw_1023
 cextern pw_4095
+pd_round_11: times 4 dd 1<<(11-1)
 pd_round_12: times 4 dd 1<<(12-1)
 pd_round_15: times 4 dd 1<<(15-1)
 pd_round_19: times 4 dd 1<<(19-1)
+pd_round_20: times 4 dd 1<<(20-1)

 %macro CONST_DEC 3
 const %1
@@ -68,6 +70,10 @@ CONST_DEC w7_min_w5, W7sh2, -W5sh2
 SECTION .text

 %macro idct_fn 0
+cglobal simple_idct8, 1, 1, 16, block
+    IDCT_FN "", 11, "", 20
+    RET
+
 cglobal simple_idct10, 1, 1, 16, block
     IDCT_FN "", 12, "", 19
     RET
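
Note on the new rounding constants: `pd_round_11` and `pd_round_20` are both defined as `1<<(N-1)`, which is the usual bias for a round-to-nearest right shift by N bits. The sketch below shows that arithmetic in scalar C. It assumes that the numeric arguments 11 and 20 passed to `IDCT_FN` are the shift amounts for the two passes, which the patch itself does not state. `round_bias`, `round_shift` and `check_round_constants` are hypothetical names used for illustration only, not FFmpeg identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Bias added before a right shift by `shift` bits so the result rounds
 * to nearest instead of truncating. Same form as pd_round_N: 1<<(N-1). */
static int32_t round_bias(int shift)
{
    return (int32_t)1 << (shift - 1);
}

/* Round-to-nearest right shift. This sketch only uses non-negative
 * inputs, because right-shifting a negative signed value is
 * implementation-defined in C. */
static int32_t round_shift(int32_t value, int shift)
{
    return (value + round_bias(shift)) >> shift;
}

/* Self-check for the two shift amounts that appear in the patch
 * (assumed here to be per-pass shift amounts). */
static void check_round_constants(void)
{
    assert(round_bias(11) == 1024);     /* 1<<(11-1) */
    assert(round_bias(20) == 524288);   /* 1<<(20-1) */

    /* Just below half rounds down, exactly half rounds up. */
    assert(round_shift(1023, 11) == 0);
    assert(round_shift(1024, 11) == 1);
    assert(round_shift(524287, 20) == 0);
    assert(round_shift(524288, 20) == 1);
}
```

Calling `check_round_constants()` from a test program exercises both biases. In the SIMD code each constant is replicated four times (`times 4 dd`), so one packed add can apply the bias to four 32-bit lanes at once.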