From patchwork Mon Jun 19 15:11:01 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 4039
From: James Darnley
To: FFmpeg development discussions and patches
Date: Mon, 19 Jun 2017 17:11:01 +0200
Message-Id: <20170619151104.31273-9-jdarnley@obe.tv>
X-Mailer: git-send-email 2.13.1
In-Reply-To: <20170619151104.31273-1-jdarnley@obe.tv>
References: <20170619151104.31273-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 08/11] avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients

---
 libavcodec/x86/proresdsp.asm              | 18 ++++++++++--------
 libavcodec/x86/simple_idct10.asm          | 29 +++++++++++++++++++++--------
 libavcodec/x86/simple_idct10_template.asm | 19 +++++++++++++++++++
 3 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/x86/proresdsp.asm
index 3be0ff7757..65c9fad51c 100644
--- a/libavcodec/x86/proresdsp.asm
+++ b/libavcodec/x86/proresdsp.asm
@@ -33,14 +33,14 @@ cextern pw_1
 cextern pw_4
 cextern pw_1019
 ; Below are defined in simple_idct10.asm built from selecting idctdsp
-cextern w4_plus_w2
-cextern w4_min_w2
-cextern w4_plus_w6
-cextern w4_min_w6
-cextern w1_plus_w3
-cextern w3_min_w1
-cextern w7_plus_w3
-cextern w3_min_w7
+cextern w4_plus_w2_hi
+cextern w4_min_w2_hi
+cextern w4_plus_w6_hi
+cextern w4_min_w6_hi
+cextern w1_plus_w3_hi
+cextern w3_min_w1_hi
+cextern w7_plus_w3_hi
+cextern w3_min_w7_hi
 cextern w1_plus_w5
 cextern w5_min_w1
 cextern w5_plus_w7
@@ -50,6 +50,8 @@ cextern w7_min_w5
 
 SECTION .text
 
+define_constants _hi
+
 %macro idct_fn 0
 cglobal prores_idct_put_10, 4, 4, 15, pixels, lsize, block, qmat
     IDCT_FN pw_1, 15, pw_88, 18, "put", pw_4, pw_1019, r3
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index 1a5a2eae9b..b492303a57 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -46,28 +46,41 @@ times 4 dw %2, %3
 %define W2sh2 21407 ; W2 = 85627 = 21407<<2 - 1
 %define W3sh2 19265 ; W3 = 77062 = 19265<<2 + 2
 %define W4sh2 16384 ; W4 = 65535 = 16384<<2 - 1
+%define W3sh2_lo 19266
+%define W4sh2_lo 16383
 %define W5sh2 12873 ; W5 = 51491 = 12873<<2 - 1
 %define W6sh2 8867 ; W6 = 35468 = 8867<<2
 %define W7sh2 4520 ; W7 = 18081 = 4520<<2 + 1
 
-CONST_DEC w4_plus_w2, W4sh2, +W2sh2
-CONST_DEC w4_min_w2, W4sh2, -W2sh2
-CONST_DEC w4_plus_w6, W4sh2, +W6sh2
-CONST_DEC w4_min_w6, W4sh2, -W6sh2
-CONST_DEC w1_plus_w3, W1sh2, +W3sh2
-CONST_DEC w3_min_w1, W3sh2, -W1sh2
-CONST_DEC w7_plus_w3, W7sh2, +W3sh2
-CONST_DEC w3_min_w7, W3sh2, -W7sh2
+CONST_DEC w4_plus_w2_hi, W4sh2, +W2sh2
+CONST_DEC w4_min_w2_hi, W4sh2, -W2sh2
+CONST_DEC w4_plus_w6_hi, W4sh2, +W6sh2
+CONST_DEC w4_min_w6_hi, W4sh2, -W6sh2
+CONST_DEC w1_plus_w3_hi, W1sh2, +W3sh2
+CONST_DEC w3_min_w1_hi, W3sh2, -W1sh2
+CONST_DEC w7_plus_w3_hi, W7sh2, +W3sh2
+CONST_DEC w3_min_w7_hi, W3sh2, -W7sh2
 CONST_DEC w1_plus_w5, W1sh2, +W5sh2
 CONST_DEC w5_min_w1, W5sh2, -W1sh2
 CONST_DEC w5_plus_w7, W5sh2, +W7sh2
 CONST_DEC w7_min_w5, W7sh2, -W5sh2
+CONST_DEC w4_plus_w2_lo, W4sh2_lo, +W2sh2
+CONST_DEC w4_min_w2_lo, W4sh2_lo, -W2sh2
+CONST_DEC w4_plus_w6_lo, W4sh2_lo, +W6sh2
+CONST_DEC w4_min_w6_lo, W4sh2_lo, -W6sh2
+CONST_DEC w1_plus_w3_lo, W1sh2, +W3sh2_lo
+CONST_DEC w3_min_w1_lo, W3sh2_lo, -W1sh2
+CONST_DEC w7_plus_w3_lo, W7sh2, +W3sh2_lo
+CONST_DEC w3_min_w7_lo, W3sh2_lo, -W7sh2
 
 %include "libavcodec/x86/simple_idct10_template.asm"
 
 SECTION .text
 
 %macro idct_fn 0
+
+define_constants _hi
+
 cglobal simple_idct10, 1, 1, 16, block
     IDCT_FN "", 12, "", 19, "store"
     RET
diff --git a/libavcodec/x86/simple_idct10_template.asm b/libavcodec/x86/simple_idct10_template.asm
index 8367011dfd..d8ea0bcc6b 100644
--- a/libavcodec/x86/simple_idct10_template.asm
+++ b/libavcodec/x86/simple_idct10_template.asm
@@ -26,6 +26,25 @@
 
 %if ARCH_X86_64
 
+%macro define_constants 1
+    %undef w4_plus_w2
+    %undef w4_min_w2
+    %undef w4_plus_w6
+    %undef w4_min_w6
+    %undef w1_plus_w3
+    %undef w3_min_w1
+    %undef w7_plus_w3
+    %undef w3_min_w7
+    %define w4_plus_w2 w4_plus_w2%1
+    %define w4_min_w2 w4_min_w2%1
+    %define w4_plus_w6 w4_plus_w6%1
+    %define w4_min_w6 w4_min_w6%1
+    %define w1_plus_w3 w1_plus_w3%1
+    %define w3_min_w1 w3_min_w1%1
+    %define w7_plus_w3 w7_plus_w3%1
+    %define w3_min_w7 w3_min_w7%1
+%endmacro
+
 ; interleave data while maintaining source
 ; %1=type, %2=dstlo, %3=dsthi, %4=src, %5=interleave
 %macro SBUTTERFLY3 5
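
Reviewer's note, not part of the patch: the sketch below is a quick sanity check of the arithmetic quoted in the %define comments in simple_idct10.asm, and of how far the new _lo values sit from the existing ones. The numbers are copied from the hunk above; the Python names (SH2, LO, residuals, lo_deltas) are made up for illustration. W1sh2 is not visible in this hunk, so it is left out.

```python
# Values copied from the %define lines in the simple_idct10.asm hunk.
# name: (constant after >>2, full-precision value quoted in the comment)
SH2 = {
    "W2": (21407, 85627),
    "W3": (19265, 77062),
    "W4": (16384, 65535),
    "W5": (12873, 51491),
    "W6": (8867, 35468),
    "W7": (4520, 18081),
}

# The two "_lo" variants the patch introduces.
LO = {"W3": 19266, "W4": 16383}


def residuals():
    """Return full_value - (shifted << 2) for each coefficient.

    This is the small correction term written after each shift in the
    asm comments (for example "19265<<2 + 2").
    """
    return {name: full - (sh << 2) for name, (sh, full) in SH2.items()}


def lo_deltas():
    """Return how much each _lo constant differs from the existing one."""
    return {name: lo - SH2[name][0] for name, lo in LO.items()}


if __name__ == "__main__":
    print("residuals:", residuals())
    print("lo deltas:", lo_deltas())
```

The residuals match the comments in the source, and the two _lo values differ from the existing W3sh2 and W4sh2 by one unit each, in opposite directions.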