From patchwork Sat Jun 10 11:46:41 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 3902
From: James Darnley
To: FFmpeg development discussions and patches
Date: Sat, 10 Jun 2017 13:46:41 +0200
Message-Id: <20170610114644.3138-3-jdarnley@obe.tv>
X-Mailer: git-send-email 2.13.0
In-Reply-To: <20170610114644.3138-1-jdarnley@obe.tv>
References: <20170610114644.3138-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 2/5] avcodec/x86: add x86-64 8-bit simple_idct function

Rounding contributed by Ronald S.
Bultje
---
 libavcodec/tests/x86/dct.c       |  2 ++
 libavcodec/x86/idctdsp_init.c    | 19 +++++++++++++++++++
 libavcodec/x86/simple_idct.h     |  3 +++
 libavcodec/x86/simple_idct10.asm |  8 ++++++++
 4 files changed, 32 insertions(+)

diff --git a/libavcodec/tests/x86/dct.c b/libavcodec/tests/x86/dct.c
index 34f5b8767b..317d973f9f 100644
--- a/libavcodec/tests/x86/dct.c
+++ b/libavcodec/tests/x86/dct.c
@@ -88,10 +88,12 @@ static const struct algo idct_tab_arch[] = {
 #if HAVE_YASM
 #if ARCH_X86_64
 #if HAVE_SSE2_EXTERNAL
+    { "SIMPLE8-SSE2", ff_simple_idct8_sse2, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_SSE2},
     { "SIMPLE10-SSE2", ff_simple_idct10_sse2, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_SSE2},
     { "SIMPLE12-SSE2", ff_simple_idct12_sse2, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_SSE2, 1 },
 #endif
 #if HAVE_AVX_EXTERNAL
+    { "SIMPLE8-AVX", ff_simple_idct8_avx, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_AVX},
     { "SIMPLE10-AVX", ff_simple_idct10_avx, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_AVX},
     { "SIMPLE12-AVX", ff_simple_idct12_avx, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_AVX, 1 },
 #endif
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index f1c915aa00..4b2145e478 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -94,9 +94,28 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
             c->idct_add = ff_simple_idct_add_sse2;
             c->perm_type = FF_IDCT_PERM_SIMPLE;
         }
+
+        if (ARCH_X86_64 &&
+            !high_bit_depth &&
+            avctx->lowres == 0 &&
+            (avctx->idct_algo == FF_IDCT_AUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
+            c->idct = ff_simple_idct8_sse2;
+            c->perm_type = FF_IDCT_PERM_TRANSPOSE;
+        }
     }
 
     if (ARCH_X86_64 && avctx->lowres == 0) {
+        if (EXTERNAL_AVX(cpu_flags) &&
+            !high_bit_depth &&
+            (avctx->idct_algo == FF_IDCT_AUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
+            c->idct = ff_simple_idct8_avx;
+            c->perm_type = FF_IDCT_PERM_TRANSPOSE;
+        }
+
         if (avctx->bits_per_raw_sample == 10 &&
             (avctx->idct_algo == FF_IDCT_AUTO ||
              avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index d17ef6a462..d17a855312 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -29,6 +29,9 @@ void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 
+void ff_simple_idct8_sse2(int16_t *block);
+void ff_simple_idct8_avx(int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);
 
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index 7cfd33eaa3..2807731b54 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -33,9 +33,11 @@ cextern pw_2
 cextern pw_16
 cextern pw_1023
 cextern pw_4095
+pd_round_11: times 4 dd 1<<(11-1)
 pd_round_12: times 4 dd 1<<(12-1)
 pd_round_15: times 4 dd 1<<(15-1)
 pd_round_19: times 4 dd 1<<(19-1)
+pd_round_20: times 4 dd 1<<(20-1)
 
 %macro CONST_DEC 3
 const %1
@@ -50,6 +52,8 @@ times 4 dw %2, %3
 %define W6sh2 8867 ; W6 = 35468 = 8867<<2
 %define W7sh2 4520 ; W7 = 18081 = 4520<<2 + 1
 
+pw_round_20_div_w4: times 8 dw ((1 << (20 - 1)) / W4sh2)
+
 CONST_DEC w4_plus_w2, W4sh2, +W2sh2
 CONST_DEC w4_min_w2, W4sh2, -W2sh2
 CONST_DEC w4_plus_w6, W4sh2, +W6sh2
@@ -68,6 +72,10 @@ CONST_DEC w7_min_w5, W7sh2, -W5sh2
 SECTION .text
 
 %macro idct_fn 0
+cglobal simple_idct8, 1, 1, 16, block
+    IDCT_FN "", 11, pw_round_20_div_w4, 20
+RET
+
 cglobal simple_idct10, 1, 1, 16, block
     IDCT_FN "", 12, "", 19
 RET
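
Note on the rounding constants: pd_round_N holds 1<<(N-1), the usual bias added before a right shift by N to round to nearest. pw_round_20_div_w4 appears to carry the same bias for a shift of 20, pre-divided by W4sh2 so that it can be added to a coefficient before the multiply by W4sh2 rather than to the product afterwards. The C sketch below only illustrates that arithmetic identity. It is not taken from the patch: the value 16384 for W4sh2 is an assumption (the define is outside the hunks shown), and the function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: W4sh2 is not visible in the hunks above. 16384 is used here
 * for illustration; the identity below needs it to divide 1<<(shift-1). */
#define W4SH2 16384

/* Bias for a round-to-nearest right shift, as in pd_round_N. */
static int32_t rounding_bias(int shift)
{
    return (int32_t)1 << (shift - 1);
}

/* Reference form: multiply, add the bias to the product, then shift. */
static int32_t scale_round_after(int16_t coeff, int shift)
{
    return (coeff * W4SH2 + rounding_bias(shift)) >> shift;
}

/* Pre-scaled form: add bias / W4SH2 to the coefficient, then multiply and
 * shift. This mirrors how a constant like pw_round_20_div_w4 could be used. */
static int32_t scale_round_before(int16_t coeff, int shift)
{
    int32_t bias_div_w4 = rounding_bias(shift) / W4SH2;
    return ((coeff + bias_div_w4) * W4SH2) >> shift;
}

/* Count inputs in [-2048, 2047] where the two forms disagree. */
static int count_mismatches(int shift)
{
    int mismatches = 0;
    for (int c = -2048; c <= 2047; c++)
        if (scale_round_after((int16_t)c, shift) !=
            scale_round_before((int16_t)c, shift))
            mismatches++;
    return mismatches;
}
```

Under the assumed W4SH2, the two forms feed the same integer into the shift, so count_mismatches(20) finds no differences. With a W4 that does not divide the bias exactly, the pre-scaled form would only approximate the reference.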