From patchwork Tue Apr 4 16:48:17 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Ronald S. Bultje"
X-Patchwork-Id: 3291
From: "Ronald S. Bultje"
To: ffmpeg-devel@ffmpeg.org
Cc: "Ronald S. Bultje"
Date: Tue, 4 Apr 2017 12:48:17 -0400
Message-Id: <1491324498-26655-5-git-send-email-rsbultje@gmail.com>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1491324498-26655-1-git-send-email-rsbultje@gmail.com>
References: <1491324498-26655-1-git-send-email-rsbultje@gmail.com>
Subject: [FFmpeg-devel] [PATCH 5/6] x86/simple_idct: add explicit sse2
 simple_idct_put/add versions.
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations.
This way we don't need to use the ff_put/add_pixels_clamped function
pointers.
---
 libavcodec/x86/idctdsp_init.c | 10 ++++++++++
 libavcodec/x86/simple_idct.c  | 15 +++++++++++++--
 libavcodec/x86/simple_idct.h  |  3 +++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index bcf7e5b..579c5e7 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -87,6 +87,16 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
     }
 
     if (ARCH_X86_64 && avctx->lowres == 0) {
+        if (!high_bit_depth &&
+            avctx->lowres == 0 &&
+            (avctx->idct_algo == FF_IDCT_AUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
+            c->idct_put = ff_simple_idct_put_sse2;
+            c->idct_add = ff_simple_idct_add_sse2;
+            c->perm_type = FF_IDCT_PERM_SIMPLE;
+        }
+
         if (avctx->bits_per_raw_sample == 10 &&
             (avctx->idct_algo == FF_IDCT_AUTO ||
              avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
diff --git a/libavcodec/x86/simple_idct.c b/libavcodec/x86/simple_idct.c
index d3a19fa..1155920 100644
--- a/libavcodec/x86/simple_idct.c
+++ b/libavcodec/x86/simple_idct.c
@@ -24,6 +24,7 @@
 #include "libavutil/x86/asm.h"
 
 #include "libavcodec/idctdsp.h"
+#include "libavcodec/x86/idctdsp.h"
 
 #include "idctdsp.h"
 #include "simple_idct.h"
@@ -907,12 +908,22 @@ void ff_simple_idct_mmx(int16_t *block)
 void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
 {
     idct(block);
-    ff_put_pixels_clamped(block, dest, line_size);
+    ff_put_pixels_clamped_mmx(block, dest, line_size);
 }
 void ff_simple_idct_add_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
 {
     idct(block);
-    ff_add_pixels_clamped(block, dest, line_size);
+    ff_add_pixels_clamped_mmx(block, dest, line_size);
+}
+void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
+{
+    idct(block);
+    ff_put_pixels_clamped_sse2(block, dest, line_size);
+}
+void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
+{
+    idct(block);
+    ff_add_pixels_clamped_sse2(block, dest, line_size);
 }
 
 #endif /* HAVE_INLINE_ASM */
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index ad76baf..d17ef6a 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -26,6 +26,9 @@ void ff_simple_idct_mmx(int16_t *block);
 void ff_simple_idct_add_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 
+void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);
 
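
Note for readers: the put/add wrappers in this patch run the IDCT and then call a fixed clamped store routine directly, rather than going through a function pointer. What follows is only a minimal scalar sketch of that pattern, not the FFmpeg code. All names ending in _ref or _stub are hypothetical; idct_stub() is a do-nothing placeholder for the real inline-asm idct(); a non-negative line_size of at least 8 is assumed to keep the sketch simple; and the behaviour assumed for the clamped routines (saturate each 16-bit coefficient to 0..255, then either overwrite or add to the 8x8 destination) is the usual "put" versus "add" convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: saturate a signed value to the 0..255 pixel range. */
static uint8_t clip_uint8_ref(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* "put": overwrite an 8x8 destination block with the clamped coefficients. */
static void put_pixels_clamped_ref(const int16_t *block, uint8_t *dest,
                                   ptrdiff_t line_size)
{
    assert(line_size >= 8); /* assumption of this sketch only */
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dest[y * line_size + x] = clip_uint8_ref(block[y * 8 + x]);
}

/* "add": add the coefficients to the existing pixels, then clamp. */
static void add_pixels_clamped_ref(const int16_t *block, uint8_t *dest,
                                   ptrdiff_t line_size)
{
    assert(line_size >= 8); /* assumption of this sketch only */
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dest[y * line_size + x] =
                clip_uint8_ref(dest[y * line_size + x] + block[y * 8 + x]);
}

/* Placeholder for the real in-place IDCT; it leaves the block untouched. */
static void idct_stub(int16_t *block)
{
    (void)block;
}

/* Same shape as the wrappers in the patch: transform, then a direct call
 * to one specific store routine instead of an indirect call. */
static void simple_idct_put_ref(uint8_t *dest, ptrdiff_t line_size,
                                int16_t *block)
{
    idct_stub(block);
    put_pixels_clamped_ref(block, dest, line_size);
}

static void simple_idct_add_ref(uint8_t *dest, ptrdiff_t line_size,
                                int16_t *block)
{
    idct_stub(block);
    add_pixels_clamped_ref(block, dest, line_size);
}
```

Because each wrapper names its store routine at compile time, the compiler can resolve the call statically, which is the point the commit message makes about avoiding the ff_put/add_pixels_clamped function pointers.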