From patchwork Wed Apr 5 18:53:55 2017
X-Patchwork-Submitter: "Ronald S. Bultje"
X-Patchwork-Id: 3308
From: "Ronald S. Bultje"
To: ffmpeg-devel@ffmpeg.org
Cc: "Ronald S. Bultje"
Date: Wed, 5 Apr 2017 14:53:55 -0400
Message-Id: <1491418435-47849-1-git-send-email-rsbultje@gmail.com>
X-Mailer: git-send-email 2.8.1
Subject: [FFmpeg-devel] [PATCH] x86/simple_idct: add explicit sse2 simple_idct_put/add versions.

These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations.
This way we don't need to use the ff_put/add_pixels_clamped function pointers.
---
 libavcodec/x86/idctdsp_init.c | 38 ++++++++++++++++++++++++++------------
 libavcodec/x86/simple_idct.c  | 15 +++++++++++++--
 libavcodec/x86/simple_idct.h  |  3 +++
 3 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index bcf7e5b..3f078e8 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -63,27 +63,41 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
 {
     int cpu_flags = av_get_cpu_flags();
 
-    if (INLINE_MMX(cpu_flags)) {
-        if (!high_bit_depth &&
-            avctx->lowres == 0 &&
-            (avctx->idct_algo == FF_IDCT_AUTO ||
-             avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
-             avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
-            c->idct_put = ff_simple_idct_put_mmx;
-            c->idct_add = ff_simple_idct_add_mmx;
-            c->idct = ff_simple_idct_mmx;
-            c->perm_type = FF_IDCT_PERM_SIMPLE;
-        }
-    }
     if (EXTERNAL_MMX(cpu_flags)) {
         c->put_signed_pixels_clamped = ff_put_signed_pixels_clamped_mmx;
         c->put_pixels_clamped = ff_put_pixels_clamped_mmx;
         c->add_pixels_clamped = ff_add_pixels_clamped_mmx;
+
+        if (INLINE_MMX(cpu_flags)) {
+            if (!high_bit_depth &&
+                avctx->lowres == 0 &&
+                (avctx->idct_algo == FF_IDCT_AUTO ||
+                 avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+                 avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
+                c->idct_put = ff_simple_idct_put_mmx;
+                c->idct_add = ff_simple_idct_add_mmx;
+                c->idct = ff_simple_idct_mmx;
+                c->perm_type = FF_IDCT_PERM_SIMPLE;
+            }
+        }
     }
+
     if (EXTERNAL_SSE2(cpu_flags)) {
         c->put_signed_pixels_clamped = ff_put_signed_pixels_clamped_sse2;
         c->put_pixels_clamped = ff_put_pixels_clamped_sse2;
         c->add_pixels_clamped = ff_add_pixels_clamped_sse2;
+
+        if (INLINE_SSE2(cpu_flags)) {
+            if (!high_bit_depth &&
+                avctx->lowres == 0 &&
+                (avctx->idct_algo == FF_IDCT_AUTO ||
+                 avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+                 avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
+                c->idct_put = ff_simple_idct_put_sse2;
+                c->idct_add = ff_simple_idct_add_sse2;
+                c->perm_type = FF_IDCT_PERM_SIMPLE;
+            }
+        }
     }
 
     if (ARCH_X86_64 && avctx->lowres == 0) {
diff --git a/libavcodec/x86/simple_idct.c b/libavcodec/x86/simple_idct.c
index d3a19fa..1155920 100644
--- a/libavcodec/x86/simple_idct.c
+++ b/libavcodec/x86/simple_idct.c
@@ -24,6 +24,7 @@
 #include "libavutil/x86/asm.h"
 
 #include "libavcodec/idctdsp.h"
+#include "libavcodec/x86/idctdsp.h"
 
 #include "idctdsp.h"
 #include "simple_idct.h"
@@ -907,12 +908,22 @@ void ff_simple_idct_mmx(int16_t *block)
 void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
 {
     idct(block);
-    ff_put_pixels_clamped(block, dest, line_size);
+    ff_put_pixels_clamped_mmx(block, dest, line_size);
 }
 void ff_simple_idct_add_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
 {
     idct(block);
-    ff_add_pixels_clamped(block, dest, line_size);
+    ff_add_pixels_clamped_mmx(block, dest, line_size);
+}
+void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
+{
+    idct(block);
+    ff_put_pixels_clamped_sse2(block, dest, line_size);
+}
+void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
+{
+    idct(block);
+    ff_add_pixels_clamped_sse2(block, dest, line_size);
 }
 
 #endif /* HAVE_INLINE_ASM */
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index ad76baf..d17ef6a 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -26,6 +26,9 @@ void ff_simple_idct_mmx(int16_t *block);
 void ff_simple_idct_add_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 
+void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);
 