From patchwork Sun Sep 15 16:15:41 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 15078
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 15 Sep 2019 18:15:41 +0200
Message-Id: <20190915161541.8562-1-onemda@gmail.com>
X-Mailer: git-send-email 2.17.1
Subject: [FFmpeg-devel] [PATCH] avfilter/x86/vf_360: add most of >8 depth asm

Signed-off-by: Paul B Mahol
---
 libavfilter/x86/vf_v360.asm    | 67 ++++++++++++++++++++++++++++++++++
 libavfilter/x86/vf_v360_init.c | 12 ++++++
 2 files changed, 79 insertions(+)

diff --git a/libavfilter/x86/vf_v360.asm b/libavfilter/x86/vf_v360.asm
index a0936eb6dc..5b241220d8 100644
--- a/libavfilter/x86/vf_v360.asm
+++ b/libavfilter/x86/vf_v360.asm
@@ -26,7 +26,9 @@
 SECTION_RODATA
 
 pb_mask: db 0,4,8,12,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
+pw_mask: db 0,1,4, 5, 8, 9,12,13,-1,-1,-1,-1,-1,-1,-1,-1
 pd_255: times 4 dd 255
+pd_65535: times 4 dd 65535
 
 SECTION .text
 
@@ -60,6 +62,34 @@ cglobal remap1_8bit_line, 6, 7, 6, dst, width, src, in_linesize, u, v, x
         jl .loop
     RET
 
+INIT_YMM avx2
+cglobal remap1_16bit_line, 6, 7, 6, dst, width, src, in_linesize, u, v, x
+    movsxdifnidn widthq, widthd
+    xor             xq, xq
+    movd           xm0, in_linesized
+    pcmpeqw         m4, m4
+    VBROADCASTI128  m3, [pw_mask]
+    vpbroadcastd    m0, xm0
+
+    .loop:
+        pmovsxwd   m1, [vq + xq * 2]
+        pmovsxwd   m2, [uq + xq * 2]
+
+        pslld            m2, 0x1
+        pmulld           m1, m0
+        paddd            m1, m2
+        mova             m2, m4
+        vpgatherdd       m5, [srcq + m1], m2
+        pshufb           m1, m5, m3
+        vextracti128    xm2, m1, 1
+        movq    [dstq+xq*2], xm1
+        movq  [dstq+xq*2+8], xm2
+
+        add   xq, mmsize / 4
+        cmp   xq, widthq
+        jl .loop
+    RET
+
 INIT_YMM avx2
 cglobal remap2_8bit_line, 7, 8, 8, dst, width, src, in_linesize, u, v, ker, x
     movsxdifnidn widthq, widthd
@@ -96,6 +126,43 @@ DEFINE_ARGS dst, width, src, x, u, v, ker
         jl .loop
     RET
 
+INIT_YMM avx2
+cglobal remap2_16bit_line, 7, 8, 8, dst, width, src, in_linesize, u, v, ker, x
+    movsxdifnidn widthq, widthd
+    movd           xm0, in_linesized
+%if ARCH_X86_32
+DEFINE_ARGS dst, width, src, x, u, v, ker
+%endif
+    xor             xq, xq
+    pcmpeqw         m7, m7
+    vpbroadcastd    m0, xm0
+    vpbroadcastd    m6, [pd_65535]
+
+    .loop:
+        pmovsxwd   m1, [kerq + xq * 8]
+        pmovsxwd   m2, [vq + xq * 8]
+        pmovsxwd   m3, [uq + xq * 8]
+
+        pslld            m3, 0x1
+        pmulld           m4, m2, m0
+        paddd            m4, m3
+        mova             m3, m7
+        vpgatherdd       m2, [srcq + m4], m3
+        pand             m2, m6
+        pmulld           m2, m1
+        phaddd           m2, m2
+        phaddd           m1, m2, m2
+        psrld            m1, m1, 0xe
+        vextracti128    xm2, m1, 1
+
+        pextrw   [dstq+xq*2], xm1, 0
+        pextrw [dstq+xq*2+2], xm2, 0
+
+        add   xq, mmsize / 16
+        cmp   xq, widthq
+        jl .loop
+    RET
+
 %if ARCH_X86_64
 
 INIT_YMM avx2
diff --git a/libavfilter/x86/vf_v360_init.c b/libavfilter/x86/vf_v360_init.c
index 8c1a10c705..c7f4a3dd6d 100644
--- a/libavfilter/x86/vf_v360_init.c
+++ b/libavfilter/x86/vf_v360_init.c
@@ -32,6 +32,12 @@ void ff_remap2_8bit_line_avx2(uint8_t *dst, int width, const uint8_t *src, ptrdi
 void ff_remap4_8bit_line_avx2(uint8_t *dst, int width, const uint8_t *src, ptrdiff_t in_linesize,
                               const uint16_t *u, const uint16_t *v, const int16_t *ker);
 
+void ff_remap1_16bit_line_avx2(uint8_t *dst, int width, const uint8_t *src, ptrdiff_t in_linesize,
+                               const uint16_t *u, const uint16_t *v, const int16_t *ker);
+
+void ff_remap2_16bit_line_avx2(uint8_t *dst, int width, const uint8_t *src, ptrdiff_t in_linesize,
+                               const uint16_t *u, const uint16_t *v, const int16_t *ker);
+
 av_cold void ff_v360_init_x86(V360Context *s, int depth)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -42,6 +48,12 @@ av_cold void ff_v360_init_x86(V360Context *s, int depth)
     if (EXTERNAL_AVX2_FAST(cpu_flags) && s->interp == BILINEAR && depth <= 8)
         s->remap_line = ff_remap2_8bit_line_avx2;
 
+    if (EXTERNAL_AVX2_FAST(cpu_flags) && s->interp == NEAREST && depth > 8)
+        s->remap_line = ff_remap1_16bit_line_avx2;
+
+    if (EXTERNAL_AVX2_FAST(cpu_flags) && s->interp == BILINEAR && depth > 8)
+        s->remap_line = ff_remap2_16bit_line_avx2;
+
 #if ARCH_X86_64
     if (EXTERNAL_AVX2_FAST(cpu_flags) && (s->interp == BICUBIC ||
                                           s->interp == LANCZOS) && depth <= 8)
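For review purposes, below is a minimal scalar C sketch of what `ff_remap1_16bit_line_avx2` appears to compute. It is inferred from the asm, not copied from vf_v360.c, and rests on these assumptions:

- `in_linesize` is a byte stride, because the asm multiplies `v` by it directly.
- Each sample is read at byte offset `v[x] * in_linesize + u[x] * 2`, which is what the `pslld m2, 0x1` step implies.
- `ker` is ignored, as it is in the asm.

The name `remap1_16bit_line_ref` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for the nearest-neighbour 16-bit path.
 * For each x, fetch the 16-bit sample at byte offset
 * v[x] * in_linesize + u[x] * 2 and store it to dst[x]. */
static void remap1_16bit_line_ref(uint8_t *dst, int width, const uint8_t *src,
                                  ptrdiff_t in_linesize, const uint16_t *u,
                                  const uint16_t *v, const int16_t *ker)
{
    uint16_t *d = (uint16_t *)dst;
    (void)ker; /* nearest-neighbour needs no interpolation weights */

    assert(in_linesize % 2 == 0); /* a 16-bit row holds whole samples */
    for (int x = 0; x < width; x++) {
        const ptrdiff_t off = (ptrdiff_t)v[x] * in_linesize + (ptrdiff_t)u[x] * 2;
        uint16_t sample;
        memcpy(&sample, src + off, sizeof(sample)); /* alignment-safe 2-byte load */
        d[x] = sample;
    }
}
```

One difference to keep in mind is that the asm widens `u` and `v` with `pmovsxwd`, which sign-extends. It therefore agrees with this unsigned sketch only for coordinates below 32768. Also, `vpgatherdd` reads a full dword at each offset, and the `pw_mask` shuffle then keeps only the low word of each dword.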
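Below is a similar scalar sketch of `ff_remap2_16bit_line_avx2`, again inferred from the asm rather than taken from FFmpeg. It assumes the following:

- Each output pixel owns four consecutive entries of `u`, `v` and `ker`, since the loop indexes those arrays at `xq * 8` bytes.
- The weights are fixed point with 14 fractional bits, inferred from `psrld m1, m1, 0xe`.
- The store keeps only the low 16 bits of the shifted sum, as `pextrw` does.

The name `remap2_16bit_line_ref` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for the bilinear 16-bit path (2x2 taps).
 * Output pixel x uses entries x*4 .. x*4+3 of u, v and ker. */
static void remap2_16bit_line_ref(uint8_t *dst, int width, const uint8_t *src,
                                  ptrdiff_t in_linesize, const uint16_t *u,
                                  const uint16_t *v, const int16_t *ker)
{
    uint16_t *d = (uint16_t *)dst;

    assert(in_linesize % 2 == 0);
    for (int x = 0; x < width; x++) {
        uint32_t sum = 0; /* unsigned: wraps like paddd/phaddd, no C overflow UB */
        for (int k = 0; k < 4; k++) {
            const int i = x * 4 + k;
            const ptrdiff_t off = (ptrdiff_t)v[i] * in_linesize + (ptrdiff_t)u[i] * 2;
            uint16_t sample; /* 2-byte load; the asm gathers a dword and masks it */
            memcpy(&sample, src + off, sizeof(sample));
            /* weight * zero-extended sample, as pand + pmulld compute it */
            sum += (uint32_t)((int32_t)ker[i] * (int32_t)sample);
        }
        /* logical shift by 14 (psrld), then keep the low word (pextrw) */
        d[x] = (uint16_t)(sum >> 14);
    }
}
```

Accumulating in `uint32_t` mirrors the wraparound of the vector adds without signed overflow in C. With non-negative weights that sum to 1 << 14, the largest sum is 65535 << 14. The shifted result therefore never exceeds 65535, so the low-word store drops nothing.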