From patchwork Tue Mar 31 13:51:06 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 18542
Return-Path:
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
 by ffaux.localdomain (Postfix) with ESMTP id 9090244BA5E
 for ; Tue, 31 Mar 2020 16:58:15 +0300 (EEST)
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 6522368B0AA;
 Tue, 31 Mar 2020 16:58:15 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 1BEB268AFEB
 for ; Tue, 31 Mar 2020 16:58:09 +0300 (EEST)
Received: by mail-wr1-f67.google.com with SMTP id h15so26054210wrx.9
 for ; Tue, 31 Mar 2020 06:58:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20161025;
 h=from:to:subject:date:message-id:in-reply-to:references;
 bh=haUkkfsCCTFBIoPau4nCE1ZMHrb/g2I+vMsHEMxyPqE=;
 b=fNL++SEhMMvOxLAhCjgJmkbGgbDYkaQAZVGMRsOrW2Mya1rZuoe+uTfW2Zi0eUDIgN
 G3CASf4WpvZFNst9YPOSovfsdHtCKhMgv5VA0GggTXNJy2Cutcsr8AUcYuHV8+Vr7oi1
 Gk4nErV2zPMlu5yg5UNyUjf9Rzb53gkMBDM/sP/XbnvoAOrTaltmiUW4lVoXFL/MSNsJ
 sbncee6jzaILwCms6lzWRlL0GNuxMMnGYdGlWDWobhw1+jrfFFbNS6O8BH3wTPOMlrG1
 6oifUAWIczXenE+KDlkaEi339fyvUy+VSrNWkCEUKt/J4H+BdACHsUxEklmhUjtKMFCV
 qnIQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to
 :references;
 bh=haUkkfsCCTFBIoPau4nCE1ZMHrb/g2I+vMsHEMxyPqE=;
 b=NERDVW7/sU8bnvn2FEz1gi+zQs7uCJBrojggK2CP+Y7QDaooEIg9Rk3JNVfzLzLDaW
 5f7gC9qopL9zsnZBdnGfFzd0lni+yV1pmNAgjI6OXUvrPOiEvEwpc+t5QlR23mNZ04ex
 4dM8RcNKhVZZo4IWy8YQCBhDcOPXpX887Ds6zGxWXvlYyqImn/GT0l1wJyaWBqawwFaM
 f4RfRtna0y9KMgQ0tkEszZXEm7OPd0v9txA34e1IYwZoGygRS7oI39EGWZo50Z/Z9sX2
 58KXRrXxcobjSTlyfDQ9Jh0kSNMUZhOajM4horPfwlnzOT9hHoJFwWO2Vd5YQGN+7ALQ
 jlOw==
X-Gm-Message-State: AGi0PuZTfHlYp8Hkb4CuraQMWh94rhd5ZaqjecrEaqBY685s8Opwtosx
 bXOTwUdc12uT7a6ShkBybQQXbBTV
X-Google-Smtp-Source: ADFU+vuDXdtfgsirux+8wzBihAtHPI0grvV3NHw6K4UNQs0FauBoTDzOPHZ+wnD3UtgDiSGycACznQ==
X-Received: by 2002:a5d:474b:: with SMTP id o11mr20480429wrs.391.1585662678481;
 Tue, 31 Mar 2020 06:51:18 -0700 (PDT)
Received: from localhost.localdomain ([37.244.237.154])
 by smtp.gmail.com with ESMTPSA id q8sm28518739wrc.8.2020.03.31.06.51.17
 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 31 Mar 2020 06:51:17 -0700 (PDT)
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 31 Mar 2020 15:51:06 +0200
Message-Id: <20200331135106.32490-2-onemda@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200331135106.32490-1-onemda@gmail.com>
References: <20200331135106.32490-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] avfilter/vf_v360: add SIMD for lagrange
 interpolation
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

Signed-off-by: Paul B Mahol
---
 libavfilter/x86/vf_v360.asm    | 46 ++++++++++++++++++++++++++++++++++
 libavfilter/x86/vf_v360_init.c |  6 +++++
 2 files changed, 52 insertions(+)

diff --git a/libavfilter/x86/vf_v360.asm b/libavfilter/x86/vf_v360.asm
index 5b241220d8..e1908e5e71 100644
--- a/libavfilter/x86/vf_v360.asm
+++ b/libavfilter/x86/vf_v360.asm
@@ -165,6 +165,52 @@ DEFINE_ARGS dst, width, src, x, u, v, ker
 
 %if ARCH_X86_64
 
+INIT_YMM avx2
+cglobal remap3_8bit_line, 7, 11, 8, dst, width, src, in_linesize, u, v, ker, x, y, tmp, z
+    movsxdifnidn widthq, widthd
+    xor zq, zq
+    xor yq, yq
+    xor xq, xq
+    movd xm0, in_linesized
+    pcmpeqw m7, m7
+    vpbroadcastd m0, xm0
+    vpbroadcastd m6, [pd_255]
+
+    .loop:
+        pmovsxwd m1, [kerq + yq]
+        pmovsxwd m2, [vq + yq]
+        pmovsxwd m3, [uq + yq]
+
+        pmulld m4, m2, m0
+        paddd m4, m3
+        mova m3, m7
+        vpgatherdd m2, [srcq + m4], m3
+        pand m2, m6
+        pmulld m2, m1
+        phaddd m2, m2
+        phaddd m1, m2, m2
+        vextracti128 xm2, m1, 1
+        paddd m2, m1
+        movzx tmpq, word [vq + yq + 16]
+        imul tmpq, in_linesizeq
+        movzx zq, word [uq + yq + 16]
+        add tmpq, zq
+        movzx zq, byte [srcq + tmpq]
+        movzx tmpq, word [kerq + yq + 16]
+        imul zq, tmpq
+        movd xm1, zd
+        paddd m2, m1
+        psrld m2, m2, 0xe
+
+        packuswb m2, m2
+        pextrb [dstq+xq], xm2, 0
+
+        add xq, 1
+        add yq, 18
+        cmp xq, widthq
+        jl .loop
+    RET
+
 INIT_YMM avx2
 cglobal remap4_8bit_line, 7, 9, 11, dst, width, src, in_linesize, u, v, ker, x, y
     movsxdifnidn widthq, widthd
diff --git a/libavfilter/x86/vf_v360_init.c b/libavfilter/x86/vf_v360_init.c
index babc6c426a..83f58bb96a 100644
--- a/libavfilter/x86/vf_v360_init.c
+++ b/libavfilter/x86/vf_v360_init.c
@@ -29,6 +29,9 @@ void ff_remap1_8bit_line_avx2(uint8_t *dst, int width, const uint8_t *src, ptrdi
 void ff_remap2_8bit_line_avx2(uint8_t *dst, int width, const uint8_t *src, ptrdiff_t in_linesize,
                               const int16_t *const u, const int16_t *const v, const int16_t *const ker);
 
+void ff_remap3_8bit_line_avx2(uint8_t *dst, int width, const uint8_t *src, ptrdiff_t in_linesize,
+                              const int16_t *const u, const int16_t *const v, const int16_t *const ker);
+
 void ff_remap4_8bit_line_avx2(uint8_t *dst, int width, const uint8_t *src, ptrdiff_t in_linesize,
                               const int16_t *const u, const int16_t *const v, const int16_t *const ker);
 
@@ -48,6 +51,9 @@ av_cold void ff_v360_init_x86(V360Context *s, int depth)
     if (EXTERNAL_AVX2_FAST(cpu_flags) && s->interp == BILINEAR && depth <= 8)
         s->remap_line = ff_remap2_8bit_line_avx2;
 
+    if (EXTERNAL_AVX2_FAST(cpu_flags) && s->interp == LAGRANGE && depth <= 8)
+        s->remap_line = ff_remap3_8bit_line_avx2;
+
     if (EXTERNAL_AVX2_FAST(cpu_flags) && s->interp == NEAREST && depth > 8)
         s->remap_line = ff_remap1_16bit_line_avx2;
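
For anyone checking the assembly against scalar code, here is a rough C sketch of what remap3_8bit_line appears to compute, read off the instructions in the patch: nine int16 taps per output pixel (eight gathered with vpgatherdd, the ninth loaded with scalar movzx, matching the 18-byte step of yq), a weighted sum, and a right shift by 14. The names remap3_8bit_line_ref and clip_uint8 are hypothetical and not part of the patch, and the final clamp is an assumption. The asm also zero-extends the ninth u/v/ker words, so the sketch only claims to match it for non-negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): clamp an int to 0..255. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference for the 3x3 (9-tap) remap of one line.
 * u, v and ker each hold 9 int16 entries per output pixel; the weights
 * are assumed to be fixed point with 14 fractional bits. */
static void remap3_8bit_line_ref(uint8_t *dst, int width, const uint8_t *src,
                                 ptrdiff_t in_linesize,
                                 const int16_t *u, const int16_t *v,
                                 const int16_t *ker)
{
    assert(width >= 0);

    for (int x = 0; x < width; x++) {
        const int16_t *uu = u   + x * 9;  /* source columns for this pixel */
        const int16_t *vv = v   + x * 9;  /* source rows for this pixel    */
        const int16_t *kk = ker + x * 9;  /* tap weights for this pixel    */
        int sum = 0;

        for (int i = 0; i < 9; i++)
            sum += kk[i] * src[vv[i] * in_linesize + uu[i]];

        /* Drop the 14 fractional bits; the clamp is an assumption. */
        dst[x] = clip_uint8(sum >> 14);
    }
}
```

Running the sketch and the AVX2 routine over the same u/v/ker tables and comparing dst byte by byte would be one way to sanity-check the SIMD path on non-negative inputs.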