From patchwork Sat Feb 13 11:10:38 2021
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 25606
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 13 Feb 2021 12:10:38 +0100
Message-Id: <20210213111038.14840-2-onemda@gmail.com>
In-Reply-To: <20210213111038.14840-1-onemda@gmail.com>
References: <20210213111038.14840-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] avfilter/x86/vf_gblur: add postscale SIMD

Signed-off-by: Paul B Mahol
---
 libavfilter/x86/vf_gblur.asm    | 46 +++++++++++++++++++++++++++++++++
 libavfilter/x86/vf_gblur_init.c | 11 ++++++--
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/libavfilter/x86/vf_gblur.asm b/libavfilter/x86/vf_gblur.asm
index a25b1659f5..8fea6d2a61 100644
--- a/libavfilter/x86/vf_gblur.asm
+++ b/libavfilter/x86/vf_gblur.asm
@@ -183,3 +183,49 @@ HORIZ_SLICE
 INIT_XMM avx2
 HORIZ_SLICE
 %endif
+
+%macro POSTSCALE_SLICE 0
+%if UNIX64
+cglobal postscale_slice, 2, 6, 4, ptr, length, postscale, min, max, x
+%else
+cglobal postscale_slice, 5, 6, 4, ptr, length, postscale, min, max, x
+%endif
+    shl lengthd, 2
+%if WIN64
+    SWAP 0, 2
+    SWAP 1, 3
+    SWAP 2, 4
+%endif
+    shufps xm0, xm0, 0
+    shufps xm1, xm1, 0
+    shufps xm2, xm2, 0
+%if cpuflag(avx2)
+    vinsertf128 m0, m0, xm0, 1
+    vinsertf128 m1, m1, xm1, 1
+    vinsertf128 m2, m2, xm2, 1
+%endif
+    xor xq, xq
+
+    .loop:
+    movu m3, [ptrq + xq]
+    mulps m3, m0
+    maxps m3, m1
+    minps m3, m2
+    movu [ptrq+xq], m3
+
+    add xq, mmsize
+    cmp xd, lengthd
+    jl .loop
+
+    RET
+%endmacro
+
+%if ARCH_X86_64
+INIT_XMM sse4
+POSTSCALE_SLICE
+
+%if HAVE_AVX_EXTERNAL
+INIT_YMM avx2
+POSTSCALE_SLICE
+%endif
+%endif
diff --git a/libavfilter/x86/vf_gblur_init.c b/libavfilter/x86/vf_gblur_init.c
index e63e59fe23..7a9b40b0ad 100644
--- a/libavfilter/x86/vf_gblur_init.c
+++ b/libavfilter/x86/vf_gblur_init.c
@@ -27,14 +27,21 @@ void ff_horiz_slice_sse4(float *ptr, int width, int height,
                          int steps, float nu, float bscale);
 void ff_horiz_slice_avx2(float *ptr, int width, int height,
                          int steps, float nu, float bscale);
+void ff_postscale_slice_sse4(float *ptr, int length, float postscale, float min, float max);
+void ff_postscale_slice_avx2(float *ptr, int length, float postscale, float min, float max);
+
 av_cold void ff_gblur_init_x86(GBlurContext *s)
 {
 #if ARCH_X86_64
     int cpu_flags = av_get_cpu_flags();
 
-    if (EXTERNAL_SSE4(cpu_flags))
+    if (EXTERNAL_SSE4(cpu_flags)) {
         s->horiz_slice = ff_horiz_slice_sse4;
-    if (EXTERNAL_AVX2(cpu_flags))
+        s->postscale_slice = ff_postscale_slice_sse4;
+    }
+    if (EXTERNAL_AVX2(cpu_flags)) {
         s->horiz_slice = ff_horiz_slice_avx2;
+        s->postscale_slice = ff_postscale_slice_avx2;
+    }
 #endif
 }
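For review, the body of POSTSCALE_SLICE can be read as a scalar C loop. `mulps` scales each float by the broadcast `postscale`, `maxps` against the broadcast `min` clamps from below, and `minps` against the broadcast `max` clamps from above. The sketch below is only an illustration under that reading. `postscale_slice_scalar` is a hypothetical name and is not the filter's own C fallback, which this patch does not show.

```c
#include <assert.h>

/* Hypothetical scalar model of one POSTSCALE_SLICE call (not FFmpeg code).
 * Each comparison is written in the same operand order as the SSE
 * instruction, so a false comparison (including one involving NaN)
 * yields the second operand, as MAXPS/MINPS do. */
static void postscale_slice_scalar(float *ptr, int length,
                                   float postscale, float min, float max)
{
    assert(length >= 0);
    for (int i = 0; i < length; i++) {
        float v = ptr[i] * postscale; /* mulps m3, m0 */
        v = v > min ? v : min;        /* maxps m3, m1: clamp from below */
        v = v < max ? v : max;        /* minps m3, m2: clamp from above */
        ptr[i] = v;
    }
}
```

With this operand order a NaN input comes out as `min`, which matches what the MAXPS followed by MINPS sequence produces for a NaN lane.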
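The loop converts `length` from floats to bytes (`shl lengthd, 2`) and processes one whole `mmsize`-byte vector (16 bytes for the SSE4 build, 32 for AVX2) before the `cmp xd, lengthd` / `jl .loop` test, so it behaves like a do-while over full vectors. The C model below (the helper name is hypothetical) counts how many floats that control flow reads and writes. Whenever `length` is 0 or not a multiple of 4 (SSE4) or 8 (AVX2), the count exceeds `length`, so the caller's buffer presumably needs padding; whether vf_gblur.c provides it is not visible in this patch.

```c
#include <assert.h>

/* Models the control flow of POSTSCALE_SLICE: one vector of mmsize bytes
 * is processed before the loop condition is tested, so the loop is a
 * do-while. Returns the number of floats the loop would read and write. */
static int postscale_floats_touched(int length, int mmsize)
{
    assert(length >= 0);
    assert(mmsize == 16 || mmsize == 32); /* XMM (sse4) or YMM (avx2) */
    int length_bytes = length << 2;       /* shl lengthd, 2 */
    int x = 0;                            /* xor xq, xq */
    do {
        x += mmsize;                      /* add xq, mmsize */
    } while (x < length_bytes);           /* cmp xd, lengthd / jl .loop */
    return x / (int)sizeof(float);
}
```

For example, `postscale_floats_touched(10, 32)` returns 16, i.e. six floats beyond the end of a 10-float buffer, and `postscale_floats_touched(0, 16)` returns 4 because the first iteration always runs.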
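The vf_gblur_init.c change follows the familiar pattern of overwriting a function pointer in order of increasing capability: on a CPU that reports both SSE4 and AVX2, the AVX2 assignment is checked last and wins. The sketch below reproduces that ordering in plain C. The flag constants, `SketchContext`, the stand-in kernels and the C default installed first are all hypothetical; in FFmpeg the flags come from `av_get_cpu_flags()` and the `EXTERNAL_SSE4`/`EXTERNAL_AVX2` macros, and any C default would be set by generic filter code outside this patch.

```c
#include <assert.h>

/* Hypothetical CPU flag bits for this sketch only. */
enum { SKETCH_CPU_SSE4 = 1 << 0, SKETCH_CPU_AVX2 = 1 << 1 };

typedef void (*postscale_fn)(float *ptr, int length,
                             float postscale, float min, float max);

/* Stand-in for the relevant part of GBlurContext. */
typedef struct SketchContext {
    postscale_fn postscale_slice;
} SketchContext;

/* Records which stand-in ran: 0 = C, 1 = SSE4, 2 = AVX2. */
static int last_impl;

static void scale_clamp(float *ptr, int length, float s, float lo, float hi)
{
    for (int i = 0; i < length; i++) {
        float v = ptr[i] * s;
        v = v > lo ? v : lo;
        ptr[i] = v < hi ? v : hi;
    }
}

/* Stand-ins for the C fallback and the two SIMD entry points. They compute
 * the same result and differ only in the tag they record. */
static void sketch_postscale_c(float *p, int n, float s, float lo, float hi)
{
    last_impl = 0;
    scale_clamp(p, n, s, lo, hi);
}

static void sketch_postscale_sse4(float *p, int n, float s, float lo, float hi)
{
    last_impl = 1;
    scale_clamp(p, n, s, lo, hi);
}

static void sketch_postscale_avx2(float *p, int n, float s, float lo, float hi)
{
    last_impl = 2;
    scale_clamp(p, n, s, lo, hi);
}

static void sketch_init(SketchContext *s, int cpu_flags)
{
    assert(s);
    s->postscale_slice = sketch_postscale_c;         /* portable default */
    if (cpu_flags & SKETCH_CPU_SSE4)
        s->postscale_slice = sketch_postscale_sse4;
    if (cpu_flags & SKETCH_CPU_AVX2)                 /* checked last, so it wins */
        s->postscale_slice = sketch_postscale_avx2;
}
```

Because every assignment is unconditional once its flag is present, the order of the `if` blocks alone decides the priority; no `else` chain is needed.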