From patchwork Sun Feb 14 14:32:05 2021
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 25615
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 14 Feb 2021 15:32:05 +0100
Message-Id: <20210214143205.9320-2-onemda@gmail.com>
In-Reply-To: <20210214143205.9320-1-onemda@gmail.com>
References: <20210214143205.9320-1-onemda@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] avfilter/x86/vf_gblur: add postscale SIMD

Signed-off-by: Paul B Mahol
---
 libavfilter/x86/vf_gblur.asm    | 49 +++++++++++++++++++++++++++++++++
 libavfilter/x86/vf_gblur_init.c | 17 ++++++++++--
 2 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/libavfilter/x86/vf_gblur.asm b/libavfilter/x86/vf_gblur.asm
index a25b1659f5..8ccfbdc56b 100644
--- a/libavfilter/x86/vf_gblur.asm
+++ b/libavfilter/x86/vf_gblur.asm
@@ -183,3 +183,52 @@ HORIZ_SLICE
 INIT_XMM avx2
 HORIZ_SLICE
 %endif
+
+%macro POSTSCALE_SLICE 0
+%if UNIX64
+cglobal postscale_slice, 2, 3, 4, ptr, length, x
+%else
+cglobal postscale_slice, 5, 6, 4, ptr, length, postscale, min, max, x
+%endif
+    shl lengthd, 2
+%if WIN64
+    SWAP 0, 2
+    SWAP 1, 3
+    SWAP 2, 4
+%endif
+%if cpuflag(avx2)
+    vbroadcastss  m0, xm0
+    vbroadcastss  m1, xm1
+    vbroadcastss  m2, xm2
+%else
+    shufps   xm0, xm0, 0
+    shufps   xm1, xm1, 0
+    shufps   xm2, xm2, 0
+%endif
+    xor      xq, xq
+
+    .loop:
+%if cpuflag(avx2)
+    mulps         m3, m0, [ptrq + xq]
+%else
+    movu          m3, [ptrq + xq]
+    mulps         m3, m0
+%endif
+    maxps         m3, m1
+    minps         m3, m2
+    movu   [ptrq+xq], m3
+
+    add xq, mmsize
+    cmp xd, lengthd
+    jl .loop
+
+    RET
+%endmacro
+
+INIT_XMM sse
+POSTSCALE_SLICE
+
+%if HAVE_AVX2_EXTERNAL
+INIT_YMM avx2
+POSTSCALE_SLICE
+%endif
diff --git a/libavfilter/x86/vf_gblur_init.c b/libavfilter/x86/vf_gblur_init.c
index e63e59fe23..9223cb797d 100644
--- a/libavfilter/x86/vf_gblur_init.c
+++ b/libavfilter/x86/vf_gblur_init.c
@@ -27,14 +27,25 @@
 void ff_horiz_slice_sse4(float *ptr, int width, int height, int steps, float nu, float bscale);
 void ff_horiz_slice_avx2(float *ptr, int width, int height, int steps, float nu, float bscale);
 
+void ff_postscale_slice_sse(float *ptr, int length, float postscale, float min, float max);
+void ff_postscale_slice_avx2(float *ptr, int length, float postscale, float min, float max);
+
 av_cold void ff_gblur_init_x86(GBlurContext *s)
 {
-#if ARCH_X86_64
     int cpu_flags = av_get_cpu_flags();
 
-    if (EXTERNAL_SSE4(cpu_flags))
+    if (EXTERNAL_SSE(cpu_flags)) {
+        s->postscale_slice = ff_postscale_slice_sse;
+    }
+    if (EXTERNAL_AVX2(cpu_flags)) {
+        s->postscale_slice = ff_postscale_slice_avx2;
+    }
+#if ARCH_X86_64
+    if (EXTERNAL_SSE4(cpu_flags)) {
         s->horiz_slice = ff_horiz_slice_sse4;
-    if (EXTERNAL_AVX2(cpu_flags))
+    }
+    if (EXTERNAL_AVX2(cpu_flags)) {
         s->horiz_slice = ff_horiz_slice_avx2;
+    }
 #endif
 }
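
For readers following the loop in POSTSCALE_SLICE, here is a minimal scalar sketch in C of what the mulps / maxps / minps sequence appears to compute: each float is multiplied by postscale and then clamped to [min, max]. The names postscale_ref, floats_touched and postscale_selfcheck are hypothetical and are not part of this patch or of FFmpeg. The second helper models the loop bound as written: the loop advances by one full vector and only then compares the offset against length * 4, so it seems to run at least once and to round up to a whole vector. This assumes the caller's buffer is padded accordingly.

```c
#include <assert.h>

/* Hypothetical scalar reference, not taken from the patch:
 * scale each sample, then clamp it to [min, max]. */
void postscale_ref(float *ptr, int length,
                   float postscale, float min, float max)
{
    for (int i = 0; i < length; i++) {
        float v = ptr[i] * postscale;   /* mulps */
        if (v < min) v = min;           /* maxps with the broadcast min */
        if (v > max) v = max;           /* minps with the broadcast max */
        ptr[i] = v;
    }
}

/* Hypothetical model of the SIMD loop bound: counts how many floats a
 * loop of the same shape reads and writes when it steps by
 * floats_per_vector (4 for XMM, 8 for YMM) and tests the bound after
 * each step, like the add / cmp / jl sequence. */
int floats_touched(int length, int floats_per_vector)
{
    int n = 0;
    do {
        n += floats_per_vector;
    } while (n < length);
    return n;
}

/* Small self-check using exactly representable values. */
void postscale_selfcheck(void)
{
    float buf[4] = { -1.0f, 0.25f, 0.5f, 3.0f };

    postscale_ref(buf, 4, 2.0f, 0.0f, 1.0f);
    assert(buf[0] == 0.0f);   /* -2.0 clamped up to min */
    assert(buf[1] == 0.5f);   /* 0.5 is inside the range */
    assert(buf[2] == 1.0f);   /* 1.0 sits on the upper bound */
    assert(buf[3] == 1.0f);   /* 6.0 clamped down to max */

    assert(floats_touched(8, 8) == 8);    /* exact multiple: no overrun */
    assert(floats_touched(10, 4) == 12);  /* XMM: 2 floats past length */
    assert(floats_touched(10, 8) == 16);  /* YMM: 6 floats past length */
}
```

Calling postscale_selfcheck() from a test program exercises both helpers. A reference like postscale_ref could also serve as a comparison target when checking the SSE and AVX2 versions, provided the comparison covers only the first length floats.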