From patchwork Mon Aug 2 05:34:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wu Jianhua X-Patchwork-Id: 29182 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:6c0f:0:0:0:0:0 with SMTP id a15csp1282651ioh; Sun, 1 Aug 2021 22:35:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz24LJhZ94FdtPC3NFkNigOky0JUpcQ9DgLDq2nDgVy0B9CAthVC90iP2JVZgcTijfyorYc X-Received: by 2002:a17:906:3983:: with SMTP id h3mr13536137eje.249.1627882529706; Sun, 01 Aug 2021 22:35:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627882529; cv=none; d=google.com; s=arc-20160816; b=ChVR17aP3N0TCI8ZIjIYnDpUZ+5JnJrJDmcTV4/TVyNWpGUBdFMToFJy/xAeuCEAB3 s9fHXBzf46prQmXPDk74DYzvtKQ5VghJxN1Sb7AwbH/AX8caE0R+E3589PZ3OI6in4RH Z0CMMv/9QXW21fssFG+zYDIOn/CKbbD7SnWz5B5Pees9SfzD2k3p+RMpKjlytHqRpJRw toQmXpdEfUXtruX6w4Jerc3iAifWnjA3N7VjwhE1CB1epzkXh58LIeMxlC9jyELuHS+Z 7pAjjaUKSV59k6j2B9sIpu6LaOjmsIjw2vTy6cVHcBhFNfH/DR1b71Qk21WBj1XfkipO N9TQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:mime-version:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:message-id:date:to:from:delivered-to; bh=2ZavV0vTEI6RT3E9ZB2b895qmeC6VhmeMt47sGM8/tU=; b=L/mZZY1xKPExOO0MaJzi2+Yjye03dNE9zIb5Qan8Y9RkWNX3h0lzzc+6/KV4c2cCX2 UUOyoeStkC2gG6KCbZSuJ1phI/SLLbvDI1bUJ75eSHkWh6Ar2n00p7/WpkT+yPfGy20F wEQLp+M90k2+RFVXd2pstSpouYSG2QIfKgt9nFB0VqaU7wiwBBazxZBc7jORHC9QbOcu 9rnFTyL6n/poC70zDRrlPjwoKsTBE+eB6p91+5AjTr6zhyRXPA8/Mp3hzlwHwjvC5KWd lQObzbko34Yr/BNZi0hViZ2K66niJPqkV85Yb81S8BNKegB2uBQUoaHixPIPNJV0kOld XDkg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu 
(ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id qb29si6584000ejc.254.2021.08.01.22.35.28; Sun, 01 Aug 2021 22:35:29 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 517A168A57B; Mon, 2 Aug 2021 08:35:24 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id BBED568A181 for ; Mon, 2 Aug 2021 08:35:16 +0300 (EEST) X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="213420916" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="213420916" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 22:35:14 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,287,1620716400"; d="scan'208";a="457815390" Received: from skl-e5.sh.intel.com ([10.239.43.106]) by orsmga007.jf.intel.com with ESMTP; 01 Aug 2021 22:35:13 -0700 From: Wu Jianhua To: ffmpeg-devel@ffmpeg.org Date: Mon, 2 Aug 2021 13:34:35 +0800 Message-Id: <20210802053439.42828-1-jianhua.wu@intel.com> X-Mailer: git-send-email 2.17.1 Subject: [FFmpeg-devel] [PATCH 1/5] libavfilter/x86/vf_gblur: add ff_postscale_slice_avx512() X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Wu Jianhua , 
yanfei.cheng@intel.com MIME-Version: 1.0 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: SZMPpfeNc+0D Co-authored-by: Cheng Yanfei Co-authored-by: Jin Jun Signed-off-by: Wu Jianhua --- libavfilter/x86/vf_gblur.asm | 21 ++++++++++++--------- libavfilter/x86/vf_gblur_init.c | 4 ++++ 2 files changed, 16 insertions(+), 9 deletions(-) diff --git a/libavfilter/x86/vf_gblur.asm b/libavfilter/x86/vf_gblur.asm index 4d84e6d011..276fe347f5 100644 --- a/libavfilter/x86/vf_gblur.asm +++ b/libavfilter/x86/vf_gblur.asm @@ -194,19 +194,17 @@ cglobal postscale_slice, 2, 2, 4, ptr, length, postscale, min, max VBROADCASTSS m1, minm VBROADCASTSS m2, maxm %elif WIN64 - SWAP 0, 2 - SWAP 1, 3 - VBROADCASTSS m0, xm0 - VBROADCASTSS m1, xm1 + VBROADCASTSS m0, xmm2 + VBROADCASTSS m1, xmm3 VBROADCASTSS m2, maxm -%else ; UNIX64 - VBROADCASTSS m0, xm0 - VBROADCASTSS m1, xm1 - VBROADCASTSS m2, xm2 +%else ; UNIX + VBROADCASTSS m0, xmm0 + VBROADCASTSS m1, xmm1 + VBROADCASTSS m2, xmm2 %endif .loop: -%if cpuflag(avx2) +%if cpuflag(avx2) || cpuflag(avx512) mulps m3, m0, [ptrq + lengthq] %else movu m3, [ptrq + lengthq] @@ -229,3 +227,8 @@ POSTSCALE_SLICE INIT_YMM avx2 POSTSCALE_SLICE %endif + +%if HAVE_AVX512_EXTERNAL +INIT_ZMM avx512 +POSTSCALE_SLICE +%endif diff --git a/libavfilter/x86/vf_gblur_init.c b/libavfilter/x86/vf_gblur_init.c index d80fb46fe4..34aba4ca6e 100644 --- a/libavfilter/x86/vf_gblur_init.c +++ b/libavfilter/x86/vf_gblur_init.c @@ -29,6 +29,7 @@ void ff_horiz_slice_avx2(float *ptr, int width, int height, int steps, float nu, void ff_postscale_slice_sse(float *ptr, int length, float postscale, float min, float max); void ff_postscale_slice_avx2(float *ptr, int length, float postscale, float min, float max); +void ff_postscale_slice_avx512(float *ptr, int length, float postscale, float min, float max); av_cold void ff_gblur_init_x86(GBlurContext *s) { @@ -47,5 +48,8 @@ av_cold void ff_gblur_init_x86(GBlurContext *s) if (EXTERNAL_AVX2(cpu_flags)) { 
s->horiz_slice = ff_horiz_slice_avx2; } + if (EXTERNAL_AVX512(cpu_flags)) { + s->postscale_slice = ff_postscale_slice_avx512; + } #endif } From patchwork Mon Aug 2 05:34:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wu Jianhua X-Patchwork-Id: 29179 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:6c0f:0:0:0:0:0 with SMTP id a15csp1282754ioh; Sun, 1 Aug 2021 22:35:41 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyfV7ghTgEik2twLh7nm07XA5toksDBO9dJwiGLwVTSUxMkJ1VcVbcQ1yr9Vrq3oB8Hs6pw X-Received: by 2002:a17:906:3b97:: with SMTP id u23mr14018309ejf.437.1627882541118; Sun, 01 Aug 2021 22:35:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627882541; cv=none; d=google.com; s=arc-20160816; b=e9sTBHI660OVjsnf/d45pJfM7ONyt3M/MneWcCOhu3f1lqLL4EdoUBe6KdKulXimyT ENAYyT2byS6K4Ga6ThxF8GVNgatKuZB61oWeRkW7dAElI33yUdDu0o5kp3Eb3s5nERg4 g9OzwGFrNQ/NEAcwKGyqwe+JQ6JFfyqBh0rkzxazInnDbhe7VD8mgg+ghJbUP7ODbrFU 1jBNCu5/ic4xur8CYRGa4n5tcKApwr5XbIVQDnEgyCuTSqv7kCcw194CJK2YPrvgqQ5O slBoNmkc/xVmv3T/Ef984OKoe783llX3k9rz78XP6qWuLFhA5keUp+h1h0KCPTmhRedE Ga8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:mime-version:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:references:in-reply-to:message-id:date :to:from:delivered-to; bh=9evhqIY03v5Io54szKy1y1BvuVwaxdQf1e/tT4cEMF4=; b=Rr7nJou+kw9xjcJ+LhVucJLsZOx9xxR6T53BvBm1UTXV7Mu5lOPXIKrrEtM9TQWYOt 5t5InUaVbDT+f+qUCWg7b+7fEKfXIZ/4xW/YMwQlOWzIEmtlQCxuileXhwfzj4ZRniXn rMkVc7A0xLDwemsnkNf0EcA9vcY361QvrEL53RDuhSwntPnKbYs2wUW6qRctNylcuQhl 8vBcCq86vAp6k1I9Wnt9kr/u2E7OFgAq7xZR0u80h5LIg2S99Z8nX9mKzLs2SzZ5FEoP yhbdp/SpcFHW1Gr8K/NRQ+kQh8skWZHfAzZ0dpsx2LKtBo6//fDPtRyKdS3NImg4dJim FUJg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted 
sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id hs32si8843959ejc.619.2021.08.01.22.35.40; Sun, 01 Aug 2021 22:35:41 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 7528968A6B6; Mon, 2 Aug 2021 08:35:27 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id C572A689CB4 for ; Mon, 2 Aug 2021 08:35:18 +0300 (EEST) X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="213420920" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="213420920" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 22:35:15 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,287,1620716400"; d="scan'208";a="457815395" Received: from skl-e5.sh.intel.com ([10.239.43.106]) by orsmga007.jf.intel.com with ESMTP; 01 Aug 2021 22:35:14 -0700 From: Wu Jianhua To: ffmpeg-devel@ffmpeg.org Date: Mon, 2 Aug 2021 13:34:36 +0800 Message-Id: <20210802053439.42828-2-jianhua.wu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210802053439.42828-1-jianhua.wu@intel.com> References: <20210802053439.42828-1-jianhua.wu@intel.com> Subject: [FFmpeg-devel] [PATCH 2/5] libavfilter/x86/vf_gblur: add ff_verti_slice_avx2/512() 
X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Wu Jianhua , yanfei.cheng@intel.com MIME-Version: 1.0 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: zuW/kWHGMWJJ The new vertical slice with AVX2/512 acceleration can significantly improve the performance of Gaussian Filter 2D. Performance data (fps): ff_verti_slice_c: 32.57 ff_verti_slice_avx2: 476.19 ff_verti_slice_avx512: 833.33 Co-authored-by: Cheng Yanfei Co-authored-by: Jin Jun --- libavfilter/gblur.h | 2 + libavfilter/vf_gblur.c | 24 ++-- libavfilter/x86/vf_gblur.asm | 187 ++++++++++++++++++++++++++++++++ libavfilter/x86/vf_gblur_init.c | 7 ++ 4 files changed, 212 insertions(+), 8 deletions(-) diff --git a/libavfilter/gblur.h b/libavfilter/gblur.h index dce50671f6..367575a6db 100644 --- a/libavfilter/gblur.h +++ b/libavfilter/gblur.h @@ -50,6 +50,8 @@ typedef struct GBlurContext { float nuV; int nb_planes; void (*horiz_slice)(float *buffer, int width, int height, int steps, float nu, float bscale); + void (*verti_slice)(float *buffer, int width, int height, int slice_start, int slice_end, int steps, + float nu, float bscale); void (*postscale_slice)(float *buffer, int length, float postscale, float min, float max); } GBlurContext; diff --git a/libavfilter/vf_gblur.c b/libavfilter/vf_gblur.c index 3f61275658..de7ed82d49 100644 --- a/libavfilter/vf_gblur.c +++ b/libavfilter/vf_gblur.c @@ -138,6 +138,19 @@ static void do_vertical_columns(float *buffer, int width, int height, } } +static void verti_slice_c(float *buffer, int width, int height, + int slice_start, int slice_end, int steps, + float nu, float boundaryscale) +{ + int aligned_end = slice_start + (((slice_end - slice_start) >> 3) << 3); + /* Filter vertically along columns (process 8 columns in each step) */ + 
do_vertical_columns(buffer, width, height, slice_start, aligned_end, + steps, nu, boundaryscale, 8); + /* Filter un-aligned columns one by one */ + do_vertical_columns(buffer, width, height, aligned_end, slice_end, + steps, nu, boundaryscale, 1); +} + static int filter_vertically(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs) { GBlurContext *s = ctx->priv; @@ -150,16 +163,10 @@ static int filter_vertically(AVFilterContext *ctx, void *arg, int jobnr, int nb_ const int steps = s->steps; const float nu = s->nuV; float *buffer = s->buffer; - int aligned_end; - aligned_end = slice_start + (((slice_end - slice_start) >> 3) << 3); - /* Filter vertically along columns (process 8 columns in each step) */ - do_vertical_columns(buffer, width, height, slice_start, aligned_end, - steps, nu, boundaryscale, 8); + s->verti_slice(buffer, width, height, slice_start, slice_end, + steps, nu, boundaryscale); - /* Filter un-aligned columns one by one */ - do_vertical_columns(buffer, width, height, aligned_end, slice_end, - steps, nu, boundaryscale, 1); return 0; } @@ -233,6 +240,7 @@ static int query_formats(AVFilterContext *ctx) void ff_gblur_init(GBlurContext *s) { s->horiz_slice = horiz_slice_c; + s->verti_slice = verti_slice_c; s->postscale_slice = postscale_c; if (ARCH_X86) ff_gblur_init_x86(s); diff --git a/libavfilter/x86/vf_gblur.asm b/libavfilter/x86/vf_gblur.asm index 276fe347f5..74174fdc43 100644 --- a/libavfilter/x86/vf_gblur.asm +++ b/libavfilter/x86/vf_gblur.asm @@ -22,6 +22,43 @@ SECTION .text +%xdefine AVX2_MMSIZE 32 +%xdefine AVX512_MMSIZE 64 + +%macro MOVSXDIFNIDN 1-* + %rep %0 + movsxdifnidn %1q, %1d + %rotate 1 + %endrep +%endmacro + +%macro PUSH_MASK 5 +%if mmsize == AVX2_MMSIZE + %assign %%n mmsize/4 + %assign %%i 0 + %rep %%n + mov %4, %3 + and %4, 1 + neg %4 + mov dword [%5 + %%i*4], %4 + sar %3, 1 + %assign %%i %%i+1 + %endrep + movu %1, [%5] +%else + kmovd %2, %3 +%endif +%endmacro + +%macro VMASKMOVPS 4 +%if mmsize == AVX2_MMSIZE + vpmaskmovd %1, %3, 
%2 +%else + kmovw k7, %4 + vmovups %1{k7}, %2 +%endif +%endmacro + ; void ff_horiz_slice_sse4(float *ptr, int width, int height, int steps, ; float nu, float bscale) @@ -232,3 +269,153 @@ POSTSCALE_SLICE INIT_ZMM avx512 POSTSCALE_SLICE %endif + + +;******************************************************************************* +; void ff_verti_slice(float *buffer, int width, int height, int column_begin, +; int column_end, int steps, float nu, float bscale); +;******************************************************************************* +%macro VERTI_SLICE 0 +%if UNIX64 +cglobal verti_slice, 6, 12, 9, 0-mmsize*2, buffer, width, height, cbegin, cend, \ + steps, x, y, cwidth, step, ptr, stride +%else +cglobal verti_slice, 6, 12, 9, 0-mmsize*2, buffer, width, height, cbegin, cend, \ + steps, nu, bscale, x, y, cwidth, step, \ + ptr, stride +%endif +%assign cols mmsize/4 +%if WIN64 + VBROADCASTSS m0, num + VBROADCASTSS m1, bscalem + DEFINE_ARGS buffer, width, height, cbegin, cend, \ + steps, x, y, cwidth, step, ptr, stride + MOVSXDIFNIDN width, height, cbegin, cend, steps +%else + VBROADCASTSS m0, xmm0 ; nu + VBROADCASTSS m1, xmm1 ; bscale +%endif + mov cwidthq, cendq + sub cwidthq, cbeginq + lea strideq, [widthq * 4] + + xor xq, xq ; x = 0 + cmp cwidthq, cols + jl .x_scalar + cmp cwidthq, 0x0 + je .end_scalar + + sub cwidthq, cols +.loop_x: + xor stepq, stepq + .loop_step: + ; ptr = buffer + x + column_begin; + lea ptrq, [xq + cbeginq] + lea ptrq, [bufferq + ptrq*4] + + ; ptr[cols-1:0] *= bscale; + movu m2, [ptrq] + mulps m2, m1 + movu [ptrq], m2 + + ; Filter downwards + mov yq, 1 + .loop_y_down: + add ptrq, strideq ; ptrq += width + movu m3, [ptrq] + FMULADD_PS m2, m2, m0, m3, m2 + movu [ptrq], m2 + + inc yq + cmp yq, heightq + jl .loop_y_down + + mulps m2, m1 + movu [ptrq], m2 + + ; Filter upwards + dec yq + .loop_y_up: + sub ptrq, strideq + movu m3, [ptrq] + FMULADD_PS m2, m2, m0, m3, m2 + movu [ptrq], m2 + + dec yq + cmp yq, 0 + jg .loop_y_up + + inc stepq + cmp
stepq, stepsq + jl .loop_step + + add xq, cols + cmp xq, cwidthq + jle .loop_x + + add cwidthq, cols + cmp xq, cwidthq + jge .end_scalar + +.x_scalar: + xor stepq, stepq + mov qword [rsp + 0x10], xq + sub cwidthq, xq + mov xq, 1 + shlx cwidthq, xq, cwidthq + sub cwidthq, 1 + PUSH_MASK m4, k1, cwidthd, xd, rsp + 0x20 + mov xq, qword [rsp + 0x10] + + .loop_step_scalar: + lea ptrq, [xq + cbeginq] + lea ptrq, [bufferq + ptrq*4] + + VMASKMOVPS m2, [ptrq], m4, k1 + mulps m2, m1 + VMASKMOVPS [ptrq], m2, m4, k1 + + ; Filter downwards + mov yq, 1 + .x_scalar_loop_y_down: + add ptrq, strideq + VMASKMOVPS m3, [ptrq], m4, k1 + FMULADD_PS m2, m2, m0, m3, m2 + VMASKMOVPS [ptrq], m2, m4, k1 + + inc yq + cmp yq, heightq + jl .x_scalar_loop_y_down + + mulps m2, m1 + VMASKMOVPS [ptrq], m2, m4, k1 + + ; Filter upwards + dec yq + .x_scalar_loop_y_up: + sub ptrq, strideq + VMASKMOVPS m3, [ptrq], m4, k1 + FMULADD_PS m2, m2, m0, m3, m2 + VMASKMOVPS [ptrq], m2, m4, k1 + + dec yq + cmp yq, 0 + jg .x_scalar_loop_y_up + + inc stepq + cmp stepq, stepsq + jl .loop_step_scalar + +.end_scalar: + RET +%endmacro + +%if HAVE_AVX2_EXTERNAL +INIT_YMM avx2 +VERTI_SLICE +%endif + +%if HAVE_AVX512_EXTERNAL +INIT_ZMM avx512 +VERTI_SLICE +%endif diff --git a/libavfilter/x86/vf_gblur_init.c b/libavfilter/x86/vf_gblur_init.c index 34aba4ca6e..3e173410c2 100644 --- a/libavfilter/x86/vf_gblur_init.c +++ b/libavfilter/x86/vf_gblur_init.c @@ -31,6 +31,11 @@ void ff_postscale_slice_sse(float *ptr, int length, float postscale, float min, void ff_postscale_slice_avx2(float *ptr, int length, float postscale, float min, float max); void ff_postscale_slice_avx512(float *ptr, int length, float postscale, float min, float max); +void ff_verti_slice_avx2(float *buffer, int width, int height, int column_begin, int column_end, + int steps, float nu, float bscale); +void ff_verti_slice_avx512(float *buffer, int width, int height, int column_begin, int column_end, + int steps, float nu, float bscale); + av_cold void 
ff_gblur_init_x86(GBlurContext *s) { int cpu_flags = av_get_cpu_flags(); @@ -47,9 +52,11 @@ av_cold void ff_gblur_init_x86(GBlurContext *s) } if (EXTERNAL_AVX2(cpu_flags)) { s->horiz_slice = ff_horiz_slice_avx2; + s->verti_slice = ff_verti_slice_avx2; } if (EXTERNAL_AVX512(cpu_flags)) { s->postscale_slice = ff_postscale_slice_avx512; + s->verti_slice = ff_verti_slice_avx512; } #endif } From patchwork Mon Aug 2 05:34:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wu Jianhua X-Patchwork-Id: 29180 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:6c0f:0:0:0:0:0 with SMTP id a15csp1282850ioh; Sun, 1 Aug 2021 22:35:49 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyotdO14XbkKLXvzRL7guehG2hrGtE8UAwfyqKDrfyjIAzLOL0hw7AmIXWF5+dF6SZUk3P6 X-Received: by 2002:a05:6402:48f:: with SMTP id k15mr16908637edv.262.1627882549658; Sun, 01 Aug 2021 22:35:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627882549; cv=none; d=google.com; s=arc-20160816; b=Tq5Smlhi1kv1lgFI2KcCf0UfcMrb3PvKOGth31Pqy5FT2AIurENIjrirP8t2bVf/lG CePAG+1Gv9ni694/Ggbt1DNkObUOvG1ZO1VmEh3DPuD3/cVrbbNPzvLERPbgn8dgo6Ba 0I2xevkF986GP9RGsJg3YutGFm5sRokIlwlci40vH0w4TWJC/5PmmRuiIpKdPLqZqklK Q6aXEM6uDZYlrKzUa+6/ETAVUP7XczMZ1w38GDUideKo8/+GpEAKi82MQ8lXwS0AAXPN NZAdLA6jHBnJXyNGS3WprXLAjIIFihiCH0MCqf85Vh/+MewlRIuXNV5RwCzWnMHzV9hf MZ8w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:mime-version:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:references:in-reply-to:message-id:date :to:from:delivered-to; bh=1lr716rpiGKbPH/ely7DXFHTYzD5uqqbX2mMcfuCJ/E=; b=ZszCvbi/0ALL/qX1/znSrk2vnwRre68YhAM5Nur9y74YblngR4UIxm8chQ+VPxjrnD uAylH45PwM7/O6bY9H3mteBpToTafi3OgXzxGWKvshmMNCj0O92KqTfvznxoxbKFjj6e Mn7sR2yLAo4Kc7cAxG3hHqmvIKTgEDbHJvme5C2uRyKb0N/haycIXD65MZXUNSzUNAnk 
oqgQ9oRdtVRP8efOfviAzCYcS/MDnV8u5S18E0B9aMHJcmKRiy69Pl5AW0U9Q6M20DFh sKG3XmMKrs+jsDBGhEaXo/5qSfBuH/jvk9d+1WKF2pyJpT3nd0dRkYKnaPV4j01Qmkce k9Iw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id f23si3388263eds.64.2021.08.01.22.35.49; Sun, 01 Aug 2021 22:35:49 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 84B9668A2DE; Mon, 2 Aug 2021 08:35:28 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 2CEED68A46C for ; Mon, 2 Aug 2021 08:35:19 +0300 (EEST) X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="213420923" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="213420923" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 22:35:16 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,287,1620716400"; d="scan'208";a="457815404" Received: from skl-e5.sh.intel.com ([10.239.43.106]) by orsmga007.jf.intel.com with ESMTP; 01 Aug 2021 22:35:15 -0700 From: Wu Jianhua To: ffmpeg-devel@ffmpeg.org Date: Mon, 2 Aug 2021 13:34:37 +0800 Message-Id: 
<20210802053439.42828-3-jianhua.wu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210802053439.42828-1-jianhua.wu@intel.com> References: <20210802053439.42828-1-jianhua.wu@intel.com> Subject: [FFmpeg-devel] [PATCH 3/5] tests/checkasm/vf_gblur.c: add check_verti_slice() for unit test X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Wu Jianhua , yanfei.cheng@intel.com MIME-Version: 1.0 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: Ca1vGE9Ygkkp Co-authored-by: Cheng Yanfei Co-authored-by: Jin Jun Signed-off-by: Wu Jianhua --- tests/checkasm/vf_gblur.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/tests/checkasm/vf_gblur.c b/tests/checkasm/vf_gblur.c index b9fe2f9a36..0fac37b6be 100644 --- a/tests/checkasm/vf_gblur.c +++ b/tests/checkasm/vf_gblur.c @@ -49,6 +49,22 @@ static void check_horiz_slice(float *dst_ref, float *dst_new) bench_new(dst_new, WIDTH, HEIGHT, 1, nu, bscale); } +static void check_verti_slice(float *dst_ref, float *dst_new) +{ + int steps = 2; + float nu = 0.101f; + float bscale = 1.112f; + + declare_func(void, float *buffer, int width, int height, int column_begin, + int column_end, int steps, float nu, float bscale); + call_ref(dst_ref, WIDTH, HEIGHT, 0, WIDTH, steps, nu, bscale); + call_new(dst_new, WIDTH, HEIGHT, 0, WIDTH, steps, nu, bscale); + if (!float_near_abs_eps_array(dst_ref, dst_new, 0.01f, PIXELS)) { + fail(); + } + bench_new(dst_new, WIDTH, HEIGHT, 0, WIDTH, 1, nu, bscale); +} + static void check_postscale_slice(float *dst_ref, float *dst_new) { float postscale = 0.0603f; @@ -85,6 +101,13 @@ void checkasm_check_vf_gblur(void) } report("postscale_slice"); + randomize_buffers(dst_ref, PIXELS); + memcpy(dst_new, dst_ref, BUF_SIZE); + if (check_func(s.verti_slice, 
"verti_slice")) { + check_verti_slice(dst_ref, dst_new); + } + report("verti_slice"); + av_freep(&dst_ref); av_freep(&dst_new); } From patchwork Mon Aug 2 05:34:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wu Jianhua X-Patchwork-Id: 29181 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:6c0f:0:0:0:0:0 with SMTP id a15csp1282943ioh; Sun, 1 Aug 2021 22:35:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxG8TU1jbHVv0nAypz1a3qfpY9H+BIKFzxj90VBLS0Xsy4eJJSCqG3mtaQgxejyG9ITZ/uI X-Received: by 2002:a17:906:86c4:: with SMTP id j4mr13964466ejy.431.1627882559101; Sun, 01 Aug 2021 22:35:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627882559; cv=none; d=google.com; s=arc-20160816; b=jp5kraDDprUnXRKUFfCiYONxN2SjrkIcG0EwATZ2Di9C6csrZidUnl5XCd9hk5wChJ xJc9rx/rrRq+HdAttjUD/IS/pnr9rXmRvgwzW52k2XFyjvqj+URQqkUpZeQ3c/3job+4 HXwvR2eqeOVzUv7mMKFfEaVYJN2uw6/xfpRXY2dgLy196WD5eQAkpJO/7TrVusAqpI/f rt3/afCekNBw09/EgpSAr0fXdIgtnFgCjfyciCzbmYaVYkv8Z47Mf/Mh4Rn2orQbEQyE ckc4SeZ3SO9rnZq6haPTyE8MON/M6CAhdx7NNapjH0HtS7w3kQiwjP1cslJjFXaL6iTF vGDQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:mime-version:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:references:in-reply-to:message-id:date :to:from:delivered-to; bh=WHdHFIfK6qoq41Jqfm2Fy5Q1K4qtgU6BHT2iS0GUAlE=; b=NtV2/DCraTRIe9jQGj8yzvdjHkQ/t4UPNttzuoX17fxqsP78BUbK3r4ehCng81mKI+ MZ1Jjq+9tcdj3YM8g+Y05MBmCoduzEBnh2UFsDVd+vETiGNynMefHr2v8nU/eo4F6nPv 5kjNQMGA17x/opPSN+LonBHjLyUVuAFzIqjkAaAOXNURsjqk5Xld0E8P1Zr6gi3tAo/Z 9UywYxSVWHaCVaxGU9UomO1DjdKTWwG0uk/zOSX2pYXVjKA3ysd5sOo0oR6zirAFbOLT wSmQMxHZtf7bSWpx8ox5ltaIQrQmZc6Pd6Bakx252/7H9Z2K6t4CzXGiY5o+a7/RL3Lc ZuqQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) 
smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id z26si9032777eja.646.2021.08.01.22.35.58; Sun, 01 Aug 2021 22:35:59 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 956A068A74F; Mon, 2 Aug 2021 08:35:30 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id CB5EF68A701 for ; Mon, 2 Aug 2021 08:35:22 +0300 (EEST) X-IronPort-AV: E=McAfee;i="6200,9189,10063"; a="213420928" X-IronPort-AV: E=Sophos;i="5.84,288,1620716400"; d="scan'208";a="213420928" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Aug 2021 22:35:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.84,287,1620716400"; d="scan'208";a="457815411" Received: from skl-e5.sh.intel.com ([10.239.43.106]) by orsmga007.jf.intel.com with ESMTP; 01 Aug 2021 22:35:17 -0700 From: Wu Jianhua To: ffmpeg-devel@ffmpeg.org Date: Mon, 2 Aug 2021 13:34:38 +0800 Message-Id: <20210802053439.42828-4-jianhua.wu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210802053439.42828-1-jianhua.wu@intel.com> References: <20210802053439.42828-1-jianhua.wu@intel.com> Subject: [FFmpeg-devel] [PATCH 4/5] libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512() 
X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Wu Jianhua , yanfei.cheng@intel.com MIME-Version: 1.0 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 1cpv6BW2UZE0 We introduce ff_horiz_slice_avx2/512(), implemented with a new algorithm. In a nutshell, the new algorithm does three things: gathering data from 8/16 rows, blurring the data, and scattering the data back to the image buffer. Here we use a customized 8x8/16x16 transpose to avoid the huge overhead of gather and scatter instructions; the transpose depends on a newly added temporary buffer called localbuf. Performance data (fps): ff_horiz_slice_avx2(old): 109.89 ff_horiz_slice_avx2(new): 666.67 ff_horiz_slice_avx512: 1000 Co-authored-by: Cheng Yanfei Co-authored-by: Jin Jun Signed-off-by: Wu Jianhua --- libavfilter/gblur.h | 4 +- libavfilter/vf_gblur.c | 11 +- libavfilter/x86/vf_gblur.asm | 579 +++++++++++++++++++++++++++++++- libavfilter/x86/vf_gblur_init.c | 17 +- 4 files changed, 601 insertions(+), 10 deletions(-) diff --git a/libavfilter/gblur.h b/libavfilter/gblur.h index 367575a6db..3a66984b06 100644 --- a/libavfilter/gblur.h +++ b/libavfilter/gblur.h @@ -39,9 +39,11 @@ typedef struct GBlurContext { int flt; int depth; + int stride; int planewidth[4]; int planeheight[4]; float *buffer; + float *localbuf; ///< temporary buffer for horiz_slice.
NULL if not used float boundaryscale; float boundaryscaleV; float postscale; @@ -49,7 +51,7 @@ typedef struct GBlurContext { float nu; float nuV; int nb_planes; - void (*horiz_slice)(float *buffer, int width, int height, int steps, float nu, float bscale); + void (*horiz_slice)(float *buffer, int width, int height, int steps, float nu, float bscale, float *localbuf); void (*verti_slice)(float *buffer, int width, int height, int slice_start, int slice_end, int steps, float nu, float bscale); void (*postscale_slice)(float *buffer, int length, float postscale, float min, float max); diff --git a/libavfilter/vf_gblur.c b/libavfilter/vf_gblur.c index de7ed82d49..0768fe12e1 100644 --- a/libavfilter/vf_gblur.c +++ b/libavfilter/vf_gblur.c @@ -64,7 +64,7 @@ static void postscale_c(float *buffer, int length, } static void horiz_slice_c(float *buffer, int width, int height, int steps, - float nu, float bscale) + float nu, float bscale, float *localbuf) { int step, x, y; float *ptr; @@ -97,9 +97,13 @@ static int filter_horizontally(AVFilterContext *ctx, void *arg, int jobnr, int n const int steps = s->steps; const float nu = s->nu; float *buffer = s->buffer; + float *localbuf = NULL; + + if (s->localbuf) + localbuf = s->localbuf + s->stride * width * slice_start; s->horiz_slice(buffer + width * slice_start, width, slice_end - slice_start, - steps, nu, boundaryscale); + steps, nu, boundaryscale, localbuf); emms_c(); return 0; } @@ -239,6 +243,7 @@ static int query_formats(AVFilterContext *ctx) void ff_gblur_init(GBlurContext *s) { + s->localbuf = NULL; s->horiz_slice = horiz_slice_c; s->verti_slice = verti_slice_c; s->postscale_slice = postscale_c; @@ -381,6 +386,8 @@ static av_cold void uninit(AVFilterContext *ctx) GBlurContext *s = ctx->priv; av_freep(&s->buffer); + if (s->localbuf) + av_free(s->localbuf); } static const AVFilterPad gblur_inputs[] = { diff --git a/libavfilter/x86/vf_gblur.asm b/libavfilter/x86/vf_gblur.asm index 74174fdc43..6c14efee12 100644 --- 
a/libavfilter/x86/vf_gblur.asm
+++ b/libavfilter/x86/vf_gblur.asm
@@ -20,6 +20,14 @@
 
 %include "libavutil/x86/x86util.asm"
 
+SECTION .data
+
+gblur_transpose_16x16_indices1: dq 2, 3, 0, 1, 6, 7, 4, 5
+gblur_transpose_16x16_indices2: dq 1, 0, 3, 2, 5, 4, 7, 6
+gblur_transpose_16x16_indices3: dd 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14
+gblur_transpose_16x16_mask: dw 0xcc, 0x33, 0xaa, 0x55, 0xaaaa, 0x5555
+gblur_vindex_width: dd 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
+
 SECTION .text
 
 %xdefine AVX2_MMSIZE 32
@@ -32,6 +40,29 @@ SECTION .text
     %endrep
 %endmacro
 
+%macro KXNOR 2-*
+%if mmsize == AVX512_MMSIZE
+    kxnorw %2, %2, %2
+%else
+    %if %0 == 3
+        mov %3, -1
+    %else
+        vpcmpeqd %1, %1, %1
+    %endif
+%endif
+%endmacro
+
+%macro KMOVW 2-4
+%if mmsize == AVX2_MMSIZE && %0 == 4
+    mova %1, %2
+%elif mmsize == AVX512_MMSIZE
+    %if %0 == 4
+        %rotate 2
+    %endif
+    kmovw %1, %2
+%endif
+%endmacro
+
 %macro PUSH_MASK 5
 %if mmsize == AVX2_MMSIZE
     %assign %%n mmsize/4
@@ -59,15 +90,546 @@ SECTION .text
 %endif
 %endmacro
 
-; void ff_horiz_slice_sse4(float *ptr, int width, int height, int steps,
-;                          float nu, float bscale)
+%macro VGATHERDPS 4
+%if mmsize == AVX2_MMSIZE
+    vgatherdps %1, %2, %3
+%else
+    vgatherdps %1{%4}, %2
+%endif
+%endmacro
+
+%macro VSCATTERDPS128 7
+    %rep 4
+        mov %7, %6
+        and %7, 1
+        cmp %7, 0
+        je %%end_scatter
+        movss [%2 + %3*%4], xm%1
+        vpshufd m%1, m%1, 0x39
+        add %3, %5
+        sar %6, 1
+    %endrep
+    %%end_scatter:
+%endmacro
+
+; %1=register index
+; %2=base address  %3=vindex
+; %4=scale         %5=width
+; %6=mask          %7=tmp
+; m15=reserved
+%macro VSCATTERDPS256 7
+    mova m15, m%1
+    xor %3, %3
+    VSCATTERDPS128 15, %2, %3, %4, %5, %6, %7
+    vextractf128 xm15, m%1, 1
+    VSCATTERDPS128 15, %2, %3, %4, %5, %6, %7
+%endmacro
+
+; %1=base address  %2=avx2 vindex
+; %3=avx512 vindex %4=avx2 mask
+; %5=avx512 mask   %6=register index
+; %7=width         %8-*=tmp
+%macro VSCATTERDPS 8-*
+%if mmsize == AVX2_MMSIZE
+    %if %0 == 9
+        mov %9, %4
+        VSCATTERDPS256 %6, %1, %2, 4, %7, %9, %8
+    %else
+        VSCATTERDPS256 %6, %1, %2, 4, %7, %4, %8
+    %endif
+%else
+    vscatterdps [%1 + %3*4]{%5}, m%6
+%endif
+%endmacro
+
+%macro INIT_WORD_MASK 1-*
+    %assign %%i 0
+    %rep %0
+        kmovw %1, [gblur_transpose_16x16_mask + %%i * 2]
+        %assign %%i %%i+1
+        %rotate 1
+    %endrep
+%endmacro
+
+%macro INIT_INDICES 1-*
+    %assign %%i 1
+    %rep %0
+        movu %1, [gblur_transpose_16x16_indices %+ %%i]
+        %assign %%i %%i+1
+        %rotate 1
+    %endrep
+%endmacro
+
+%assign stack_offset 0
+%macro PUSH_MM 1
+%if mmsize == AVX2_MMSIZE
+    movu [rsp + stack_offset], %1
+    %assign stack_offset stack_offset+mmsize
+%endif
+%endmacro
+
+%macro POP_MM 1
+%if mmsize == AVX2_MMSIZE
+    %assign stack_offset stack_offset-mmsize
+    movu %1, [rsp + stack_offset]
+%endif
+%endmacro
+
+%macro READ_LOCAL_BUFFER 1
+    %if mmsize == AVX512_MMSIZE
+        %assign %%i 19
+    %else
+        %assign %%i 9
+    %endif
+    %assign %%j %%i-1
+    %assign %%k %1-1
+    %xdefine %%m m %+ %%i
+    mova %%m, m3
+    FMULADD_PS %%m, %%m, m0, [localbufq + %%k * mmsize], %%m
+    %assign %%k %%k-1
+    %rep %1-1
+        %xdefine %%m m %+ %%j
+        mova %%m, m %+ %%i
+        FMULADD_PS %%m, %%m, m0, [localbufq + %%k * mmsize], %%m
+        %assign %%i %%i-1
+        %assign %%j %%j-1
+        %assign %%k %%k-1
+    %endrep
+    %if mmsize == AVX512_MMSIZE
+        mova m3, m %+ %%i
+    %endif
+%endmacro
+
+%macro FMADD_WRITE 4
+    FMULADD_PS %1, %1, %2, %3, %1
+    mova %4, %1
+%endmacro
+
+%macro WRITE_LOCAL_BUFFER_INTERNAL 8-16
+    %assign %%i 0
+    %rep %0
+        FMADD_WRITE m3, m0, m %+ %1, [localbufq + %%i * mmsize]
+        %assign %%i %%i+1
+        %rotate 1
+    %endrep
+%endmacro
+
+%macro GATHERPS 1
+    %if mmsize == AVX512_MMSIZE
+        %assign %%i 4
+    %else
+        %assign %%i 2
+    %endif
+    movu m %+ %%i, [ptrq]
+    mov strideq, widthq
+    %assign %%i %%i+1
+    %rep %1-2
+        movu m %+ %%i, [ptrq + strideq*4]
+        add strideq, widthq
+        %assign %%i %%i+1
+    %endrep
+    movu m %+ %%i, [ptrq + strideq*4]
+%endmacro
+
+%macro SCATTERPS_INTERNAL 8-16
+    movu [ptrq + strideq*0], m %+ %1
+    mov strideq, widthq
+    %rotate 1
+    %rep %0-2
+        movu [ptrq + strideq*4], m %+ %1
+        add strideq, widthq
+        %rotate 1
+    %endrep
+    movu [ptrq + strideq*4], m %+ %1
+%endmacro
+
+%macro BATCH_INSERT64X4 4-*
+    %assign %%imm8 %1
+    %rotate 1
+    %rep (%0-1)/3
+        vinserti64x4 m%1, m%2, ym%3, %%imm8
+        %rotate 3
+    %endrep
+%endmacro
+
+%macro BATCH_EXTRACT_INSERT 2-*
+    %assign %%imm8 %1
+    %rotate 1
+    %rep (%0-1)/2
+        vextractf64x4 ym%1, m%1, %%imm8
+        vextractf64x4 ym%2, m%2, %%imm8
+        vinserti64x4 m%1, m%1, ym%2, %%imm8
+        %rotate 2
+    %endrep
+%endmacro
+
+%macro BATCH_MOVE 2-*
+    %rep %0/2
+        mova m%1, m%2
+        %rotate 2
+    %endrep
+%endmacro
+
+%macro BATCH_PERMUTE 3-*
+    %xdefine %%decorator %1
+    %xdefine %%mask %2
+    %assign %%index %3
+    %rotate 3
+    %rep (%0-3)/2
+        vperm %+ %%decorator m%1{%%mask}, m %+ %%index, m%2
+        %rotate 2
+    %endrep
+%endmacro
+; input : m3-m19
+; output: m8 m5 m9 m15 m16 m7 m17 m27 m24 m21 m25 m19 m12 m23 m13 m11
+%macro TRANSPOSE_16X16_AVX512 0
+    BATCH_INSERT64X4 0x1, 20,4,12, 21,5,13, 22,6,14, 23,7,15
+    BATCH_INSERT64X4 0x1, 24,8,16, 25,9,17, 26,10,18, 27,11,19
+
+    BATCH_EXTRACT_INSERT 0x1, 4,12, 5,13, 6,14, 7,15
+    BATCH_EXTRACT_INSERT 0x1, 8,16, 9,17, 10,18, 11,19
+
+    BATCH_MOVE 12,20, 13,21, 14,22, 15,23
+    BATCH_PERMUTE q, k6, 28, 12,24, 13,25, 14,26, 15,27
+    BATCH_PERMUTE q, k5, 28, 24,20, 25,21, 26,22, 27,23
+
+    BATCH_MOVE 16,4, 17,5, 18,6, 19,7
+    BATCH_PERMUTE q, k6, 28, 16,8, 17,9, 18,10, 19,11
+    BATCH_PERMUTE q, k5, 28, 8,4, 9,5, 10,6, 11,7
+
+    BATCH_MOVE 4,12, 5,13, 6,24, 7,25
+    BATCH_MOVE 20,16, 21,17, 22,8, 23,9
+
+    BATCH_PERMUTE q, k4, 29, 4,14, 5,15, 6,26, 7,27
+    BATCH_PERMUTE q, k3, 29, 14,12, 15,13, 26,24, 27,25
+    BATCH_PERMUTE q, k4, 29, 20,18, 21,19, 22,10, 23,11
+    BATCH_PERMUTE q, k3, 29, 18,16, 19,17, 10,8, 11,9
+
+    BATCH_MOVE 8,4, 9,14, 16,6, 17,26
+    BATCH_MOVE 24,20, 25,18, 12,22, 13,10
+
+    BATCH_PERMUTE d, k2, 30, 8,5, 9,15, 16,7, 17,27
+    BATCH_PERMUTE d, k1, 30, 5,4, 15,14, 7,6, 27,26
+    BATCH_PERMUTE d, k2, 30, 24,21, 25,19, 12,23, 13,11
+    BATCH_PERMUTE d, k1, 30, 21,20, 19,18, 23,22, 11,10
+%endmacro
+
+%macro INSERT_UNPACK 8
+    vinsertf128 m%5, m%1, xm%3, 0x1
+    vinsertf128 m%6, m%2, xm%4, 0x1
+    vunpcklpd m%7, m%5, m%6
+    vunpckhpd m%8, m%5, m%6
+%endmacro
+
+%macro SHUFFLE 4
+    vshufps m%3, m%1, m%2, 0x88
+    vshufps m%4, m%1, m%2, 0xDD
+    mova m%1, m%3
+    mova m%2, m%4
+%endmacro
+
+%macro EXTRACT_INSERT_UNPACK 6
+    vextractf128 xm%1, m%1, 0x1
+    vextractf128 xm%2, m%2, 0x1
+    vinsertf128 m%3, m%3, xm%1, 0x0
+    vinsertf128 m%4, m%4, xm%2, 0x0
+    vunpcklpd m%5, m%3, m%4
+    vunpckhpd m%6, m%3, m%4
+%endmacro
+
+; Transpose 8x8 AVX2
+; Limit the number ym# register to 16 for compatibility
+; Used up registers instead of using stack memory
+; Input: m2-m9
+; Output: m12, m14, m13, m15, m8, m10, m9, m11
+%macro TRANSPOSE_8X8_AVX2 0
+    INSERT_UNPACK 2, 3, 6, 7, 10, 11, 12, 13
+    INSERT_UNPACK 4, 5, 8, 9, 10, 11, 14, 15
+
+    SHUFFLE 12, 14, 10, 11
+    SHUFFLE 13, 15, 10, 11
+
+    EXTRACT_INSERT_UNPACK 4, 5, 8, 9, 10, 11
+    EXTRACT_INSERT_UNPACK 2, 3, 6, 7, 8, 9
+
+    SHUFFLE 8, 10, 6, 7
+    SHUFFLE 9, 11, 6, 7
+%endmacro
+
+%macro TRANSPOSE 0
+    %if cpuflag(avx512)
+        TRANSPOSE_16X16_AVX512
+    %elif cpuflag(avx2)
+        TRANSPOSE_8X8_AVX2
+    %endif
+%endmacro
+
+%macro WRITE_LOCAL_BUFFER 0
+    %if cpuflag(avx512)
+        WRITE_LOCAL_BUFFER_INTERNAL 8, 5, 9, 15, 16, 7, 17, 27, \
+                                    24, 21, 25, 19, 12, 23, 13, 11
+    %elif cpuflag(avx2)
+        WRITE_LOCAL_BUFFER_INTERNAL 12, 14, 13, 15, 8, 10, 9, 11
+    %endif
+%endmacro
+
+%macro SCATTERPS 0
+    %if cpuflag(avx512)
+        SCATTERPS_INTERNAL 8, 5, 9, 15, 16, 7, 17, 27, \
+                           24, 21, 25, 19, 12, 23, 13, 11
+    %elif cpuflag(avx2)
+        SCATTERPS_INTERNAL 12, 14, 13, 15, 8, 10, 9, 11
+    %endif
+%endmacro
+
+%macro OPTIMIZED_LOOP_STEP 0
+    lea stepd, [stepsd - 1]
+    cmp stepd, 0
+    jle %%bscale_scalar
+%%loop_step:
+        sub localbufq, mmsize
+        mulps m3, m1
+        movu [localbufq], m3
+
+        ; Filter leftwards
+        lea xq, [widthq - 1]
+        %%loop_step_x_back:
+            sub localbufq, mmsize
+            FMULADD_PS m3, m3, m0, [localbufq], m3
+            movu [localbufq], m3
+
+            dec xq
+            cmp xq, 0
+            jg %%loop_step_x_back
+
+        ; Filter rightwards
+        mulps m3, m1
+        movu [localbufq], m3
+        add localbufq, mmsize
+
+        lea xq, [widthq - 1]
+        %%loop_step_x:
+            FMULADD_PS m3, m3, m0, [localbufq], m3
+            movu [localbufq], m3
+            add localbufq, mmsize
+
+            dec xq
+            cmp xq, 0
+            jg %%loop_step_x
+
+        dec stepd
+        cmp stepd, 0
+        jg %%loop_step
+
+%%bscale_scalar:
+%endmacro
+
+;***************************************************************************
+; void ff_horiz_slice(float *ptr, int width, int height, int steps,
+;                          float nu, float bscale)
+;***************************************************************************
 
 %macro HORIZ_SLICE 0
 %if UNIX64
+%if cpuflag(avx512) || cpuflag(avx2)
+cglobal horiz_slice, 5, 12, mmnum, 0-mmsize*4, buffer, width, height, steps, \
+                     localbuf, x, y, step, stride, remain, ptr, mask
+%else
 cglobal horiz_slice, 4, 9, 9, ptr, width, height, steps, x, y, step, stride, remain
+%endif
+%else
+%if cpuflag(avx512) || cpuflag(avx2)
+cglobal horiz_slice, 5, 12, mmnum, 0-mmsize*4, buffer, width, height, steps, nu, bscale, \
+                     localbuf, x, y, step, stride, remain, ptr, mask
 %else
 cglobal horiz_slice, 4, 9, 9, ptr, width, height, steps, nu, bscale, x, y, step, stride, remain
 %endif
+%endif
+%if cpuflag(avx512) || cpuflag(avx2)
+%assign rows mmsize/4
+%assign cols mmsize/4
+%if WIN64
+    VBROADCASTSS m0, num ; nu
+    VBROADCASTSS m1, bscalem ; bscale
+
+    mov nuq, localbufm
+    DEFINE_ARGS buffer, width, height, steps, \
+                localbuf, x, y, step, stride, remain, ptr, mask
+    MOVSXDIFNIDN width, height, steps
+%else
+    VBROADCASTSS m0, xmm0 ; nu
+    VBROADCASTSS m1, xmm1 ; bscale
+%endif
+
+%if cpuflag(avx512)
+    vpbroadcastd m2, widthd
+    INIT_WORD_MASK k6, k5, k4, k3, k2, k1
+    INIT_INDICES m28, m29, m30
+%else
+    movd xm2, widthd
+    VBROADCASTSS m2, xm2
+%endif
+
+    vpmulld m2, m2, [gblur_vindex_width] ; vindex width
+
+    xor yq, yq ; y = 0
+    xor xq, xq ; x = 0
+
+    cmp heightq, rows
+    jl .y_scalar
+    sub heightq, rows
+
+.loop_y:
+    ; ptr = buffer + y * width;
+    mov ptrq, yq
+    imul ptrq, widthq
+    lea ptrq, [bufferq + ptrq*4]
+
+    KXNOR m5, k7
+    VGATHERDPS m3, [ptrq + m2*4], m5, k7
+    mulps m3, m1
+    movu [localbufq], m3
+    add ptrq, 4
+    add localbufq, mmsize
+
+    ; Filter rightwards
+    PUSH_MM m2
+    lea xq, [widthq - 1]
+    .loop_x:
+        PUSH_MM m3
+        GATHERPS cols
+        TRANSPOSE
+        POP_MM m3
+        WRITE_LOCAL_BUFFER
+
+        add ptrq, mmsize
+        add localbufq, rows * mmsize
+        sub xq, cols
+        cmp xq, cols
+        jge .loop_x
+    POP_MM m2
+
+    cmp xq, 0
+    jle .bscale_scalar
+    .loop_x_scalar:
+        KXNOR m5, k7
+        VGATHERDPS m4, [ptrq + m2*4], m5, k7
+        FMULADD_PS m3, m3, m0, m4, m3
+        movu [localbufq], m3
+
+        add ptrq, 0x4
+        add localbufq, mmsize
+        dec xq
+        cmp xq, 0
+        jg .loop_x_scalar
+
+    OPTIMIZED_LOOP_STEP
+
+    .bscale_scalar:
+        sub ptrq, 4
+        sub localbufq, mmsize
+        mulps m3, m1
+        KXNOR m5, k7, maskq
+        VSCATTERDPS ptrq, strideq, m2, maskq, k7, 3, widthq, remainq
+
+    ; Filter leftwards
+    PUSH_MM m2
+    lea xq, [widthq - 1]
+    .loop_x_back:
+        sub localbufq, rows * mmsize
+        READ_LOCAL_BUFFER cols
+        PUSH_MM m2
+        TRANSPOSE
+        POP_MM m3
+        sub ptrq, mmsize
+        SCATTERPS
+
+        sub xq, cols
+        cmp xq, cols
+        jge .loop_x_back
+    POP_MM m2
+
+    cmp xq, 0
+    jle .end_loop_x
+    .loop_x_back_scalar:
+        sub ptrq, 0x4
+        sub localbufq, mmsize
+        FMULADD_PS m3, m3, m0, [localbufq], m3
+        KXNOR m5, k7, maskq
+        VSCATTERDPS ptrq, strideq, m2, maskq, k7, 3, widthq, remainq
+
+        dec xq
+        cmp xq, 0
+        jg .loop_x_back_scalar
+
+    .end_loop_x:
+
+    add yq, rows
+    cmp yq, heightq
+    jle .loop_y
+
+    add heightq, rows
+    cmp yq, heightq
+    jge .end_scalar
+
+    mov remainq, widthq
+    imul remainq, mmsize
+    add ptrq, remainq
+
+.y_scalar:
+    mov remainq, heightq
+    sub remainq, yq
+    mov maskq, 1
+    shlx maskq, maskq, remainq
+    sub maskq, 1
+    mov remainq, maskq
+    PUSH_MASK m5, k1, remaind, xd, rsp + 0x20
+
+    mov ptrq, yq
+    imul ptrq, widthq
+    lea ptrq, [bufferq + ptrq * 4] ; ptrq = buffer + y * width
+    KMOVW m6, m5, k7, k1
+    VGATHERDPS m3, [ptrq + m2 * 4], m6, k7
+    mulps m3, m1 ; p0 *= bscale
+    movu [localbufq], m3
+    add localbufq, mmsize
+
+    ; Filter rightwards
+    lea xq, [widthq - 1]
+    .y_scalar_loop_x:
+        add ptrq, 4
+        KMOVW m6, m5, k7, k1
+        VGATHERDPS m4, [ptrq + m2 * 4], m6, k7
+        FMULADD_PS m3, m3, m0, m4, m3
+        movu [localbufq], m3
+        add localbufq, mmsize
+
+        dec xq
+        cmp xq, 0
+        jg .y_scalar_loop_x
+
+    OPTIMIZED_LOOP_STEP
+
+    sub localbufq, mmsize
+    mulps m3, m1 ; p0 *= bscale
+    KMOVW k7, k1
+    VSCATTERDPS ptrq, strideq, m2, maskq, k7, 3, widthq, remainq, heightq
+
+    ; Filter leftwards
+    lea xq, [widthq - 1]
+    .y_scalar_loop_x_back:
+        sub ptrq, 4
+        sub localbufq, mmsize
+        FMULADD_PS m3, m3, m0, [localbufq], m3
+        KMOVW k7, k1
+        VSCATTERDPS ptrq, strideq, m2, maskq, k7, 3, widthq, remainq, heightq
+        dec xq
+        cmp xq, 0
+        jg .y_scalar_loop_x_back
+
+.end_scalar:
+    RET
+%else
 %if WIN64
     movss m0, num
     movss m1, bscalem
@@ -211,16 +773,26 @@ cglobal horiz_slice, 4, 9, 9, ptr, width, height, steps, nu, bscale, x, y, step,
     jl .loop_y
 
     RET
+%endif
 %endmacro
 
 %if ARCH_X86_64
 INIT_XMM sse4
 HORIZ_SLICE
 
-INIT_XMM avx2
+%if HAVE_AVX2_EXTERNAL
+INIT_YMM avx2
+%xdefine mmnum 16
 HORIZ_SLICE
 %endif
 
+%if HAVE_AVX512_EXTERNAL
+INIT_ZMM avx512
+%xdefine mmnum 32
+HORIZ_SLICE
+%endif
+%endif
+
 %macro POSTSCALE_SLICE 0
 cglobal postscale_slice, 2, 2, 4, ptr, length, postscale, min, max
     shl lengthd, 2
@@ -270,7 +842,6 @@ INIT_ZMM avx512
 POSTSCALE_SLICE
 %endif
 
-
 ;*******************************************************************************
 ; void ff_verti_slice(float *buffer, int width, int height, int column_begin,
 ;                     int column_end, int steps, float nu, float bscale);
diff --git a/libavfilter/x86/vf_gblur_init.c b/libavfilter/x86/vf_gblur_init.c
index 3e173410c2..b47f6fbffb 100644
--- a/libavfilter/x86/vf_gblur_init.c
+++ b/libavfilter/x86/vf_gblur_init.c
@@ -24,8 +24,9 @@
 #include "libavutil/x86/cpu.h"
 #include "libavfilter/gblur.h"
 
-void ff_horiz_slice_sse4(float *ptr, int width, int height, int steps, float nu, float bscale);
-void ff_horiz_slice_avx2(float *ptr, int width, int height, int steps, float nu, float bscale);
+void ff_horiz_slice_sse4(float *ptr, int 
width, int height, int steps, float nu, float bscale, float *localbuf);
+void ff_horiz_slice_avx2(float *ptr, int width, int height, int steps, float nu, float bscale, float *localbuf);
+void ff_horiz_slice_avx512(float *ptr, int width, int height, int steps, float nu, float bscale, float *localbuf);
 
 void ff_postscale_slice_sse(float *ptr, int length, float postscale, float min, float max);
 void ff_postscale_slice_avx2(float *ptr, int length, float postscale, float min, float max);
@@ -51,12 +52,22 @@ av_cold void ff_gblur_init_x86(GBlurContext *s)
         s->horiz_slice = ff_horiz_slice_sse4;
     }
     if (EXTERNAL_AVX2(cpu_flags)) {
-        s->horiz_slice = ff_horiz_slice_avx2;
         s->verti_slice = ff_verti_slice_avx2;
     }
     if (EXTERNAL_AVX512(cpu_flags)) {
         s->postscale_slice = ff_postscale_slice_avx512;
         s->verti_slice = ff_verti_slice_avx512;
     }
+    if (EXTERNAL_AVX2(cpu_flags)) {
+        s->stride = EXTERNAL_AVX512(cpu_flags) ? 16 : 8;
+        s->localbuf = av_malloc(s->stride * sizeof(float) * s->planewidth[0] * s->planeheight[0]);
+        if (!s->localbuf)
+            return;
+
+        s->horiz_slice = ff_horiz_slice_avx2;
+        if (EXTERNAL_AVX512(cpu_flags)) {
+            s->horiz_slice = ff_horiz_slice_avx512;
+        }
+    }
 #endif
 }

From patchwork Mon Aug 2 05:34:39 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wu Jianhua
X-Patchwork-Id: 29183
From: Wu Jianhua
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 2 Aug 2021 13:34:39 +0800
Message-Id: <20210802053439.42828-5-jianhua.wu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20210802053439.42828-1-jianhua.wu@intel.com>
References: <20210802053439.42828-1-jianhua.wu@intel.com>
Subject: [FFmpeg-devel] [PATCH 5/5] tests/checkasm/vf_gblur.c: update check_horiz_slice for the new ff_horiz_slice_avx2/512
Reply-To: FFmpeg development discussions and patches
Cc: Wu Jianhua, yanfei.cheng@intel.com

Co-authored-by: Cheng Yanfei
Co-authored-by: Jin Jun
Signed-off-by: Wu Jianhua
---
 tests/checkasm/vf_gblur.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/tests/checkasm/vf_gblur.c b/tests/checkasm/vf_gblur.c
index 0fac37b6be..a7a1c1a24e 100644
--- a/tests/checkasm/vf_gblur.c
+++ b/tests/checkasm/vf_gblur.c
@@ -34,19 +34,19 @@
             tmp_buf[j] = (float)(rnd() & 0xFF); \
     } while (0)
 
-static void check_horiz_slice(float *dst_ref, float *dst_new)
+static void check_horiz_slice(float *dst_ref, float *dst_new, float *localbuf)
 {
     int steps = 2;
     float nu = 0.101f;
     float bscale = 1.112f;
 
-    declare_func(void, float *dst, int w, int h, int steps, float nu, float bscale);
-    call_ref(dst_ref, WIDTH, HEIGHT, steps, nu, bscale);
-    call_new(dst_new, WIDTH, HEIGHT, steps, nu, bscale);
+    declare_func(void, float *dst, int w, int h, int steps, float nu, float bscale, float *localbuf);
+    call_ref(dst_ref, WIDTH, HEIGHT, steps, nu, bscale, localbuf);
+    call_new(dst_new, WIDTH, HEIGHT, steps, nu, bscale, localbuf);
     if (!float_near_abs_eps_array(dst_ref, dst_new, 0.01f, PIXELS)) {
         fail();
     }
-    bench_new(dst_new, WIDTH, HEIGHT, 1, nu, bscale);
+    bench_new(dst_new, WIDTH, HEIGHT, 1, nu, bscale, localbuf);
 }
 
 static void check_verti_slice(float *dst_ref, float *dst_new)
@@ -87,10 +87,12 @@ void checkasm_check_vf_gblur(void)
     randomize_buffers(dst_ref, PIXELS);
     memcpy(dst_new, dst_ref, BUF_SIZE);
 
+    s.planewidth[0] = WIDTH;
+    s.planeheight[0] = HEIGHT;
     ff_gblur_init(&s);
 
     if (check_func(s.horiz_slice, "horiz_slice")) {
-        check_horiz_slice(dst_ref, dst_new);
+        check_horiz_slice(dst_ref, dst_new, s.localbuf);
     }
     report("horiz_slice");
 
@@ -108,6 +110,9 @@ void checkasm_check_vf_gblur(void)
     }
     report("verti_slice");
 
+    if (s.localbuf)
+        av_free(s.localbuf);
+
     av_freep(&dst_ref);
     av_freep(&dst_new);
 }