From patchwork Wed Feb 17 16:41:04 2021
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 25707
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 17 Feb 2021 13:41:04 -0300
Message-Id: <20210217164106.6370-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 1/3] x86/vf_gblur: fix postscale_slice prologue

The x86_32 ABI does not pass float arguments directly in xmm registers, and
the Win64 ABI uses only the first four registers for this purpose.

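To illustrate the point above, here is a rough sketch of where each argument of a function shaped like postscale_slice(ptr, length, postscale, min, max) is found on entry under the three calling conventions. The enum and the arg_location() helper are hypothetical names made up for this illustration, not FFmpeg or x86inc identifiers, and the UNIX64 mapping is hard-wired to this one signature (two integer-class arguments followed by three floats).

```c
#include <stdio.h>

/* Hypothetical ABI labels for this sketch; not FFmpeg identifiers. */
enum abi { ABI_X86_32, ABI_WIN64, ABI_UNIX64 };

/*
 * Describe where argument `index` (0-based) of
 *     f(float *ptr, int length, float postscale, float min, float max)
 * lives on function entry. Arguments 0..1 are integer class, 2..4 are floats.
 */
static void arg_location(enum abi abi, int index, char *out, size_t size)
{
    static const char *const win64_int[]  = { "rcx", "rdx", "r8", "r9" };
    static const char *const unix64_int[] = { "rdi", "rsi" };
    int is_float = index >= 2;

    switch (abi) {
    case ABI_X86_32:
        /* cdecl: every argument, float or not, is passed on the stack. */
        snprintf(out, size, "stack");
        break;
    case ABI_WIN64:
        /* Four positional slots: slot n is either integer register n or
         * xmm n, so the first float here lands in xmm2 and the fifth
         * argument spills to the stack. */
        if (index >= 4)
            snprintf(out, size, "stack");
        else if (is_float)
            snprintf(out, size, "xmm%d", index);
        else
            snprintf(out, size, "%s", win64_int[index]);
        break;
    case ABI_UNIX64:
        /* Integer and SSE registers are counted independently, so the
         * floats start at xmm0 regardless of the preceding pointer/int. */
        if (is_float)
            snprintf(out, size, "xmm%d", index - 2);
        else
            snprintf(out, size, "%s", unix64_int[index]);
        break;
    }
}
```

Under these assumptions, Win64 reports postscale in xmm2, min in xmm3 and max on the stack, while x86_32 reports all three floats on the stack, which is why a prologue has to load some of them from memory.
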
Signed-off-by: James Almer
---
 libavfilter/vf_gblur.c       |  3 +--
 libavfilter/x86/vf_gblur.asm | 29 +++++++++++++----------------
 2 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/libavfilter/vf_gblur.c b/libavfilter/vf_gblur.c
index 109a7a95f9..40956e122d 100644
--- a/libavfilter/vf_gblur.c
+++ b/libavfilter/vf_gblur.c
@@ -234,8 +234,7 @@ void ff_gblur_init(GBlurContext *s)
 {
     s->horiz_slice = horiz_slice_c;
     s->postscale_slice = postscale_c;
-    if (ARCH_X86_64)
-        ff_gblur_init_x86(s);
+    ff_gblur_init_x86(s);
 }
 
 static int config_input(AVFilterLink *inlink)
diff --git a/libavfilter/x86/vf_gblur.asm b/libavfilter/x86/vf_gblur.asm
index c29ecba889..c2b2998202 100644
--- a/libavfilter/x86/vf_gblur.asm
+++ b/libavfilter/x86/vf_gblur.asm
@@ -185,27 +185,24 @@ HORIZ_SLICE
 %endif
 
 %macro POSTSCALE_SLICE 0
-%if UNIX64
-cglobal postscale_slice, 2, 2, 4, ptr, length
-%else
-cglobal postscale_slice, 5, 5, 4, ptr, length, postscale, min, max
-%endif
+cglobal postscale_slice, 2, 2, 4, ptr, length, postscale, min, max
     shl lengthd, 2
     add ptrq, lengthq
     neg lengthq
-%if WIN64
+%if ARCH_X86_32
+    VBROADCASTSS  m0, postscalem
+    VBROADCASTSS  m1, minm
+    VBROADCASTSS  m2, maxm
+%elif WIN64
     SWAP 0, 2
     SWAP 1, 3
-    SWAP 2, 4
-%endif
-%if cpuflag(avx2)
-    vbroadcastss  m0, xm0
-    vbroadcastss  m1, xm1
-    vbroadcastss  m2, xm2
-%else
-    shufps   xm0, xm0, 0
-    shufps   xm1, xm1, 0
-    shufps   xm2, xm2, 0
+    VBROADCASTSS  m0, xm0
+    VBROADCASTSS  m1, xm1
+    VBROADCASTSS  m2, maxm
+%else ; UNIX64
+    VBROADCASTSS  m0, xm0
+    VBROADCASTSS  m1, xm1
+    VBROADCASTSS  m2, xm3
 %endif
 
 .loop: