From patchwork Mon Oct 21 20:06:18 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 15892
Return-Path:
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
 by ffaux.localdomain (Postfix) with ESMTP id C689544A003
 for ; Mon, 21 Oct 2019 23:06:44 +0300 (EEST)
Received: from [127.0.1.1] (localhost [127.0.0.1])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 99D5968B10F;
 Mon, 21 Oct 2019 23:06:44 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-qk1-f195.google.com (mail-qk1-f195.google.com
 [209.85.222.195])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id DA47868B09B
 for ; Mon, 21 Oct 2019 23:06:37 +0300 (EEST)
Received: by mail-qk1-f195.google.com with SMTP id f18so13371526qkm.1
 for ; Mon, 21 Oct 2019 13:06:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20161025;
 h=from:to:subject:date:message-id:mime-version
 :content-transfer-encoding;
 bh=+/nGP07xMxbvuhdTYnawNtKVvALZUw8uJTRMKKDBcOo=;
 b=UYfqR+QjRflWejxm8eDxXhL2d7314/JC2crmAeBWmRnBgP0aZtrlzHxSZ1j7nfNAyf
 YpBU96xl+cNtITQAPvEzdQ8XwVBsQvV2oebfy3c5afEuPSQuyz97QVlBJxwb281cdj7d
 uIdhvx18nz0/RVAL5QpjqwKgZ6ocxTJ4ZtF4JnZ3G2+i4leuxwRN5Ma9mXhDZZ1ey8aC
 O91V4cxmUDb6Xg0U6j/TT4RgrmUTmlyLDhe2WtPjPLUGUKa3Aqz1ZvzU1Ok9+MXAlz19
 Lc7m+IqqUsSJjKJD2ERoGsUYQXWYeOCwcZharRQMD+T+cxCZDZAACnpnI0vxAXbGLgFo
 f5Uw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:subject:date:message-id:mime-version
 :content-transfer-encoding;
 bh=+/nGP07xMxbvuhdTYnawNtKVvALZUw8uJTRMKKDBcOo=;
 b=aVWgYkcAmOrjWVIWJTkpVtwZ1N5hAiTfKw8LcfG5SFo2R6GiAdZC8aEzBhFPFIZ/hA
 4LP9TqTzJzQlI//yF/2hvYVmNBlNdxFfEEcSXmWXivXXOVFPW5uvgxgnUpsB0n8hZKAp
 wWqQAhOcrCUjc5ReRP/5tko+zokGeHx3py7cCaXqL2HiuRWCE3CUeOGUMdxcKBYexQFD
 qmuxkyPnJ3qyMYiAsR5oFqFJKL6ewIvxPL0PvwmKlCn45L4wpOaSZ0ks03qJPJ+hl9bK
 9QH9FiNHEGhuAYo5fg5uUhkDmNvh5W+BlXt1uYA+sKPbsfrnUJVfs7PQGTP/NKrvNrvI
 3LMw==
X-Gm-Message-State: APjAAAVSw8I3e6U/D1fxDFTQ6fd55GmaJW7rqjHM1P6GubJgw+c5yAn2
 NA0mCnx/avg5zks2MU5whkAOsWT6
X-Google-Smtp-Source: APXvYqytRUgBpS79H5LbNrgso5Zp1vMB+vAzo4hbD7X1zg54XhDo8oxdz1kmN6s5CEySieQmOi0x1w==
X-Received: by 2002:a37:8d01:: with SMTP id p1mr9608830qkd.210.1571688396301;
 Mon, 21 Oct 2019 13:06:36 -0700 (PDT)
Received: from localhost.localdomain ([181.23.69.30])
 by smtp.gmail.com with ESMTPSA id e9sm7572019qkl.10.2019.10.21.13.06.35
 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Mon, 21 Oct 2019 13:06:35 -0700 (PDT)
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 21 Oct 2019 17:06:18 -0300
Message-Id: <20191021200618.10078-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.23.0
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH] x86/vf_transpose: make ff_transpose_8x8_16_sse2 work on x86_32
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

Signed-off-by: James Almer
---
 libavfilter/x86/vf_transpose.asm    | 11 +++++------
 libavfilter/x86/vf_transpose_init.c |  2 +-
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/libavfilter/x86/vf_transpose.asm b/libavfilter/x86/vf_transpose.asm
index f9f585369a..c532c899ee 100644
--- a/libavfilter/x86/vf_transpose.asm
+++ b/libavfilter/x86/vf_transpose.asm
@@ -56,10 +56,7 @@ cglobal transpose_8x8_8, 4,5,8, src, src_linesize, dst, dst_linesize, linesize3
     movq [dstq + linesize3q], m7
     RET
 
-%if ARCH_X86_64
-
-INIT_XMM sse2
-cglobal transpose_8x8_16, 4,5,9, src, src_linesize, dst, dst_linesize, linesize3
+cglobal transpose_8x8_16, 4,5,9, ARCH_X86_32 * 32, src, src_linesize, dst, dst_linesize, linesize3
     lea linesize3q, [src_linesizeq * 3]
     movu m0, [srcq + src_linesizeq * 0]
     movu m1, [srcq + src_linesizeq * 1]
@@ -71,7 +68,11 @@ cglobal transpose_8x8_16, 4,5,9, src, src_linesize, dst, dst_linesize, linesize3
     movu m6, [srcq + src_linesizeq * 2]
     movu m7, [srcq + linesize3q]
 
+%if ARCH_X86_64
     TRANSPOSE8x8W 0, 1, 2, 3, 4, 5, 6, 7, 8
+%else
+    TRANSPOSE8x8W 0, 1, 2, 3, 4, 5, 6, 7, [rsp], [rsp + 16]
+%endif
 
     lea linesize3q, [dst_linesizeq * 3]
     movu [dstq + dst_linesizeq * 0], m0
@@ -84,5 +85,3 @@ cglobal transpose_8x8_16, 4,5,9, src, src_linesize, dst, dst_linesize, linesize3
     movu [dstq + dst_linesizeq * 2], m6
     movu [dstq + linesize3q], m7
     RET
-
-%endif
diff --git a/libavfilter/x86/vf_transpose_init.c b/libavfilter/x86/vf_transpose_init.c
index f1a9cd058b..6bb9908725 100644
--- a/libavfilter/x86/vf_transpose_init.c
+++ b/libavfilter/x86/vf_transpose_init.c
@@ -43,7 +43,7 @@ av_cold void ff_transpose_init_x86(TransVtable *v, int pixstep)
         v->transpose_8x8 = ff_transpose_8x8_8_sse2;
     }
 
-    if (ARCH_X86_64 && EXTERNAL_SSE2(cpu_flags) && pixstep == 2) {
+    if (EXTERNAL_SSE2(cpu_flags) && pixstep == 2) {
         v->transpose_8x8 = ff_transpose_8x8_16_sse2;
     }
 }