From patchwork Tue Feb 23 13:40:45 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Kelly
X-Patchwork-Id: 25921
Date: Tue, 23 Feb 2021 14:40:45 +0100
Message-Id: <20210223134047.1834787-1-alankelly@google.com>
X-Mailer: git-send-email 2.30.0.617.g56c4b15f3c-goog
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Cc: Alan Kelly
Subject: [FFmpeg-devel] [PATCH 1/3] libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext

---
 This is so that tails of size 8 may safely be processed
 libswscale/x86/yuv2yuvX.asm | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm
index 521880dabe..b6294cb919 100644
--- a/libswscale/x86/yuv2yuvX.asm
+++ b/libswscale/x86/yuv2yuvX.asm
@@ -37,8 +37,10 @@ SECTION .text
 cglobal yuv2yuvX, 7, 7, 8, filter, filterSize, src, dest, dstW, dither, offset
 %if notcpuflag(sse3)
 %define movr mova
+%define unroll 1
 %else
 %define movr movdqu
+%define unroll 2
 %endif
     movsxdifnidn dstWq, dstWd
     movsxdifnidn offsetq, offsetd
@@ -70,8 +72,10 @@ cglobal yuv2yuvX, 7, 7, 8, filter, filterSize, src, dest, dstW, dither, offset
 .outerloop:
     mova m4, m7
     mova m3, m7
+%if cpuflag(sse3)
     mova m6, m7
     mova m1, m7
+%endif
 .loop:
 %if cpuflag(avx2)
     vpbroadcastq m0, [filterSizeq + 8]
@@ -84,28 +88,36 @@ cglobal yuv2yuvX, 7, 7, 8, filter, filterSize, src, dest, dstW, dither, offset
     pmulhw m5, m0, [srcq + offsetq * 2 + mmsize]
     paddw m3, m3, m2
     paddw m4, m4, m5
+%if cpuflag(sse3)
     pmulhw m2, m0, [srcq + offsetq * 2 + 2 * mmsize]
     pmulhw m5, m0, [srcq + offsetq * 2 + 3 * mmsize]
     paddw m6, m6, m2
     paddw m1, m1, m5
+%endif
     add filterSizeq, $10
     mov srcq, [filterSizeq]
     test srcq, srcq
     jnz .loop
     psraw m3, m3, 3
     psraw m4, m4, 3
+%if cpuflag(sse3)
     psraw m6, m6, 3
     psraw m1, m1, 3
+%endif
     packuswb m3, m3, m4
+%if cpuflag(sse3)
     packuswb m6, m6, m1
+%endif
     mov srcq, [filterq]
 %if cpuflag(avx2)
     vpermq m3, m3, 216
     vpermq m6, m6, 216
 %endif
     movr [destq + offsetq], m3
+%if cpuflag(sse3)
     movr [destq + offsetq + mmsize], m6
-    add offsetq, mmsize * 2
+%endif
+    add offsetq, mmsize * unroll
     mov filterSizeq, filterq
     cmp offsetq, dstWq
     jb .outerloop
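
The following is a rough C model, not part of the patch, of why the unroll factor matters for a tail of size 8. It assumes the outer loop behaves like a do-while that advances offset by mmsize * unroll and stores that many bytes per pass, as the `add offsetq, mmsize * unroll` / `jb .outerloop` sequence suggests, and that mmsize is 8 for the mmx and mmxext builds. The helper name bytes_stored and its parameters are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical model of the outer loop's store footprint.
 * Each pass stores mmsize * unroll bytes and the loop condition is
 * checked only after the pass, mirroring "cmp offsetq, dstWq / jb".
 * Returns how many destination bytes get written in total.
 */
static size_t bytes_stored(size_t offset, size_t dst_w,
                           size_t mmsize, size_t unroll)
{
    size_t start = offset;

    do {
        /* One pass of .outerloop: advance past everything just stored. */
        offset += mmsize * unroll;
    } while (offset < dst_w);

    return offset - start;
}
```

Under these assumptions, bytes_stored(0, 8, 8, 2) gives 16, so a tail of 8 would be overrun by 8 bytes with the old two-way unrolling, while bytes_stored(0, 8, 8, 1) gives exactly 8. Note that the patch keys the choice on notcpuflag(sse3), so unroll 1 applies to every build without sse3, not only to mmx and mmxext by name.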