From patchwork Thu Feb 17 10:04:04 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Kelly
X-Patchwork-Id: 34357
Date: Thu, 17 Feb 2022 11:04:04 +0100
Message-Id: <20220217100404.1112755-1-alankelly@google.com>
X-Mailer: git-send-email 2.35.1.265.g69c8d7142f-goog
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Cc: Alan Kelly
Subject: [FFmpeg-devel] [PATCH v2 3/5] libswscale: Avx2 hscale can process inputs of any size.
List-Id: FFmpeg development discussions and patches

The main loop processes blocks of 16 pixels. The tail processes blocks of size 4.
---
 libswscale/x86/scale_avx2.asm | 48 +++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 20acdbd633..dc42abb100 100644
--- a/libswscale/x86/scale_avx2.asm
+++ b/libswscale/x86/scale_avx2.asm
@@ -53,6 +53,9 @@ cglobal hscale8to15_%1, 7, 9, 16, pos0, dst, w, srcmem, filter, fltpos, fltsize,
     mova m14, [four]
     shr fltsized, 2
 %endif
+    cmp wq, 16
+    jl .tail_loop
+    mov countq, 0x10
 .loop:
     movu m1, [fltposq]
     movu m2, [fltposq+32]
@@ -97,11 +100,52 @@ cglobal hscale8to15_%1, 7, 9, 16, pos0, dst, w, srcmem, filter, fltpos, fltsize,
     vpsrad m6, 7
     vpackssdw m5, m5, m6
     vpermd m5, m15, m5
-    vmovdqu [dstq + countq * 2], m5
+    vmovdqu [dstq], m5
+    add dstq, 0x20
     add fltposq, 0x40
     add countq, 0x10
     cmp countq, wq
-    jl .loop
+    jle .loop
+
+    sub countq, 0x10
+    cmp countq, wq
+    jge .end
+
+.tail_loop:
+    movu xm1, [fltposq]
+%ifidn %1, X4
+    pxor xm9, xm9
+    pxor xm10, xm10
+    xor innerq, innerq
+.tail_innerloop:
+%endif
+    vpcmpeqd xm13, xm13
+    vpgatherdd xm3,[srcmemq + xm1], xm13
+    vpunpcklbw xm5, xm3, xm0
+    vpunpckhbw xm6, xm3, xm0
+    vpmaddwd xm5, xm5, [filterq]
+    vpmaddwd xm6, xm6, [filterq + 16]
+    add filterq, 0x20
+%ifidn %1, X4
+    paddd xm9, xm5
+    paddd xm10, xm6
+    paddd xm1, xm14
+    add innerq, 1
+    cmp innerq, fltsizeq
+    jl .tail_innerloop
+    vphaddd xm5, xm9, xm10
+%else
+    vphaddd xm5, xm5, xm6
+%endif
+    vpsrad xm5, 7
+    vpackssdw xm5, xm5, xm5
+    vmovq [dstq], xm5
+    add dstq, 0x8
+    add fltposq, 0x10
+    add countq, 0x4
+    cmp countq, wq
+    jl .tail_loop
+.end:
 REP_RET
 %endmacro
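
For reviewers who want to check the new branch structure by hand, the control flow of the patched loop can be modelled in scalar C. This is only a sketch, not code from libswscale: `BlockPlan` and `plan_blocks` are hypothetical names, the filtering itself is left out, and the model assumes `countq` is zero when the patched code is entered (the initialisation is not visible in these hunks).

```c
/* Hypothetical model of the loop bookkeeping in the patch above.
 * It counts how many 16-pixel and 4-pixel blocks would be processed
 * for a given width w; it does no filtering. */
typedef struct {
    int main_blocks;     /* iterations of .loop (16 pixels each) */
    int tail_blocks;     /* iterations of .tail_loop (4 pixels each) */
    int pixels_written;  /* 16 * main_blocks + 4 * tail_blocks */
} BlockPlan;

static BlockPlan plan_blocks(int w)
{
    BlockPlan p = {0, 0, 0};
    int count = 0;       /* ASSUMPTION: countq is zeroed before this point */
    int run_tail = 1;

    if (w >= 16) {                 /* cmp wq, 16 / jl .tail_loop */
        count = 16;                /* mov countq, 0x10 */
        do {
            p.main_blocks++;       /* one 16-pixel block stored to dst */
            count += 16;           /* add countq, 0x10 */
        } while (count <= w);      /* cmp countq, wq / jle .loop */
        count -= 16;               /* sub countq, 0x10 */
        if (count >= w)            /* cmp countq, wq / jge .end */
            run_tail = 0;
    }
    if (run_tail) {
        do {                       /* .tail_loop always runs once when reached */
            p.tail_blocks++;       /* one 4-pixel block stored to dst */
            count += 4;            /* add countq, 0x4 */
        } while (count < w);       /* cmp countq, wq / jl .tail_loop */
    }
    p.pixels_written = 16 * p.main_blocks + 4 * p.tail_blocks;
    return p;
}
```

Under this model, `plan_blocks(35)` gives two main blocks and one tail block. Because the tail stores four outputs at a time, the number of pixels written is `w` rounded up to a multiple of 4, so the model suggests `dst` needs room for up to three extra 16-bit values when `w` is not a multiple of 4.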