From patchwork Wed Jul 20 04:41:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Phlipot X-Patchwork-Id: 36851 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:1649:b0:8b:613a:194d with SMTP id no9csp2172110pzb; Tue, 19 Jul 2022 21:41:33 -0700 (PDT) X-Google-Smtp-Source: AGRyM1s4XaPk2sq6urzrz+WxsPu6SHJL3fwEaKIklBmr9QnToN6frjRhd+2XQ3001fQJK3RPOzqB X-Received: by 2002:a17:907:1dce:b0:72b:40c4:deec with SMTP id og14-20020a1709071dce00b0072b40c4deecmr34354336ejc.70.1658292092700; Tue, 19 Jul 2022 21:41:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658292092; cv=none; d=google.com; s=arc-20160816; b=EeyVfc15ZEQeKvdN7tgPxJ4pZkm4zhW319bqhSM4a6C8M2SjDM8nEfaoACz7Sm5tAH E+b/4ZmE6TZTrVweYGyFGIsQvi6SZ1cYD1ra5HgGaioCcUqkA5KiXyqu6klX+1oaL/qW Q0UMMYVJw0OQmruybVPErzdQrOwN73E7UVf6sfG7ej+i9Nz+Qiu1IUbE7Is2zy2yBkZJ 06kY4ghyJbKtYNbUidMYRZfdwCMQ6eIZsEHbW+QwNGARUGO9MPVRnPFvVG7N2GDU5VZH tB30ZSZs+e6nSmPB8IoiD4aw33Euw+MoOJdcwItx3yW76pIPAeZxoui0HMBODkbWdMJo Lojg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:message-id:date:to:from :dkim-signature:delivered-to; bh=e8Ovz2Yu0LrounZV3v0jaKAWXfiFfHGcrxZn1Hzu2n4=; b=lgIsh6vl9U29rc6+WSLPT/+0XuXZa6P/wHsRmZ9z2iZZ90cC8RG9XkYOd7vcS+ubD/ 2kfPe9XS5lT+4Ff74Yc5BGdpx/tbfo2ZzZnMMpIzcRZe6rPprvGquaLLxSHRPV5lczJh j0Jb1SmF+35B/SsT1fHiijIloa5qB8+bTgiPc3ZgyJNjxiYnSffse+pgNvHewEApHYY4 iMIzBOqkc0Wg549d4lzYBH0xZj7RnEF2Al1wsXAQmskyzUg3OMxX/+KL0cp9ZCdsenTp 3VqMPfYaruXpafOBhiDG3Tn1VLTZimmoM7WMoA9sz4FQfMX/huhFkiL3yelx5MF+X9Cj vmpw== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=LicMFOkP; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as 
permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id u4-20020a1709063b8400b0072b3a316cc6si18832417ejf.977.2022.07.19.21.41.32; Tue, 19 Jul 2022 21:41:32 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=LicMFOkP; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 772E568B7DB; Wed, 20 Jul 2022 07:41:29 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pg1-f178.google.com (mail-pg1-f178.google.com [209.85.215.178]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 5A14C68B495 for ; Wed, 20 Jul 2022 07:41:23 +0300 (EEST) Received: by mail-pg1-f178.google.com with SMTP id 23so15348558pgc.8 for ; Tue, 19 Jul 2022 21:41:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=O87wVmR7M7BeqGWY6sT2yeYJi/0sVKg81MI7tuW4Rjs=; b=LicMFOkP0L1pjqDSzL9+qH0RZaPLPow8pL3X0EmdGu2jtCsuypPmiUBXnrB22DFpKA GfaG7z3Us9bTL6bf3wqF0blbwXkbwAvzNJ149kMpJDjkupyB8WlOcn6M+uMVa6rcJMUa H/mtCeskSi6UiC7c8ZHDpIei3T0L76IVg2v9DUXLMn/tSzHJwrNYWCW+RskI7uJjvwLC NsCV6oAOsiLhIFYJdbeF7fC/2TnfDc3HvKloh/o3S1IIyzVj+woRnplsC/nDLVCcTAxw MwtAUkK6eSNHhg3uOm0earxNiSJOspvliXUzvq460n+af+zKbCbMCdkfvWsdbaVTu5yV QNkg== X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=O87wVmR7M7BeqGWY6sT2yeYJi/0sVKg81MI7tuW4Rjs=; b=2iTyCdm1gXnTk5SHBAb1GJVefSiM0Qew8IFz5SJAGAOht55U4/GjRjj6b/CK5Nq1RW EvbbCxeJGOujNv8s0JXzjIbYGaDtPLGS+eKC3e2VMXxR2DrmsayWNzc/H5MO02IIxVXc aqiLBvHSDBnsfyO1isujzHiOHvkwmXcrAuTxHF/BBgC3bX2hktP2j4bSlQ9fXl1cGmw+ TJCumDYFrbiNNZxshFu4HYgktmk8iZ1jt5UeD7VW14azK2hWVikfQpwTazMfBRMsdGlc p1Z9BRk6FLUlijKDuHPwngfvvC+UompNkSGmoAbh79siuzCPbZMB0/RlQBd2JgI/3sXf EOYg== X-Gm-Message-State: AJIora+/uuGoEd0CLKoW1W00U6eGjcyL1M7MaYTy0crN8rZL213P/C/7 nHQ2/O+9hzmjw2FSTQzffQN6uQf4R3vNkw== X-Received: by 2002:a05:6a00:1a0f:b0:52b:13f0:6ab1 with SMTP id g15-20020a056a001a0f00b0052b13f06ab1mr35482228pfv.60.1658292081068; Tue, 19 Jul 2022 21:41:21 -0700 (PDT) Received: from localhost.localdomain (23-121-159-29.lightspeed.sntcca.sbcglobal.net. [23.121.159.29]) by smtp.googlemail.com with ESMTPSA id f16-20020a635110000000b003fba1a97c49sm10855907pgb.61.2022.07.19.21.41.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Jul 2022 21:41:20 -0700 (PDT) From: Chris Phlipot To: ffmpeg-devel@ffmpeg.org Date: Tue, 19 Jul 2022 21:41:13 -0700 Message-Id: <20220720044117.1282961-1-cphlipot0@gmail.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 1/5] avfilter/vf_yadif: Fix edge size when MAX_ALIGN is < 4 X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Chris Phlipot Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: eiNsAO0DEm7o If the alignment is set to less than 4, filter_edges produces incorrect output and does not filter the entire edge. To fix this, make sure that the edge size is at least 3. 
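As a rough illustration of the problem (a sketch for discussion, not part of the patch): the helpers below mirror only the edge/offset arithmetic of the 8-bit filter_edges. The names right_edge_start and right_edge_columns and the clamp_edge flag are hypothetical, FFMAX is redefined locally so the snippet stands alone, and the row width is assumed to be at least 6 pixels.

```c
#include <assert.h>

/* Local stand-in for the FFmpeg macro so the sketch compiles on its own. */
#define FFMAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper, not present in vf_yadif.c: returns the first column
 * handled by the right-hand edge pass for a row of w pixels, mirroring the
 * "edge" and "offset" computation in the 8-bit filter_edges.
 * clamp_edge == 0 models the old formula, clamp_edge != 0 the patched one. */
static int right_edge_start(int w, int alignment, int clamp_edge)
{
    assert(w >= 6 && alignment >= 1); /* assumption of this sketch */
    int edge = alignment - 1;
    if (clamp_edge)
        edge = FFMAX(edge, 3);
    return FFMAX(w - edge, 3);
}

/* Number of right-hand columns the edge pass covers, i.e. [start, w). */
static int right_edge_columns(int w, int alignment, int clamp_edge)
{
    return w - right_edge_start(w, alignment, clamp_edge);
}
```

With w = 16 and alignment = 1, the old formula gives a start column of 16, so the edge pass covers none of the right-hand columns. The clamped formula gives 13, which covers the last 3 columns and matches the 3-pixel read distance of the filter. With alignment = 8 both formulas give 9, so existing behaviour is unchanged.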
Signed-off-by: Chris Phlipot --- libavfilter/vf_yadif.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c index afa4d1d53d..055327d7a4 100644 --- a/libavfilter/vf_yadif.c +++ b/libavfilter/vf_yadif.c @@ -120,7 +120,7 @@ static void filter_edges(void *dst1, void *prev1, void *cur1, void *next1, uint8_t *prev2 = parity ? prev : cur ; uint8_t *next2 = parity ? cur : next; - const int edge = MAX_ALIGN - 1; + const int edge = FFMAX(MAX_ALIGN - 1, 3); int offset = FFMAX(w - edge, 3); /* Only edge pixels need to be processed here. A constant value of false @@ -169,7 +169,7 @@ static void filter_edges_16bit(void *dst1, void *prev1, void *cur1, void *next1, uint16_t *prev2 = parity ? prev : cur ; uint16_t *next2 = parity ? cur : next; - const int edge = MAX_ALIGN / 2 - 1; + const int edge = FFMAX(MAX_ALIGN / 2 - 1, 3); int offset = FFMAX(w - edge, 3); mrefs /= 2; From patchwork Wed Jul 20 04:41:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Phlipot X-Patchwork-Id: 36852 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:1649:b0:8b:613a:194d with SMTP id no9csp2172161pzb; Tue, 19 Jul 2022 21:41:41 -0700 (PDT) X-Google-Smtp-Source: AGRyM1uDUlNLk+A6JPcsup1GWvkIOfk/ME2tRko5nxfov5NLU5kF4vf1wRxbqpKYZLM6t6wibBEY X-Received: by 2002:a17:906:8a5a:b0:72b:6b60:2d9f with SMTP id gx26-20020a1709068a5a00b0072b6b602d9fmr34160559ejc.324.1658292101250; Tue, 19 Jul 2022 21:41:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658292101; cv=none; d=google.com; s=arc-20160816; b=Ja3VYtkcyE1BUof0A2U9uYmnbgJ388obXbOBPpGy2PEbD5o8bi6h3KN6OEOvc2NMG2 gQfg1rSBzJlB5sBjphtmGosja8C/3iiATy0LEnBMrc3JCDsOkcn9O7575GByk0tVUiTU nEXGdZqW/opb4Po9zSUdy8e1RLa0yZmntG9n4dQUMoTEnmxJsyywrlDegHGRfzKiQbv9 tWHL6g5PA1yauFN6L5HobOOZOx8znpMYlqgSL/m9youYaLDWTP8Zci4nZZGmtq3WxFZC M6jHioSIBxkSXIu0SlrVxhVlaAuck8xswDM+uGEqNU9K87aT2BUgHq7sRKeyMrqsq5iY YvDA== 
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=xYrSVJM+JEazhWda1O4nTY8X7YLPgjE4tnWsk/SRt4Y=; b=059ocaOk4Cb4/MpQg9REQJvnQjPwPXwbmR5vaSiLJ94BQNy4/RpSUt7VE966tcZBsH Tr24KwKoECdeVcT0bP3gDWK2ZLUciu91EwXJx7OkQqdXqiL9z0lZJYBNvF/UP8Bc5K/I bz7mQn+/qn9iM6UaSBTdnNNl1yo88rFKyifb8zd22Ucd6O9xLUda/mijdfkt/reuH3fW 04hmE/FK1ZyEnDYlVAaK2UffYmMzQ1tw2uhR+JtmNXZXVFSn/wSb5fib01m/q5h40V0j xUCnANXiczkY+4XvtgQW/N602R9NK8XA5v7JPvdNXUW7XJEaAwsuKqtOt4nno9iKiHnM aCrg== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=jKG1pTmq; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id z18-20020a05640240d200b00435bcb8758dsi6471167edb.12.2022.07.19.21.41.40; Tue, 19 Jul 2022 21:41:41 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=jKG1pTmq; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 6C75468B7D4; Wed, 20 Jul 2022 07:41:30 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pg1-f177.google.com (mail-pg1-f177.google.com [209.85.215.177]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 14CC368B7CC for ; Wed, 20 Jul 2022 07:41:24 +0300 (EEST) Received: by mail-pg1-f177.google.com with SMTP id r186so15362349pgr.2 for ; Tue, 19 Jul 2022 21:41:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=B9B0QDrFUjTO8dgges6mFHrfz1CKo0jCmRo91X5zRTk=; b=jKG1pTmq/j54S+tcVikjZ7KTknAgdJdKVUdpp3QXVRpZ/659NiNst0qWvHYuqsovN6 frf+PMiD98V4VOxq6npZgL6JYTp+9l7efNPNkb7Pfxx5W2jpsqXaiAnFJ/ugA9xNeWby gfbNjYTvirGawvLen41ohK2TeGSCeXUAnEvIesuiTvqQi/T5XQ0qK1axYK+Vt3M3Bis8 uFQ7cLa25nnVQVgbTStybfJb7DY2imSo6JOMrn6XfV0KAhc2bn6TkzpnwCWhis6PdqMK g+2zQi3T8phChcpsYiZP3EtbByvSeOZFkX4ZnfgyZtandi/n9d/lLy2T32ji8L17VCuo BLxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=B9B0QDrFUjTO8dgges6mFHrfz1CKo0jCmRo91X5zRTk=; b=iOnr2Mfs/GTvrNOLv6VaclCwcMRKSXMD0N9dXMnQ6ZkBrv2X1Udqhrj/N+bKx2apkt SAohcmr59BjJs6xQfUXpkuDHmQ193HOZ6tojoxamOlMVgwZ87qIC5vxz3eFHMnLy/nL9 ah20Vaksv1SowJkYIY7p/4FrlK42OKLckhXBMACyMLaB9imj6KhJIybhu8q32YTYx/fa OcabNQOwJu96MKvO7EHzjrl8lrYx69NO2dR/rDjIfRh5W2HAfTSvOEtkON9LJhscLn1c RygGxzkmbLFDkDefmDO+U8Kj36zsFmkveC3B02FkBDQU6Vejm9N7w9SnsO+39v+4Wwaf dNYg== X-Gm-Message-State: AJIora91iGgX9P/q8nZjOEUj5GtVmFG1qz4smZfbzA1akDu8nTOP4rOv dVbdCbr/HSsr3y3T4Xfrjg/JUEOEXJXMAg== X-Received: by 2002:a63:688a:0:b0:412:6728:4bf3 with SMTP id d132-20020a63688a000000b0041267284bf3mr32569746pgc.339.1658292082108; Tue, 19 Jul 2022 21:41:22 -0700 (PDT) Received: from localhost.localdomain (23-121-159-29.lightspeed.sntcca.sbcglobal.net. [23.121.159.29]) by smtp.googlemail.com with ESMTPSA id f16-20020a635110000000b003fba1a97c49sm10855907pgb.61.2022.07.19.21.41.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Jul 2022 21:41:21 -0700 (PDT) From: Chris Phlipot To: ffmpeg-devel@ffmpeg.org Date: Tue, 19 Jul 2022 21:41:14 -0700 Message-Id: <20220720044117.1282961-2-cphlipot0@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220720044117.1282961-1-cphlipot0@gmail.com> References: <20220720044117.1282961-1-cphlipot0@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 2/5] avfilter/vf_yadif: Allow alignment to be configurable X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Chris Phlipot Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: KQytkCwi8L+3 Allow the alignment to be determined based on what yadif_filter_line implementation is used. 
Currently this is either 1 or 8, depending on whether the C code or the x86 SSE code is used, but the change allows for future implementations that use a larger alignment. Raising MAX_ALIGN to 32 for an AVX2 implementation could hurt the performance of the SSE implementation, so yadif instead uses the smallest alignment needed, which maintains existing performance if implementations with wider vectors are added. Signed-off-by: Chris Phlipot --- libavfilter/vf_yadif.c | 16 +++++++++------- libavfilter/x86/vf_yadif_init.c | 1 + libavfilter/yadif.h | 4 +++- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c index 055327d7a4..42f6246330 100644 --- a/libavfilter/vf_yadif.c +++ b/libavfilter/vf_yadif.c @@ -108,9 +108,9 @@ static void filter_line_c(void *dst1, FILTER(0, w, 1) } -#define MAX_ALIGN 8 static void filter_edges(void *dst1, void *prev1, void *cur1, void *next1, - int w, int prefs, int mrefs, int parity, int mode) + int w, int prefs, int mrefs, int parity, int mode, + int alignment) { uint8_t *dst = dst1; uint8_t *prev = prev1; @@ -120,7 +120,7 @@ static void filter_edges(void *dst1, void *prev1, void *cur1, void *next1, uint8_t *prev2 = parity ? prev : cur ; uint8_t *next2 = parity ? cur : next; - const int edge = FFMAX(MAX_ALIGN - 1, 3); + const int edge = FFMAX(alignment - 1, 3); int offset = FFMAX(w - edge, 3); /* Only edge pixels need to be processed here. A constant value of false @@ -159,7 +159,8 @@ static void filter_line_c_16bit(void *dst1, } static void filter_edges_16bit(void *dst1, void *prev1, void *cur1, void *next1, - int w, int prefs, int mrefs, int parity, int mode) + int w, int prefs, int mrefs, int parity, int mode, + int alignment) { uint16_t *dst = dst1; uint16_t *prev = prev1; @@ -169,7 +170,7 @@ static void filter_edges_16bit(void *dst1, void *prev1, void *cur1, void *next1, uint16_t *prev2 = parity ? prev : cur ; uint16_t *next2 = parity ? 
cur : next; - const int edge = FFMAX(MAX_ALIGN / 2 - 1, 3); + const int edge = FFMAX(alignment / 2 - 1, 3); int offset = FFMAX(w - edge, 3); mrefs /= 2; @@ -199,7 +200,7 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs) int slice_start = (td->h * jobnr ) / nb_jobs; int slice_end = (td->h * (jobnr+1)) / nb_jobs; int y; - int edge = 3 + MAX_ALIGN / df - 1; + int edge = 3 + s->req_align / df - 1; /* filtering reads 3 pixels to the left/right; to avoid invalid reads, * we need to call the c variant which avoids this for border pixels @@ -219,7 +220,7 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs) s->filter_edges(dst, prev, cur, next, td->w, y + 1 < td->h ? refs : -refs, y ? -refs : refs, - td->parity ^ td->tff, mode); + td->parity ^ td->tff, mode, s->req_align); } else { memcpy(&td->frame->data[td->plane][y * td->frame->linesize[td->plane]], &s->cur->data[td->plane][y * refs], td->w * df); @@ -303,6 +304,7 @@ static int config_output(AVFilterLink *outlink) s->csp = av_pix_fmt_desc_get(outlink->format); s->filter = filter; + s->req_align = 1; if (s->csp->comp[0].depth > 8) { s->filter_line = filter_line_c_16bit; s->filter_edges = filter_edges_16bit; diff --git a/libavfilter/x86/vf_yadif_init.c b/libavfilter/x86/vf_yadif_init.c index 257c3f9199..9dd73f8e44 100644 --- a/libavfilter/x86/vf_yadif_init.c +++ b/libavfilter/x86/vf_yadif_init.c @@ -53,6 +53,7 @@ av_cold void ff_yadif_init_x86(YADIFContext *yadif) int bit_depth = (!yadif->csp) ? 
8 : yadif->csp->comp[0].depth; + yadif->req_align = 8; if (bit_depth >= 15) { if (EXTERNAL_SSE2(cpu_flags)) yadif->filter_line = ff_yadif_filter_line_16bit_sse2; diff --git a/libavfilter/yadif.h b/libavfilter/yadif.h index c928911b35..b81f2fc1d9 100644 --- a/libavfilter/yadif.h +++ b/libavfilter/yadif.h @@ -66,11 +66,13 @@ typedef struct YADIFContext { /** * Required alignment for filter_line */ + int req_align; void (*filter_line)(void *dst, void *prev, void *cur, void *next, int w, int prefs, int mrefs, int parity, int mode); void (*filter_edges)(void *dst, void *prev, void *cur, void *next, - int w, int prefs, int mrefs, int parity, int mode); + int w, int prefs, int mrefs, int parity, int mode, + int alignment); const AVPixFmtDescriptor *csp; int eof; From patchwork Wed Jul 20 04:41:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Phlipot X-Patchwork-Id: 36853 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:1649:b0:8b:613a:194d with SMTP id no9csp2172214pzb; Tue, 19 Jul 2022 21:41:50 -0700 (PDT) X-Google-Smtp-Source: AGRyM1tnBvrt36o4bpz0TNSMgF5deyNNRu+u3nlpyiGyDUYZKFY7LVF0M7UYPUtOS9zMSN1mCjqD X-Received: by 2002:a05:6402:1914:b0:43a:d548:8adc with SMTP id e20-20020a056402191400b0043ad5488adcmr48755413edz.214.1658292110162; Tue, 19 Jul 2022 21:41:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658292110; cv=none; d=google.com; s=arc-20160816; b=EIgkaEhRsvO4LjMGPXIv+gf0LBfLujTc6Z/QJzLhD/MrQ4KpXqV4G5YtoGYEl2Txwu i6Ii3Gz+4h33aUfhyTQpu7YMdWHnc/lwqhdah1w4S+e7lHe59YC1dppuoxPWMIlkn9dd wcKrxFdQM+12J4DaQy6dwU85oS1E2rQNXwDJYOQwgq4ZfvmS1aWGwnQ2I+SUm4uAc+Zj +qlLzImPwAv+ONfy9mT09EV1vFlH5f+ctf6oB/muHtfWHmzel7zNoAkMo4sjvCjHpwSq t9lWV5yNla0XvhT/HT/s2lOob/ukQ372j+0K/U9PHFNqkzmh0iGrE8jiy+3dR+1Wft+m XBjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to 
:list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=STz3hiAuRAEdb9R5+nYk+N1MANoHAcZ4ttB7evb7UPQ=; b=g5faI6R9D1TOS5BRs1YqtCdVScdpMHnIez2WEW1SGZ9dXZYj0JgQz1sD1lCHOB9gtT InIf59+RyXa/VdSlhdCsRF3Dyz1/wOOGRyag+RC7ZjnkNQmdJzMBiL/1qG7+7xNk53/U 2PWsNSJmloBxg7PLtvER9umV8zJ7hU+ZTXTQKy+H76ZplsTEtk+0ZJNPruq4bAAHlckg 1G0wvIthLWBkGBEfSDA+MEyHO6MEObxIiDeLsdWQHSMkamzvXd9yoMVpdzbawHbjTggb 8Wp+rnQMp8iD/LZdMrL0LAoAUV327rKhsOENbUYOaiAnnkb7mibaqbLhyGgaiUGM3ehH IhLg== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=nhMV3ZSV; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id ji18-20020a170907981200b0072f267eeba3si11828348ejc.677.2022.07.19.21.41.49; Tue, 19 Jul 2022 21:41:50 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=nhMV3ZSV; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 7D9C268B7D1; Wed, 20 Jul 2022 07:41:33 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pl1-f175.google.com (mail-pl1-f175.google.com [209.85.214.175]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 30CAD68B7D4 for ; Wed, 20 Jul 2022 07:41:25 +0300 (EEST) Received: by mail-pl1-f175.google.com with SMTP id g17so13898752plh.2 for ; Tue, 19 Jul 2022 21:41:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=VT5Jq3nHG6f9ea/3i4Kb1djNuxPV6b/LWW/4zfLKLhc=; b=nhMV3ZSVJcE2YgE6AEEqdwxwvLgr5IHyNVJJhtubYifCgsxyeTM1KG3v+uFq+4iUgq w88aSCTCRp2uAiuvOnKoXwpdP2sIMYmZ9fnbc1QyAhnjUnOqa4swhrxznopsxbdbcoSa +vFb+5FUv4a+dBj5yaJ/fZdqa2r+2Ob920QW0ByPdeDJeqRVmBKzCLNjzx7FiRIPoT4j FGM+95bE/XVmycJsJ5teFaNHBiDZ9C+m1WD4ECK+Na5oCPTtdlbwfN5N+JG/Z1faLxfV ZXIPSBUSjx+szdzvq2fDl9giphMTCVfQvR6eqQ6r5XJFcB2oJts/TevePltIXjRUSx/o aVjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=VT5Jq3nHG6f9ea/3i4Kb1djNuxPV6b/LWW/4zfLKLhc=; b=rHt9n6yoixvgY7qumm8Q2UVkeAhjDbcK+7SBkvuWLwcgyMQiWfI6ukHSHynC1dcwfo qiA9u7z2cTQxqc6P09OnbtlFF0oVAS3R3lKwqy3MaJ9W+4ZS2WBXgYUYHr9SYFo3JoVa +xD5S5D8WGrQLxdUXxWF4aaHtHQne3AWwYpHtlLilzSr3PYBV5Bl2avet7+dJPjRIvHz sAWEeR+4eWR8KYXl3yHKzULa9wjRC5W9ljjVloeRIq914rVt3l/mTCmGbC4hf/6eEKhQ LDHSfXvD2opO1ZmGZ9cxSXm99MAuXRq+wgTlJ9t1PDnb3jB4aARX7Vj9FQsP5jvLsBty QthA== X-Gm-Message-State: AJIora8w7xRWCq3Msp5okSUEFVzgd3uVYRQMgenKXJbEMhz1p0uRRzvP 2Al84rlvBcsS+NTZPiyyMFcW8nrWLaVFag== X-Received: by 2002:a17:903:120f:b0:15f:99f:9597 with SMTP id l15-20020a170903120f00b0015f099f9597mr36516141plh.45.1658292082955; Tue, 19 Jul 2022 21:41:22 -0700 (PDT) Received: from localhost.localdomain (23-121-159-29.lightspeed.sntcca.sbcglobal.net. [23.121.159.29]) by smtp.googlemail.com with ESMTPSA id f16-20020a635110000000b003fba1a97c49sm10855907pgb.61.2022.07.19.21.41.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Jul 2022 21:41:22 -0700 (PDT) From: Chris Phlipot To: ffmpeg-devel@ffmpeg.org Date: Tue, 19 Jul 2022 21:41:15 -0700 Message-Id: <20220720044117.1282961-3-cphlipot0@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220720044117.1282961-1-cphlipot0@gmail.com> References: <20220720044117.1282961-1-cphlipot0@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 3/5] avfilter/vf_yadif: reformat code to improve readability X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Chris Phlipot Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: YQrE9MT2ohr5 Reformat some of the code to improve readability and reduce code duplication. This change is intended to be purely cosmetic and shouldn't result in any functional changes. 
Signed-off-by: Chris Phlipot --- libavfilter/vf_yadif.c | 11 +++++------ libavfilter/yadif.h | 3 +-- 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c index 42f6246330..54109566be 100644 --- a/libavfilter/vf_yadif.c +++ b/libavfilter/vf_yadif.c @@ -211,16 +211,15 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs) uint8_t *cur = &s->cur ->data[td->plane][y * refs]; uint8_t *next = &s->next->data[td->plane][y * refs]; uint8_t *dst = &td->frame->data[td->plane][y * td->frame->linesize[td->plane]]; + int prefs = y + 1 < td->h ? refs : -refs; + int mrefs = y ? -refs : refs; + int parity = td->parity ^ td->tff; int mode = y == 1 || y + 2 == td->h ? 2 : s->mode; s->filter_line(dst + pix_3, prev + pix_3, cur + pix_3, next + pix_3, td->w - edge, - y + 1 < td->h ? refs : -refs, - y ? -refs : refs, - td->parity ^ td->tff, mode); + prefs, mrefs, parity, mode); s->filter_edges(dst, prev, cur, next, td->w, - y + 1 < td->h ? refs : -refs, - y ? 
-refs : refs, - td->parity ^ td->tff, mode, s->req_align); + prefs, mrefs, parity, mode, s->req_align); } else { memcpy(&td->frame->data[td->plane][y * td->frame->linesize[td->plane]], &s->cur->data[td->plane][y * refs], td->w * df); diff --git a/libavfilter/yadif.h b/libavfilter/yadif.h index b81f2fc1d9..f271fe8304 100644 --- a/libavfilter/yadif.h +++ b/libavfilter/yadif.h @@ -67,8 +67,7 @@ typedef struct YADIFContext { * Required alignment for filter_line */ int req_align; - void (*filter_line)(void *dst, - void *prev, void *cur, void *next, + void (*filter_line)(void *dst, void *prev, void *cur, void *next, int w, int prefs, int mrefs, int parity, int mode); void (*filter_edges)(void *dst, void *prev, void *cur, void *next, int w, int prefs, int mrefs, int parity, int mode, From patchwork Wed Jul 20 04:41:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Phlipot X-Patchwork-Id: 36854 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:1649:b0:8b:613a:194d with SMTP id no9csp2172263pzb; Tue, 19 Jul 2022 21:41:59 -0700 (PDT) X-Google-Smtp-Source: AGRyM1txkr2KjpD39QgvAgmJCcYMokAe1o8IvdjWw+lbic9fYPyE2SPeXTUPS+NWdEBgFICjonhj X-Received: by 2002:a17:906:5055:b0:6ff:1dfb:1e2c with SMTP id e21-20020a170906505500b006ff1dfb1e2cmr33931025ejk.200.1658292119277; Tue, 19 Jul 2022 21:41:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658292119; cv=none; d=google.com; s=arc-20160816; b=Tc/8JA+Tv8c7es5c741o2ar0hnOh3mF3k+MUMBgyLtj0ZS0nLOpczUNBYG5JyidOte dXvJZZlBFFC6yzSu3IvE2urUPnfSsDc/+prWWWRBb+mk5Ci+i7zLGG1SXeOe/R76X+bx iLR3YnylcROiLIXGDrpbtqv5gGO4l6lUgPfc7wmb9qLs1i4sdHJwRsGiiyqvH/NhnLzA vgWPJI2XacO0U67fTPHxranj/XQF+rCJBYrFNldksiowlAyX6Nh7pTSFneEy5Hn5Ksmm TJRfl/raQi2peUktVKFAP5AH8iaSy+d8Zcf7MoPYxaS56Q/NJmqFpnJkAScH//ukeufV BX1g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to 
:list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=6lfXK8KYO5o9/eywNyRRuV67x63Ct/NJWROFJpmSnMo=; b=hbg69oOjh03TlltdiN0Y+6DIsEuBvkgVdB3ZYUimjgNDf0mn0uqw8mTnr5VTDPUtno EjJQHIlw9Nl8ErFunulQVCbkfRPZ+RP75S00OoXKAlgNHrBYImhSJX2am4UMmbSsz2ha I27DnJmKsMhloFI2siZ/C8vE47suJIdXBeYYzgGzvbR9DTFCh5IQ9P1QVCoCEzTysjuf GZ8MQTjr7teW3BNprOpjYe0QOAChwqVDOZOj7MmDmACYnxipGw+9hKzoUqAphwkbP9iu Mr0N5cKX3iEiaKMYlTqcpz7ohuWbx31xKOHh06zhvTm8Wvb40WQ/I2c0rHvo2lmyCKWZ aZgQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=mPTInKTp; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id qb29-20020a1709077e9d00b0072ab8073979si23012517ejc.460.2022.07.19.21.41.58; Tue, 19 Jul 2022 21:41:59 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=mPTInKTp; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id EDF2568B856; Wed, 20 Jul 2022 07:41:34 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pg1-f171.google.com (mail-pg1-f171.google.com [209.85.215.171]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E722168B82B for ; Wed, 20 Jul 2022 07:41:25 +0300 (EEST) Received: by mail-pg1-f171.google.com with SMTP id bh13so15366988pgb.4 for ; Tue, 19 Jul 2022 21:41:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=c4xeuuzjgkCyJeehhi155/UaxzBBbcPC2HabThBId/I=; b=mPTInKTpJeKEWZCUxtsOvlEIMBkFH6T0Kn+OITXaAe7hwlW21QyK0oiM8gxutllaQA KPqXymJmf4okA5s3LIirZv9VLKg0zfD/vl+Ck/BjO4lVKvVJ9FFfMa/BzMuta1MsR2Rd mBDv5KX+T8uK0R67mNjsj4TNO04i5FriVSKpO94e8+L4iSjaOXIAu6ZrYjdUNdV8rqkw yUdDWRFtn2t/C42VTrCkOJVCSKAuf92lT3bmzi6+ffENSAzdHPJnI0G3+SI95gh/6IjV 5Nr0R8KOWfRwKWvFynRVGm4ZsW7qmuY95lRm8HbzuNp5XDmW9+sL+s5RIXy1jDMPry3z ylrA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=c4xeuuzjgkCyJeehhi155/UaxzBBbcPC2HabThBId/I=; b=NFXOW7tj8HGxFwVV8hHRJEzUCAQS1n8KFsefw/r/BbDEIJdlSwD/onkzyyKnDxnwSb XqIXFtycPLLXeC36KSC0HZa4As45CeaNR765z92Q/Qg6qWonvXbvLdDlFTWBv1ONsOdR Dz2jJt6QhA03+nH2ZBNYJTOa1pzFU7KQpZzgWwWYAk4LckNFxfqewABCW0t6lpU/nJ7m e/YVg4G5sPGgeeDed+3jOwETwTz6AbA4er0HEAxGqOL1tDP7tqG9P450OKZU6dHG8SKZ lg7KsOOUbRabIhZbiQW1jKpCJOt5jPe1KDHItHsMWDQ+to31a9PuvdyTyHzlH+Qwe1bF 2hMw== X-Gm-Message-State: AJIora9NjcuL/Va04IRy9ulGlNHTOFH6fWrFvhG9LTtiZBnaOxcFvuIZ X5O+McW2gp53WmXfkY/ioRLNbn0FMetKdw== X-Received: by 2002:a63:9049:0:b0:412:b11b:c630 with SMTP id a70-20020a639049000000b00412b11bc630mr31785614pge.175.1658292083977; Tue, 19 Jul 2022 21:41:23 -0700 (PDT) Received: from localhost.localdomain (23-121-159-29.lightspeed.sntcca.sbcglobal.net. [23.121.159.29]) by smtp.googlemail.com with ESMTPSA id f16-20020a635110000000b003fba1a97c49sm10855907pgb.61.2022.07.19.21.41.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Jul 2022 21:41:23 -0700 (PDT) From: Chris Phlipot To: ffmpeg-devel@ffmpeg.org Date: Tue, 19 Jul 2022 21:41:16 -0700 Message-Id: <20220720044117.1282961-4-cphlipot0@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220720044117.1282961-1-cphlipot0@gmail.com> References: <20220720044117.1282961-1-cphlipot0@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 4/5] avfilter/vf_yadif: Process more pixels using filter_line X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Chris Phlipot Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: CiDZuzKI/ifK filter_line is generally vectorized, whereas filter_edges is implemented in C. Currently we rely on filter_edges to process non-edge pixels in cases where the width doesn't match the alignment. 
This causes us to process non-edge pixels with the slow C implementation instead of the faster SSE implementation. It is generally faster to process 8 pixels with the slowest SSE2 vectorized implementation than it is to process 2 pixels with the C implementation. Therefore, if filter_edges needs to process 2 or more non-edge pixels, it is faster to process these non-edge pixels with filter_line instead, even if it processes more pixels than necessary. To address this, we use filter_line so long as we know that at least 2 pixels will be used in the final output, even if the rest of the computed pixels are invalid. Any incorrect output pixels generated by filter_line will be overwritten by the following call to filter_edges. In addition, we avoid running filter_line if it would read or write pixels outside the current slice. Signed-off-by: Chris Phlipot --- libavfilter/vf_yadif.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c index 54109566be..394c04a985 100644 --- a/libavfilter/vf_yadif.c +++ b/libavfilter/vf_yadif.c @@ -201,6 +201,8 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs) int slice_end = (td->h * (jobnr+1)) / nb_jobs; int y; int edge = 3 + s->req_align / df - 1; + int filter_width_target = td->w - 3; + int filter_width_rounded_up = (filter_width_target & ~(s->req_align-1)) + s->req_align; /* filtering reads 3 pixels to the left/right; to avoid invalid reads, * we need to call the c variant which avoids this for border pixels @@ -215,11 +217,28 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs) int mrefs = y ? -refs : refs; int parity = td->parity ^ td->tff; int mode = y == 1 || y + 2 == td->h ? 2 : s->mode; + + /* Adjust width and alignment to process extra pixels in filter_line + * using potentially vectorized code so long as it doesn't cause + * reads or writes outside of the current slice. 
filter_edge will + * correct any incorrect pixels written by filter_line in this + * scenario. + */ + int filter_width; + int edge_alignment; + if (filter_width_rounded_up - filter_width_target >= 2 + && y*refs + filter_width_rounded_up < slice_end * refs + refs - 3) { + filter_width = filter_width_rounded_up; + edge_alignment = 1; + } else { + filter_width = td->w - edge; + edge_alignment = s->req_align; + } s->filter_line(dst + pix_3, prev + pix_3, cur + pix_3, - next + pix_3, td->w - edge, + next + pix_3, filter_width, prefs, mrefs, parity, mode); s->filter_edges(dst, prev, cur, next, td->w, - prefs, mrefs, parity, mode, s->req_align); + prefs, mrefs, parity, mode, edge_alignment); } else { memcpy(&td->frame->data[td->plane][y * td->frame->linesize[td->plane]], &s->cur->data[td->plane][y * refs], td->w * df); From patchwork Wed Jul 20 04:41:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Phlipot X-Patchwork-Id: 36855 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:1649:b0:8b:613a:194d with SMTP id no9csp2172327pzb; Tue, 19 Jul 2022 21:42:10 -0700 (PDT) X-Google-Smtp-Source: AGRyM1u2qQUQoJwBSwMAQ5rqViSRxXv6wMg2eop33KmeaXdQharAni8PH29VXc6m30scNwXnnBFh X-Received: by 2002:a05:6402:5516:b0:43a:42f9:24d6 with SMTP id fi22-20020a056402551600b0043a42f924d6mr47851838edb.204.1658292130096; Tue, 19 Jul 2022 21:42:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658292130; cv=none; d=google.com; s=arc-20160816; b=h89/YPK7PuY0x//LW6J85Mu3h+qRSn+N5iokNzP1XGBomZfi72AcdsB6vz4MtzL2Bf BAu7OgQw1GF5c2meYUL+o873RkC157IyfBoEQ2mgO+yl2Njhj3HFdaVPQRKxA3Zft64e hjRodRKnMQHr/Wxy/PNGvnoCaFDJpmnPTarTUwd67G1qfeMIWz7qtNSDAeSUDQ/H7gYl b2eMhaGGYL+N1M1R+oWEaoYffn5BWPrB3NrhZltSPktzcRyaBiUeoLspilxgEdhzqcwN CwrjBuDJFV2Ek31LnNeiS2c8/ZKhSgRmAUs8AqwNIfLvmS92Rt0HU6lwunpnDj8mFNjx MwKg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; 
h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=oxRyn0IGKaPBUwfc/oqOfYRxPtJbwhov2BpyphAEtkw=; b=QImhMIIVu1l0DvH/mLTISQgY8iEHN58jMGfO13I7rGoawhWMTXvzIiBLJ8kTmNZvni MQNW3jSFDMiQbgfy6GCwhyhqgFAy4ua5xphuRB6MWVR9XWaLlB7bqBxvdMCNaxpvWKVc XeynJ7Cmj1+u+KiunMPbipJ5r2Ht7vtsLxO982Ira82krbXgQNUEBCZQzq3GOHfrqIrj fg+J7xMz28OFn28+0pyh3Q6qQdOkAUTP1mf0edO36j2oL+jvxC0Q/LoknMTmDWq2Fmi2 jaO5W3UOffQGGufLfNn20XHEMRP545Pcj2p5WoZa2HZpJYQh0dfApP5kJXYApNt6KJU2 mMgw== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=GIxOLvzb; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id i14-20020aa7dd0e000000b0043a785074bdsi20121790edv.108.2022.07.19.21.42.09; Tue, 19 Jul 2022 21:42:10 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=GIxOLvzb; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id E988B68B83A; Wed, 20 Jul 2022 07:41:35 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pj1-f50.google.com (mail-pj1-f50.google.com [209.85.216.50]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 4998168B808 for ; Wed, 20 Jul 2022 07:41:27 +0300 (EEST) Received: by mail-pj1-f50.google.com with SMTP id t3-20020a17090a3b4300b001f21eb7e8b0so325712pjf.1 for ; Tue, 19 Jul 2022 21:41:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=u1aWNxpfhmSpeskDFqF2jXO4WfYrC854qbaI4wdHMso=; b=GIxOLvzbpRjDobPB100mPqj0mOQY7gER+nwS+KpgVNzNmv4/XoJJYe4ZeyT1GOyWgs oDg1syTzyQ+yzdBm2mVLF1SAaCmMSy8L56dJfhSUR1gxAh92+Ca2XDqJfBoWl4Bf2yX/ M4es/0nsnw2rh4mTO8ZAllIDZGaFeOrRv39aE521onKVzjSi0/GBneJvharoGUe8ondL gDmHav56fpSgFmVhjdkov1u7AOAc830TLrf+LKpKiSHBhuQkCbHA5HeRZwHUY2WcGW5F fvlSkB705VQg+55MPb8KAYiWotDhWlfvLQm9XTeXg5s7HKEYfThuoe88V0L1fdFPhI1f WfMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=u1aWNxpfhmSpeskDFqF2jXO4WfYrC854qbaI4wdHMso=; b=zXHZLl83KGDRfYT6CAV/QR5PH2cl0Fjefa18cwYlqQZxzgKnH3TAPnXk+pkJPknoQC QVlJa7OhGaXplAfBJviPPc5rnd/4x6zTbkVBscFqemY6onP17AoJaCLVyilEi4w1YMOT bAamg/h3DbA3ZNKiFjWWDewr2HyYGUBXhO78yvapwRnIKfmTJFvb02FXum+7r6jUHdFK XKw8h3hbkj9uzI2ts7RkGJ0388Tzl2V6M0zENTnpT4UNm+iWFiGGgta7zNWMuM+OzZw2 1zEN1C9UkdjJPb52nzghe+/e8ilZyT+oUCo66BO/0I+s/6YnfWq6Rb6D+822OYU4YKay q5Yw== X-Gm-Message-State: AJIora99Ot1PfelZi3BzQAK83za2JP2SpripbyYmjZPJMfLgAZYK589U s1PwKGNYDEV6qo3ofa54E/xnY+Pb3Fw22A== X-Received: by 2002:a17:902:6ac5:b0:16d:23c9:4fbd with SMTP id i5-20020a1709026ac500b0016d23c94fbdmr434499plt.143.1658292085058; Tue, 19 Jul 2022 21:41:25 -0700 (PDT) Received: from localhost.localdomain (23-121-159-29.lightspeed.sntcca.sbcglobal.net. [23.121.159.29]) by smtp.googlemail.com with ESMTPSA id f16-20020a635110000000b003fba1a97c49sm10855907pgb.61.2022.07.19.21.41.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Jul 2022 21:41:24 -0700 (PDT) From: Chris Phlipot To: ffmpeg-devel@ffmpeg.org Date: Tue, 19 Jul 2022 21:41:17 -0700 Message-Id: <20220720044117.1282961-5-cphlipot0@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220720044117.1282961-1-cphlipot0@gmail.com> References: <20220720044117.1282961-1-cphlipot0@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 5/5] avfilter/vf_yadif: Add x86_64 avx yadif asm X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Chris Phlipot Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: dy89dRvmzv8G Add a new version of yadif_filter_line performed using packed bytes instead of the packed words used by the current implementation. 
As a result, this implementation runs almost 2x as fast as the current fastest SSSE3 implementation.

This implementation is created from scratch based on the C code, with the goal of keeping all intermediate values within 8 bits so that the vectorized code can be computed using packed bytes. The differences are as follows:
- Use algorithms to compute the average and absolute difference using only 8-bit intermediate values.
- Rework the mode 1 code by applying various mathematical identities to keep all intermediate values within 8 bits.
- Attempt to compute the spatial score using only 8 bits. The actual spatial score fits within this range about 97% of the time (content dependent) for the entire 128-bit xmm vector. In the remaining roughly 3% of cases, where the spatial score needs more than 8 bits to be represented, we detect this and take a slow path that recomputes the spatial score using 16-bit packed words instead.

This implementation is currently limited to x86_64 due to the number of registers required. x86_32 is possible, but the performance benefit over the existing SSSE3 implementation is not as great, due to all of the stack spills that would result from having far fewer registers. ASM was not generated for the 32-bit variant due to limited ROI, as most AVX users are likely on a 64-bit OS at this point and 32-bit users would lose out on most of the performance benefit.
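For readers who want to check the first point without reading the asm, the two byte-only helpers can be modelled in scalar C roughly as below. This is only a sketch, not code from the patch: the function names (sub_sat_u8, avg_round_up_u8, absdif_u8, avg_truncate_u8, verify_8bit_helpers) are made up for illustration, and the scalar stand-ins for psubusb and pavgb are assumptions about how one would model those instructions one byte at a time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar stand-in for psubusb: unsigned subtraction that clamps at 0. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Scalar stand-in for pavgb: average rounded up. The sum is formed in int
 * here for clarity; the instruction itself yields an 8-bit result. */
static uint8_t avg_round_up_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* abs(a - b): one of the two saturating differences is always 0,
 * so OR-ing them yields the absolute difference. */
static uint8_t absdif_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(sub_sat_u8(a, b) | sub_sat_u8(b, a));
}

/* (a + b) >> 1: the rounded-up average is 1 too large exactly when a + b
 * is odd, i.e. when the low bits of a and b differ. */
static uint8_t avg_truncate_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(avg_round_up_u8(a, b) - ((a ^ b) & 1));
}

/* Hypothetical self-check: compare both helpers against plain int
 * arithmetic for every pair of byte values; returns the pairs checked. */
static long verify_8bit_helpers(void)
{
    long checked = 0;
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            assert(absdif_u8((uint8_t)a, (uint8_t)b) == abs(a - b));
            assert(avg_truncate_u8((uint8_t)a, (uint8_t)b) == ((a + b) >> 1));
            checked++;
        }
    }
    return checked;
}
```

Calling verify_8bit_helpers() walks all 65536 byte pairs, which is a cheap way to convince oneself that neither helper ever needs a 9th bit.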
Signed-off-by: Chris Phlipot --- libavfilter/x86/Makefile | 2 +- libavfilter/x86/vf_yadif_init.c | 9 + libavfilter/x86/vf_yadif_x64.asm | 489 +++++++++++++++++++++++++++++++ 3 files changed, 499 insertions(+), 1 deletion(-) create mode 100644 libavfilter/x86/vf_yadif_x64.asm diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile index e87481bd7a..19161ffa23 100644 --- a/libavfilter/x86/Makefile +++ b/libavfilter/x86/Makefile @@ -80,4 +80,4 @@ X86ASM-OBJS-$(CONFIG_TRANSPOSE_FILTER) += x86/vf_transpose.o X86ASM-OBJS-$(CONFIG_VOLUME_FILTER) += x86/af_volume.o X86ASM-OBJS-$(CONFIG_V360_FILTER) += x86/vf_v360.o X86ASM-OBJS-$(CONFIG_W3FDIF_FILTER) += x86/vf_w3fdif.o -X86ASM-OBJS-$(CONFIG_YADIF_FILTER) += x86/vf_yadif.o x86/yadif-16.o x86/yadif-10.o +X86ASM-OBJS-$(CONFIG_YADIF_FILTER) += x86/vf_yadif.o x86/vf_yadif_x64.o x86/yadif-16.o x86/yadif-10.o diff --git a/libavfilter/x86/vf_yadif_init.c b/libavfilter/x86/vf_yadif_init.c index 9dd73f8e44..a46bd7ccca 100644 --- a/libavfilter/x86/vf_yadif_init.c +++ b/libavfilter/x86/vf_yadif_init.c @@ -29,6 +29,9 @@ void ff_yadif_filter_line_sse2(void *dst, void *prev, void *cur, void ff_yadif_filter_line_ssse3(void *dst, void *prev, void *cur, void *next, int w, int prefs, int mrefs, int parity, int mode); +void ff_yadif_filter_line_avx(void *dst, void *prev, void *cur, + void *next, int w, int prefs, + int mrefs, int parity, int mode); void ff_yadif_filter_line_16bit_sse2(void *dst, void *prev, void *cur, void *next, int w, int prefs, @@ -71,5 +74,11 @@ av_cold void ff_yadif_init_x86(YADIFContext *yadif) yadif->filter_line = ff_yadif_filter_line_sse2; if (EXTERNAL_SSSE3(cpu_flags)) yadif->filter_line = ff_yadif_filter_line_ssse3; +#if ARCH_X86_64 + if (EXTERNAL_AVX(cpu_flags)) { + yadif->filter_line = ff_yadif_filter_line_avx; + yadif->req_align = 16; + } +#endif } } diff --git a/libavfilter/x86/vf_yadif_x64.asm b/libavfilter/x86/vf_yadif_x64.asm new file mode 100644 index 0000000000..3f70aa0fd2 --- /dev/null +++ 
b/libavfilter/x86/vf_yadif_x64.asm @@ -0,0 +1,489 @@ +;****************************************************************************** +;* Copyright (C) 2006-2011 Michael Niedermayer +;* 2010 James Darnley +;* 2013-2022 Chris Phlipot +;* +;* This file is part of FFmpeg. +;* +;* FFmpeg is free software; you can redistribute it and/or +;* modify it under the terms of the GNU Lesser General Public +;* License as published by the Free Software Foundation; either +;* version 2.1 of the License, or (at your option) any later version. +;* +;* FFmpeg is distributed in the hope that it will be useful, +;* but WITHOUT ANY WARRANTY; without even the implied warranty of +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;* Lesser General Public License for more details. +;* +;* You should have received a copy of the GNU Lesser General Public +;* License along with FFmpeg; if not, write to the Free Software +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +;****************************************************************************** + +%include "libavutil/x86/x86util.asm" + +SECTION_RODATA + +pb_1: times 16 db 1 +pb_127: times 16 db 127 +pb_128: times 16 db 128 + +SECTION .text + +; Rename a register so that it can be used for a new purpose. The old name +; will become undefined so that any additional usage of the old name will +; result in a compiler/assembler error. +%macro RENAME_REGISTER 2 + %ifidni %1,%2 + %error "Can't rename a register to itself." + %endif + %xdefine %1 %2 + %undef %2 +%endmacro + +; Usage: dst, arg1, arg2, temp1 +; Compute the absolute difference of arg1 and arg2 and place the result in dst. +; All operations are performed using packed bytes. Unlike ARM NEON there is no +; instruction to do this, so instead we emulate it with multiple instructions. +; eg. 
dst = abs(arg1 - arg2) +%macro absdif_pb 4 + %ifidni %1,%3 + %error "arg1 and arg3 must be different" + %elifidni %1,%4 + %error "arg1 and arg4 must be different" + %elifidni %3,%4 + %error "arg3 and arg4 must be different" + %endif + psubusb %4, %3, %2 + psubusb %1, %2, %3 + por %1, %1, %4 +%endmacro + +; Usage: dst, arg1, arg2, pb_1, temp1 +; Compute the average of 2 unsigned values rounded down. +; SSE provides pavgb, which rounds up. Unlike ARM NEON, SSE doesn't provide +; an instruction that computes the avg of 2 unsigned bytes rounded down, so +; instead we emulate it with this macro. +; eg. dst = (arg1 + arg2) >> 1 +%macro avg_truncate_pb 5 + %ifidni %1,%3 + %error "arg1 and arg3 must be different" + %elifidni %1,%4 + %error "arg1 and arg4 must be different" + %endif + pxor %5, %2, %3 + pavgb %1, %2, %3 + pand %5, %5, %4 + psubb %1, %1, %5 +%endmacro + +INIT_XMM avx + +cglobal yadif_filter_line, 5, 15, 8, 240, dst, prev, cur, next, width, prefs, \ + mrefs, parity, mode +%xdefine cur_plus_prefs r5 +%xdefine cur_plus_mrefs r6 +%xdefine prefs r7 +%xdefine next2 r8 +%xdefine prev2_2mrefs r9 +%xdefine mrefs r10 +%xdefine prev2_2prefs r11 +%xdefine next2_2mrefs r12 +%xdefine prev_plus_mrefs r13 +%xdefine next_plus_mrefs r14 +%xdefine prev2_2mrefs_stack_spill [rsp - 24] +%xdefine pb_1_reg m15 + +%xdefine old_absdif_ahead_stack [rsp - 128] +%xdefine absdif_here [rsp - 80] +%xdefine absdif_behind [rsp - 64] + +%xdefine spatial_predicate_stack [rsp - 112] +%xdefine spatial_pred_check_minus_1 [rsp - 16] + +; Unaligned loads are slower than aligned loads. It is often beneficial to +; store values in an aligned location after doing an unaligned load so that all +; future loads of that value will be aligned. 
+%xdefine cur_plus_prefs_x_stack [rsp] +%xdefine cur_plus_mrefs_x_stack [rsp + 16] +%xdefine cur_plus_mrefs_x_2_stack [rsp + 96] +%xdefine cur_plus_prefs_x_minus_2 [rsp + 80] + +; Absolute differences used for CHECK(-1) +%xdefine chkneg1_ad2_stack [rsp - 96] +%xdefine chkneg1_ad1_stack [rsp + 176] +%xdefine chkneg1_ad0_stack [rsp - 48] + +; Absolute differences used for CHECK(-2) +%xdefine chkneg2_ad2_stack [rsp + 160] +%xdefine chkneg2_ad1_stack [rsp + 144] +%xdefine chkneg2_ad0_stack [rsp + 208] + +; Absolute differences used for CHECK(1) +%xdefine chkpos1_ad2_stack [rsp + 112] +%xdefine chkpos1_ad1_stack [rsp + 128] +; chkpos1_ad0 has no stack location since it is kept in a register. + +; Absolute differences used for CHECK(2) +%xdefine chkpos2_ad2_stack [rsp + 64] +%xdefine chkpos2_ad1_stack [rsp + 48] +%xdefine chkpos2_ad0_stack [rsp + 32] + + movsxd prefs, DWORD prefsm + movsxd mrefs, DWORD mrefsm +; Bail out early if width is not positive. + test widthd, widthd + jle .return + +; Initialize all pointers. Unlike the C code, the pointers all point to the +; location where x equals 0 and remain unchanged instead of the pointers being +; incremented on every loop iteration. Instead only x is incremented, and x86 +; memory addressing is used to add the current value of x on every memory +; access at (most likely) zero cost. 
+ lea cur_plus_prefs, [curq + prefs] + movu m0, [curq + prefs - 1] + lea cur_plus_mrefs, [curq + mrefs] + movu m1, [curq + mrefs - 1] + absdif_pb m0, m0, m1, m5 + cmp dword paritym, 0 + mov next2, curq +RENAME_REGISTER prev2, curq + cmove next2, nextq + pslldq m8, m0, 14 + mova old_absdif_ahead_stack, m8 + cmovne prev2, prevq + lea prev_plus_mrefs, [prevq + mrefs] + add prevq, prefs +RENAME_REGISTER prev_plus_prefs, prevq + lea next_plus_mrefs, [nextq + mrefs] + add nextq, prefs +RENAME_REGISTER next_plus_prefs, nextq + lea prev2_2mrefs, [prev2 + 2*mrefs] + mov prev2_2mrefs_stack_spill, prev2_2mrefs + lea prev2_2prefs, [prev2 + 2*prefs] + lea next2_2mrefs, [next2 + 2*mrefs] +RENAME_REGISTER next2_2prefs, mrefs + lea next2_2prefs, [next2 + 2*prefs] +RENAME_REGISTER x, prefs + xor x, x + mova pb_1_reg, [pb_1] + mov prev2_2mrefs, prev2_2mrefs_stack_spill + + jmp .loop_start +.loop_tail: + paddusb m3, m2, m1 + pminub m0, m9, m3 + psubusb m2, m2, m1 + pmaxub m0, m0, m2 + movu [dstq + x], m0 + add x, 16 + cmp x, widthq + jge .return +.loop_start: +; Start by computing the spatial score +; We attempt to compute the spatial score using saturated adds. In real +; world content the entire spatial score 16-byte xmm vector will be able +; to accurately represent the spatial score in 8-bits > 97% of the +; time. Because of this we try computing the spatial score with 8-bit +; first since it is 2x as fast, and check if we saturated the computation later. +; The original spatial score can potentially be in the range of -1 to 765 +; Instead for this approach, we map the lower end of that to 8-bits using +; the range -128 to 127. +; If we detect that this assumption may have failed we instead re-compute +; the spatial score using the full 16-bit range needed to represent -1 to 765. +; +; Before we compute the spatial score, we pre-compute most of the absolute +; difference values used in the C code's CHECK() macros. 
These absolute +; differences are then stored to the stack so that they can be re-used for the +; slower 16-bit spatial score approach in case that is needed. + movu m6, [cur_plus_mrefs + x - 3] + movu m11, [cur_plus_mrefs + x - 2] + movu m2, [cur_plus_mrefs + x - 1] + movu m3, [cur_plus_mrefs + x] + movu m13, [cur_plus_prefs + x] + movu m0, [cur_plus_mrefs + x + 1] + movu m1, [cur_plus_prefs + x + 1] + absdif_pb m14, m0, m1, m5 ; abs(cur[mrefs+1]-cur[prefs+1]) + avg_truncate_pb m10, m13, m3, pb_1_reg, m5 ; spatial_pred = (c+e) >> 1 + mova spatial_predicate_stack, m10 + movu m7, [cur_plus_prefs + x + 2] + absdif_pb m10, m11, m13, m5 ; abs(cur[mrefs-2]-cur[prefs]) + mova chkneg1_ad2_stack, m10 + absdif_pb m8, m2, m1, m5 ; abs(cur[mrefs-1]-cur[prefs+1]) + absdif_pb m9, m3, m7, m5 ; abs(cur[mrefs]-cur[prefs+2]) + absdif_pb m10, m6, m1, m5 ; abs(cur[mrefs-3]-cur[prefs+1]) + mova chkneg2_ad2_stack, m10 + absdif_pb m10, m11, m7, m5 ; abs(cur[mrefs-2]-cur[prefs+2]) + mova chkneg2_ad1_stack, m10 + movu m4, [cur_plus_prefs + x + 3] + absdif_pb m10, m2, m4, m5 ; abs(cur[mrefs-1]-cur[prefs+3]) + mova chkneg2_ad0_stack, m10 + movu m12, [cur_plus_mrefs + x + 2] + absdif_pb m10, m12, m13, m5 ; abs(cur[mrefs+2]-cur[prefs]) + mova cur_plus_prefs_x_stack, m13 + mova chkpos1_ad2_stack, m10 + movu m6, [cur_plus_prefs + x - 1] + absdif_pb m10, m0, m6, m5 ; abs(cur[mrefs+1]-cur[prefs-1]) + mova chkpos1_ad1_stack, m10 + movu m10, [cur_plus_prefs + x - 2] + mova cur_plus_mrefs_x_stack, m3 + absdif_pb m13, m10, m3, m5 ; abs(cur[mrefs]-cur[prefs-2]) + movu m4, [cur_plus_mrefs + x + 3] + absdif_pb m4, m4, m6, m5 ; abs(cur[mrefs+3]-cur[prefs-1]) + mova chkpos2_ad2_stack, m4 + absdif_pb m3, m12, m10, m5 ; abs(cur[mrefs+2]-cur[prefs-2]) + mova chkpos2_ad1_stack, m3 + movu m4, [cur_plus_prefs + x - 3] + absdif_pb m3, m0, m4, m5 ; abs(cur[mrefs+1]-cur[prefs-3]) + mova chkpos2_ad0_stack, m3 + mova chkneg1_ad1_stack, m8 + paddusb m5, m8, chkneg1_ad2_stack + mova chkneg1_ad0_stack, m9 + paddusb m4, 
m9, pb_1_reg + paddusb m5, m5, m4 + mova m3, old_absdif_ahead_stack + palignr m4, m14, m3, 15 + palignr m3, m14, m3, 14 + mova old_absdif_ahead_stack, m14 + mova absdif_here, m4 + paddusb m4, m14, m4 + mova absdif_behind, m3 + paddusb m4, m4, m3 + pxor m4, m4, [pb_128] + pxor m5, m5, [pb_128] + pcmpgtb m8, m4, m5 + pcmpeqb m14, m4, [pb_127] + por m8, m8, m14 + pminsb m4, m4, m5 + avg_truncate_pb m1, m1, m2, pb_1_reg, m5 + mova spatial_pred_check_minus_1, m1 + mova m1, chkneg2_ad1_stack + paddusb m2, m1, chkneg2_ad2_stack + paddusb m5, pb_1_reg, chkneg2_ad0_stack + paddusb m2, m2, m5 + avg_truncate_pb m11, m11, m7, pb_1_reg, m5 + mova m3, chkpos1_ad1_stack + paddusb m3, m3, chkpos1_ad2_stack + paddusb m5, m13, pb_1_reg + paddusb m7, m3, m5 + avg_truncate_pb m3, m6, m0, pb_1_reg, m5 + pxor m0, m2, [pb_128] + pcmpgtb m2, m4, m0 + pand m5, m8, m2 + pblendvb m6, m4, m0, m5 + pcmpeqb m0, m4, [pb_127] + pand m2, m5, m0 + pxor m7, m7, [pb_128] + pminsb m0, m6, m7 + pcmpeqb m4, m0, [pb_127] + por m2, m4, m2 + ptest m2, m2 + jne .spatial_check_16_bit +; At this point we know if we can continue on the fast path with saturating +; spatial score computation while maintaining bit-accuracy, or if we need to +; bail out and perform the spatial score computation using full 16-bit words +; to store the score value. check_2_saturate is only executed here if we know +; we don't need to go down the slow path. 
+.check_2_saturate: + mova m2, spatial_predicate_stack + pblendvb m1, m2, spatial_pred_check_minus_1, m8 + pblendvb m1, m11, m5 + pcmpgtb m2, m6, m7 + pcmpeqb m5, m6, [pb_127] + por m2, m5, m2 + pblendvb m1, m3, m2 + mova m3, chkpos2_ad1_stack + paddusb m3, m3, chkpos2_ad2_stack + paddusb m4, pb_1_reg, chkpos2_ad0_stack + paddusb m3, m3, m4 + pxor m3, m3, [pb_128] + pcmpgtb m0, m0, m3 + pand m0, m0, m2 + avg_truncate_pb m2, m12, m10, pb_1_reg, m5 + pblendvb m9, m1, m2, m0 +.temporal_check: + mova m0, cur_plus_mrefs_x_stack + mova m8, cur_plus_prefs_x_stack + movu m1, [prev2 + x] + movu m6, [next2 + x] + avg_truncate_pb m2, m6, m1, pb_1_reg, m5 + absdif_pb m1, m1, m6, m5 + movu m6, [prev_plus_mrefs + x] + movu m4, [prev_plus_prefs + x] + absdif_pb m6, m6, m0, m5 + absdif_pb m4, m4, m8, m5 + avg_truncate_pb m6, m6, m4, pb_1_reg, m5 + movu m4, [next_plus_mrefs + x] + movu m3, [next_plus_prefs + x] + absdif_pb m4, m4, m0, m5 + absdif_pb m3, m3, m8, m5 + avg_truncate_pb m4, m4, m3, pb_1_reg, m5 + pmaxub m6, m6, m4 + psrlw m1, m1, 1 + pand m1, m1, [pb_127] + pmaxub m1, m1, m6 + cmp DWORD modem, 1 + jg .loop_tail +.handle_mode_1: +; Handle the "if (!(mode&2))" section. +; This section has undergone some complex +; transformations with respect to the C implementation in order to +; ensure that all inputs, outputs and intermediate values can be +; stored in 8-bit unsigned values. The code is transformed with +; various identities to prevent signed intermediate values which +; would require an extra 9th bit for the sign, which we don't have. +; The main identities applied are: +; 1. -MAX(a-b, c-d) = MIN(b-a, d-c) +; 2. 
MIN(a-c, b-c) = MIN(a, b)-c +; The following from the C code: +; +; int max = FFMAX3(d-e, d-c, FFMIN(b-c, f-e)); +; diff = FFMAX3(diff, min, -max); +; +; becomes: +; int negative_max = FFMIN( FFMIN(e, c)-d, FFMAX(c-b, e-f)); +; diff = FFMAX3(diff, min, negative_max); +; +; Lastly we know that diff must be non-negative in the end, so +; intermediate negative values don't matter. To keep computations +; within 8 bits, we use saturating subtraction which replaces all +; negative intermediate results with 0, but doesn't affect the +; final value assigned to diff. + movu m6, [prev2_2mrefs + x] + movu m4, [next2_2mrefs + x] + avg_truncate_pb m6, m6, m4, pb_1_reg, m5 + movu m4, [prev2_2prefs + x] + movu m3, [next2_2prefs + x] + avg_truncate_pb m4, m4, m3, pb_1_reg, m5 + psubusb m3, m8, m2 + psubusb m5, m0, m2 + pminub m3, m3, m5 + psubusb m5, m0, m6 + psubusb m7, m8, m4 + pmaxub m5, m5, m7 + pminub m3, m3, m5 + psubusb m5, m2, m8 + psubusb m7, m2, m0 + pminub m5, m5, m7 + psubusb m6, m6, m0 + psubusb m4, m4, m8 + pmaxub m6, m6, m4 + pminub m6, m5, m6 + pmaxub m6, m6, m3 + pmaxub m1, m1, m6 + jmp .loop_tail +.spatial_check_16_bit: +; Assuming all else fails, we compute the spatial score using packed words to +; store the temporary values. Every input register containing packed bytes is +; unpacked into 2 separate registers with packed words, which are then +; processed identically. 
This path should generally be run < 3% of the time, and +; is kept mainly to ensure output is bit-accurate compared to the C +; implementation. + mova cur_plus_mrefs_x_2_stack, m12 + mova cur_plus_prefs_x_minus_2, m10 + mova m5, old_absdif_ahead_stack + pmovzxbw m0, m5 + mova m4, absdif_here + pmovzxbw m2, m4 + paddw m0, m0, m2 + pxor m12, m12, m12 + punpckhbw m2, m4, m12 + punpckhbw m5, m5, m12 + paddw m2, m5, m2 + mova m7, absdif_behind + pmovzxbw m5, m7 + pcmpeqd m4, m4, m4 + paddw m5, m5, m4 + paddw m9, m0, m5 + punpckhbw m5, m7, m12 + paddw m5, m5, m4 + paddw m7, m2, m5 + mova m0, chkneg1_ad2_stack + pmovzxbw m2, m0 + mova m4, chkneg1_ad1_stack + pmovzxbw m5, m4 + paddw m2, m2, m5 + punpckhbw m5, m0, m12 + punpckhbw m4, m4, m12 + paddw m4, m5, m4 + mova m0, chkneg1_ad0_stack + pmovzxbw m5, m0 + paddw m5, m2, m5 + pminsw m6, m9, m5 + punpckhbw m2, m0, m12 + paddw m2, m4, m2 + pminsw m14, m2, m7 + pcmpgtw m4, m9, m5 + pcmpgtw m10, m7, m2 + packsswb m0, m4, m10 + mova m2, spatial_predicate_stack + pblendvb m0, m2, spatial_pred_check_minus_1, m0 + mova spatial_predicate_stack, m0 + mova m0, chkneg2_ad2_stack + pmovzxbw m2, m0 + mova m1, chkneg2_ad0_stack + pmovzxbw m7, m1 + paddw m2, m7, m2 + mova m9, chkneg2_ad1_stack + pmovzxbw m7, m9 + paddw m2, m2, m7 + punpckhbw m7, m0, m12 + punpckhbw m5, m1, m12 + paddw m5, m5, m7 + punpckhbw m7, m9, m12 + paddw m5, m5, m7 + mova m0, chkpos1_ad2_stack + pmovzxbw m7, m0 + mova m1, chkpos1_ad1_stack + pmovzxbw m8, m1 + paddw m8, m8, m7 + punpckhbw m7, m0, m12 + punpckhbw m0, m1, m12 + paddw m7, m0, m7 + pmovzxbw m1, m13 + paddw m9, m8, m1 + punpckhbw m0, m13, m12 + paddw m8, m7, m0 + pcmpgtw m0, m6, m2 + pand m0, m0, m4 + pcmpgtw m4, m14, m5 + pand m4, m10, m4 + pblendvb m1, m6, m2, m0 + pblendvb m14, m5, m4 + packsswb m0, m0, m4 + mova m5, spatial_predicate_stack + pblendvb m0, m5, m11, m0 + pcmpgtw m5, m1, m9 + pcmpgtw m4, m14, m8 + packsswb m6, m5, m4 + pblendvb m13, m0, m3, m6 + mova m0, chkpos2_ad2_stack + pmovzxbw m3, m0 + 
mova m7, chkpos2_ad1_stack + pmovzxbw m6, m7 + paddw m3, m6, m3 + punpckhbw m6, m0, m12 + punpckhbw m0, m7, m12 + paddw m0, m0, m6 + mova m7, chkpos2_ad0_stack + pmovzxbw m6, m7 + paddw m3, m3, m6 + punpckhbw m6, m7, m12 + paddw m0, m0, m6 + pminsw m1, m9, m1 + pcmpgtw m1, m1, m3 + pminsw m14, m8, m14 + pcmpgtw m14, m14, m0 + pand m1, m1, m5 + pand m14, m14, m4 + packsswb m14, m1, m14 + mova m0, cur_plus_mrefs_x_2_stack + mova m5, cur_plus_prefs_x_minus_2 + pxor m1, m5, m0 + pavgb m0, m0, m5 + pand m1, m1, [pb_1] + psubb m1, m0, m1 + pblendvb m9, m13, m1, m14 + jmp .temporal_check +.return: + RET
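
As a cross-check of the reasoning in the .handle_mode_1 comment, the mode 1 rewrite can be compared against the signed C formulation in plain scalar C. The sketch below is not part of the patch: all function names are hypothetical, the letters b, c, d, e, f follow the C code's naming, and the grid search in mode1_count_mismatches is just one assumed way to spot-check the claim that saturating subtraction leaves the final diff unchanged when the incoming diff is non-negative.

```c
#include <assert.h>
#include <stdint.h>

static int min_i(int a, int b) { return a < b ? a : b; }
static int max_i(int a, int b) { return a > b ? a : b; }
static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Scalar stand-in for psubusb: unsigned subtraction that clamps at 0. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Signed reference, following the C code quoted in the asm comment. */
static int mode1_diff_ref(int b, int c, int d, int e, int f, int diff)
{
    int max = max_i(max_i(d - e, d - c), min_i(b - c, f - e));
    int min = min_i(min_i(d - e, d - c), max_i(b - c, f - e));
    return max_i(max_i(diff, min), -max);
}

/* Byte-only variant: every intermediate value stays in 0..255.
 * negative_max uses identities 1 and 2; negative results clamp to 0,
 * which is harmless because diff is already >= 0. */
static uint8_t mode1_diff_u8(uint8_t b, uint8_t c, uint8_t d,
                             uint8_t e, uint8_t f, uint8_t diff)
{
    uint8_t negative_max = min_u8(min_u8(sub_sat_u8(e, d), sub_sat_u8(c, d)),
                                  max_u8(sub_sat_u8(c, b), sub_sat_u8(e, f)));
    uint8_t min = min_u8(min_u8(sub_sat_u8(d, e), sub_sat_u8(d, c)),
                         max_u8(sub_sat_u8(b, c), sub_sat_u8(f, e)));
    return max_u8(max_u8(diff, min), negative_max);
}

/* Hypothetical spot check: count disagreements between the two variants
 * over a coarse grid of byte values (0, step, 2*step, ... below 256). */
static long mode1_count_mismatches(int step)
{
    long mismatches = 0;
    assert(step > 0);
    for (int b = 0; b < 256; b += step)
    for (int c = 0; c < 256; c += step)
    for (int d = 0; d < 256; d += step)
    for (int e = 0; e < 256; e += step)
    for (int f = 0; f < 256; f += step)
    for (int diff = 0; diff < 256; diff += step) {
        int ref = mode1_diff_ref(b, c, d, e, f, diff);
        int got = mode1_diff_u8((uint8_t)b, (uint8_t)c, (uint8_t)d,
                                (uint8_t)e, (uint8_t)f, (uint8_t)diff);
        if (ref != got)
            mismatches++;
    }
    return mismatches;
}
```

The argument behind it is that clamping at zero is monotone, so it commutes with MIN and MAX; the clamped min and negative_max therefore equal max(0, min) and max(0, -max), and taking the maximum with a non-negative diff gives the same result either way.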