From patchwork Wed Jul 20 04:41:16 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Phlipot
X-Patchwork-Id: 36854
From: Chris Phlipot
To: ffmpeg-devel@ffmpeg.org
Cc: Chris Phlipot
Date: Tue, 19 Jul 2022 21:41:16 -0700
Message-Id: <20220720044117.1282961-4-cphlipot0@gmail.com>
In-Reply-To: <20220720044117.1282961-1-cphlipot0@gmail.com>
References: <20220720044117.1282961-1-cphlipot0@gmail.com>
X-Mailer: git-send-email 2.25.1
Subject: [FFmpeg-devel] [PATCH 4/5] avfilter/vf_yadif: Process more pixels using filter_line

filter_line is generally vectorized, whereas filter_edges is implemented
in C. Currently we rely on filter_edges to process non-edge pixels in
cases where the width doesn't match the alignment. This causes us to
process non-edge pixels with the slow C implementation instead of the
faster SSE implementation.

It is generally faster to process 8 pixels with the slowest SSE2
vectorized implementation than it is to process 2 pixels with the C
implementation. Therefore, if filter_edges needs to process 2 or more
non-edge pixels, it is faster to process these non-edge pixels with
filter_line instead, even if it processes more pixels than necessary.

To address this, we use filter_line so long as we know that at least 2
pixels will be used in the final output, even if the rest of the computed
pixels are invalid. Any incorrect output pixels generated by filter_line
will be overwritten by the following call to filter_edges.

In addition, we avoid running filter_line if it would read or write
pixels outside the current slice.

Signed-off-by: Chris Phlipot
---
 libavfilter/vf_yadif.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c
index 54109566be..394c04a985 100644
--- a/libavfilter/vf_yadif.c
+++ b/libavfilter/vf_yadif.c
@@ -201,6 +201,8 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
     int slice_end   = (td->h * (jobnr+1)) / nb_jobs;
     int y;
     int edge = 3 + s->req_align / df - 1;
+    int filter_width_target = td->w - 3;
+    int filter_width_rounded_up = (filter_width_target & ~(s->req_align-1)) + s->req_align;
 
     /* filtering reads 3 pixels to the left/right; to avoid invalid reads,
      * we need to call the c variant which avoids this for border pixels
@@ -215,11 +217,28 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
             int     mrefs = y ? -refs : refs;
             int    parity = td->parity ^ td->tff;
             int     mode  = y == 1 || y + 2 == td->h ? 2 : s->mode;
+
+            /* Adjust width and alignment to process extra pixels in filter_line
+             * using potentially vectorized code so long as it doesn't cause
+             * reads or writes outside of the current slice. filter_edge will
+             * correct any incorrect pixels written by filter_line in this
+             * scenario.
+             */
+            int filter_width;
+            int edge_alignment;
+            if (filter_width_rounded_up - filter_width_target >= 2
+                && y*refs + filter_width_rounded_up < slice_end * refs + refs - 3) {
+                filter_width = filter_width_rounded_up;
+                edge_alignment = 1;
+            } else {
+                filter_width = td->w - edge;
+                edge_alignment = s->req_align;
+            }
             s->filter_line(dst + pix_3, prev + pix_3, cur + pix_3,
-                           next + pix_3, td->w - edge,
+                           next + pix_3, filter_width,
                            prefs, mrefs, parity, mode);
             s->filter_edges(dst, prev, cur, next, td->w,
-                            prefs, mrefs, parity, mode, s->req_align);
+                            prefs, mrefs, parity, mode, edge_alignment);
         } else {
             memcpy(&td->frame->data[td->plane][y * td->frame->linesize[td->plane]],
                    &s->cur->data[td->plane][y * refs], td->w * df);
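
For readers who want to check the width arithmetic by hand, here is a
minimal standalone sketch of the rounding used in the patch. The helper
names round_up_width() and overshoot() are hypothetical and do not exist
in vf_yadif.c; the sketch only mirrors the filter_width_target and
filter_width_rounded_up expressions above and assumes the alignment is a
power of two. It does not model the slice bounds check.

```c
#include <assert.h>

/* Hypothetical helper (not in vf_yadif.c): mirrors the patch's
 * filter_width_rounded_up expression for a plane of the given width.
 * Assumes align is a positive power of two. */
static int round_up_width(int width, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    int target = width - 3;                 /* filter_width_target */
    /* Clear the low bits, then add one full alignment unit. Note that
     * an already aligned target still grows by a full unit. */
    return (target & ~(align - 1)) + align;
}

/* Hypothetical helper: difference between the rounded-up width and the
 * target. The first half of the patch's condition holds when this
 * value is at least 2. */
static int overshoot(int width, int align)
{
    return round_up_width(width, align) - (width - 3);
}
```

With an alignment of 16, a width of 1920 gives a target of 1917 and a
rounded-up width of 1920 (difference 3), while a width of 1922 gives a
difference of only 1, so the original td->w - edge path would be taken.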