From patchwork Wed Jul 20 04:41:14 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Phlipot
X-Patchwork-Id: 36852
From: Chris Phlipot
To: ffmpeg-devel@ffmpeg.org
Cc: Chris Phlipot
Date: Tue, 19 Jul 2022 21:41:14 -0700
Message-Id: <20220720044117.1282961-2-cphlipot0@gmail.com>
In-Reply-To: <20220720044117.1282961-1-cphlipot0@gmail.com>
References: <20220720044117.1282961-1-cphlipot0@gmail.com>
X-Mailer: git-send-email 2.25.1
Subject: [FFmpeg-devel] [PATCH 2/5] avfilter/vf_yadif: Allow alignment to be
 configurable

Allow the alignment to be determined by whichever yadif_filter_line
implementation is in use. Currently this is either 1 or 8, depending on
whether the C code or the x86 SSE code is used, but the change also
leaves room for future implementations that need a larger alignment.

Raising MAX_ALIGN to 32 for an AVX2 implementation could hurt the
performance of the SSE implementation, so let yadif use the smallest
alignment each implementation needs. This preserves existing
performance if implementations with wider vectors are added.

Signed-off-by: Chris Phlipot
---
 libavfilter/vf_yadif.c          | 16 +++++++++-------
 libavfilter/x86/vf_yadif_init.c |  1 +
 libavfilter/yadif.h             |  4 +++-
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c
index 055327d7a4..42f6246330 100644
--- a/libavfilter/vf_yadif.c
+++ b/libavfilter/vf_yadif.c
@@ -108,9 +108,9 @@ static void filter_line_c(void *dst1,
     FILTER(0, w, 1)
 }
 
-#define MAX_ALIGN 8
 static void filter_edges(void *dst1, void *prev1, void *cur1, void *next1,
-                         int w, int prefs, int mrefs, int parity, int mode)
+                         int w, int prefs, int mrefs, int parity, int mode,
+                         int alignment)
 {
     uint8_t *dst  = dst1;
     uint8_t *prev = prev1;
@@ -120,7 +120,7 @@ static void filter_edges(void *dst1, void *prev1, void *cur1, void *next1,
     uint8_t *prev2 = parity ? prev : cur ;
     uint8_t *next2 = parity ? cur  : next;
 
-    const int edge = FFMAX(MAX_ALIGN - 1, 3);
+    const int edge = FFMAX(alignment - 1, 3);
     int offset = FFMAX(w - edge, 3);
 
     /* Only edge pixels need to be processed here. A constant value of false
@@ -159,7 +159,8 @@ static void filter_line_c_16bit(void *dst1,
 }
 
 static void filter_edges_16bit(void *dst1, void *prev1, void *cur1, void *next1,
-                               int w, int prefs, int mrefs, int parity, int mode)
+                               int w, int prefs, int mrefs, int parity, int mode,
+                               int alignment)
 {
     uint16_t *dst  = dst1;
     uint16_t *prev = prev1;
@@ -169,7 +170,7 @@ static void filter_edges_16bit(void *dst1, void *prev1, void *cur1, void *next1,
     uint16_t *prev2 = parity ? prev : cur ;
     uint16_t *next2 = parity ? cur  : next;
 
-    const int edge = FFMAX(MAX_ALIGN / 2 - 1, 3);
+    const int edge = FFMAX(alignment / 2 - 1, 3);
     int offset = FFMAX(w - edge, 3);
 
     mrefs /= 2;
@@ -199,7 +200,7 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
     int slice_start = (td->h *  jobnr   ) / nb_jobs;
     int slice_end   = (td->h * (jobnr+1)) / nb_jobs;
     int y;
-    int edge = 3 + MAX_ALIGN / df - 1;
+    int edge = 3 + s->req_align / df - 1;
 
     /* filtering reads 3 pixels to the left/right; to avoid invalid reads,
      * we need to call the c variant which avoids this for border pixels
@@ -219,7 +220,7 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
             s->filter_edges(dst, prev, cur, next, td->w,
                             y + 1 < td->h ? refs : -refs,
                             y ? -refs : refs,
-                            td->parity ^ td->tff, mode);
+                            td->parity ^ td->tff, mode, s->req_align);
         } else {
             memcpy(&td->frame->data[td->plane][y * td->frame->linesize[td->plane]],
                    &s->cur->data[td->plane][y * refs], td->w * df);
@@ -303,6 +304,7 @@ static int config_output(AVFilterLink *outlink)
 
     s->csp = av_pix_fmt_desc_get(outlink->format);
     s->filter = filter;
+    s->req_align = 1;
     if (s->csp->comp[0].depth > 8) {
         s->filter_line  = filter_line_c_16bit;
         s->filter_edges = filter_edges_16bit;
diff --git a/libavfilter/x86/vf_yadif_init.c b/libavfilter/x86/vf_yadif_init.c
index 257c3f9199..9dd73f8e44 100644
--- a/libavfilter/x86/vf_yadif_init.c
+++ b/libavfilter/x86/vf_yadif_init.c
@@ -53,6 +53,7 @@ av_cold void ff_yadif_init_x86(YADIFContext *yadif)
     int bit_depth = (!yadif->csp) ? 8
                                   : yadif->csp->comp[0].depth;
 
+    yadif->req_align = 8;
     if (bit_depth >= 15) {
         if (EXTERNAL_SSE2(cpu_flags))
             yadif->filter_line = ff_yadif_filter_line_16bit_sse2;
diff --git a/libavfilter/yadif.h b/libavfilter/yadif.h
index c928911b35..b81f2fc1d9 100644
--- a/libavfilter/yadif.h
+++ b/libavfilter/yadif.h
@@ -66,11 +66,13 @@ typedef struct YADIFContext {
     /**
      * Required alignment for filter_line
      */
+    int req_align;
     void (*filter_line)(void *dst,
                         void *prev, void *cur, void *next,
                         int w, int prefs, int mrefs, int parity, int mode);
     void (*filter_edges)(void *dst, void *prev, void *cur, void *next,
-                         int w, int prefs, int mrefs, int parity, int mode);
+                         int w, int prefs, int mrefs, int parity, int mode,
+                         int alignment);
 
     const AVPixFmtDescriptor *csp;
     int eof;
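
Note for reviewers: the effect of a per-implementation alignment on the
edge handling can be sketched in standalone C. This is a minimal sketch,
not code from the patch. SKETCH_MAX stands in for FFMAX, and the sketch_*
helper names are hypothetical; they only mirror the three edge expressions
that the diff changes in vf_yadif.c.

```c
#include <assert.h>

/* Local stand-in for FFmpeg's FFMAX so the sketch compiles on its own. */
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper mirroring "edge" in filter_edges (bytes_per_sample 1)
 * and filter_edges_16bit (bytes_per_sample 2). alignment is in bytes. */
static int sketch_edge_pixels(int alignment, int bytes_per_sample)
{
    assert(alignment >= 1);
    assert(bytes_per_sample == 1 || bytes_per_sample == 2);
    return SKETCH_MAX(alignment / bytes_per_sample - 1, 3);
}

/* Hypothetical helper mirroring "offset": the first x of the right-edge run
 * handled by the edge filter, never earlier than pixel 3. */
static int sketch_right_edge_start(int w, int alignment, int bytes_per_sample)
{
    return SKETCH_MAX(w - sketch_edge_pixels(alignment, bytes_per_sample), 3);
}

/* Hypothetical helper mirroring "edge" in filter_slice, where df is the
 * number of bytes per sample. */
static int sketch_slice_edge(int alignment, int df)
{
    assert(alignment >= 1);
    assert(df >= 1);
    return 3 + alignment / df - 1;
}
```

Under these expressions, an alignment of 8 gives an edge of 7 pixels for
8-bit content, while a fixed value of 32 would give 31 even when only the
SSE code is in use. That extra work in the C edge path is the cost the
commit message wants to avoid.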