From patchwork Thu Jul 21 02:25:11 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Phlipot
X-Patchwork-Id: 36877
From: Chris Phlipot
To: ffmpeg-devel@ffmpeg.org
Cc: Chris Phlipot
Date: Wed, 20 Jul 2022 19:25:11 -0700
Message-Id: <20220721022514.1466331-2-cphlipot0@gmail.com>
In-Reply-To: <20220721022514.1466331-1-cphlipot0@gmail.com>
References: <20220721022514.1466331-1-cphlipot0@gmail.com>
X-Mailer: git-send-email 2.25.1
Subject: [FFmpeg-devel] [PATCH v2 2/5] avfilter/vf_yadif: Allow alignment to be configurable

Allow the alignment to be determined by which yadif_filter_line
implementation is used. Currently this is either 1 or 8, depending on
whether the C code or the x86 SSE code is used, but this leaves room for
future implementations that require a larger alignment.

Raising MAX_ALIGN to 32 for an AVX2 implementation could hurt the
performance of the SSE implementation, so yadif instead uses the smallest
alignment that the selected implementation needs. This preserves existing
performance if implementations with wider vectors are added.

Signed-off-by: Chris Phlipot
---
 libavfilter/vf_yadif.c          | 16 +++++++++-------
 libavfilter/x86/vf_yadif_init.c |  1 +
 libavfilter/yadif.h             |  4 +++-
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c
index 055327d7a4..42f6246330 100644
--- a/libavfilter/vf_yadif.c
+++ b/libavfilter/vf_yadif.c
@@ -108,9 +108,9 @@ static void filter_line_c(void *dst1,
     FILTER(0, w, 1)
 }
 
-#define MAX_ALIGN 8
 static void filter_edges(void *dst1, void *prev1, void *cur1, void *next1,
-                         int w, int prefs, int mrefs, int parity, int mode)
+                         int w, int prefs, int mrefs, int parity, int mode,
+                         int alignment)
 {
     uint8_t *dst  = dst1;
     uint8_t *prev = prev1;
@@ -120,7 +120,7 @@ static void filter_edges(void *dst1, void *prev1, void *cur1, void *next1,
     uint8_t *prev2 = parity ? prev : cur ;
     uint8_t *next2 = parity ? cur  : next;
 
-    const int edge = FFMAX(MAX_ALIGN - 1, 3);
+    const int edge = FFMAX(alignment - 1, 3);
     int offset = FFMAX(w - edge, 3);
 
     /* Only edge pixels need to be processed here. A constant value of false
@@ -159,7 +159,8 @@ static void filter_line_c_16bit(void *dst1,
 }
 
 static void filter_edges_16bit(void *dst1, void *prev1, void *cur1, void *next1,
-                               int w, int prefs, int mrefs, int parity, int mode)
+                               int w, int prefs, int mrefs, int parity, int mode,
+                               int alignment)
 {
     uint16_t *dst  = dst1;
     uint16_t *prev = prev1;
@@ -169,7 +170,7 @@ static void filter_edges_16bit(void *dst1, void *prev1, void *cur1, void *next1,
     uint16_t *prev2 = parity ? prev : cur ;
     uint16_t *next2 = parity ? cur  : next;
 
-    const int edge = FFMAX(MAX_ALIGN / 2 - 1, 3);
+    const int edge = FFMAX(alignment / 2 - 1, 3);
     int offset = FFMAX(w - edge, 3);
 
     mrefs /= 2;
@@ -199,7 +200,7 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
     int slice_start = (td->h *  jobnr   ) / nb_jobs;
     int slice_end   = (td->h * (jobnr+1)) / nb_jobs;
     int y;
-    int edge = 3 + MAX_ALIGN / df - 1;
+    int edge = 3 + s->req_align / df - 1;
 
     /* filtering reads 3 pixels to the left/right; to avoid invalid reads,
      * we need to call the c variant which avoids this for border pixels
@@ -219,7 +220,7 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
             s->filter_edges(dst, prev, cur, next, td->w,
                             y + 1 < td->h ? refs : -refs,
                             y ? -refs : refs,
-                            td->parity ^ td->tff, mode);
+                            td->parity ^ td->tff, mode, s->req_align);
         } else {
             memcpy(&td->frame->data[td->plane][y * td->frame->linesize[td->plane]],
                    &s->cur->data[td->plane][y * refs], td->w * df);
@@ -303,6 +304,7 @@ static int config_output(AVFilterLink *outlink)
 
     s->csp = av_pix_fmt_desc_get(outlink->format);
     s->filter = filter;
+    s->req_align = 1;
     if (s->csp->comp[0].depth > 8) {
         s->filter_line  = filter_line_c_16bit;
         s->filter_edges = filter_edges_16bit;
diff --git a/libavfilter/x86/vf_yadif_init.c b/libavfilter/x86/vf_yadif_init.c
index 257c3f9199..9dd73f8e44 100644
--- a/libavfilter/x86/vf_yadif_init.c
+++ b/libavfilter/x86/vf_yadif_init.c
@@ -53,6 +53,7 @@ av_cold void ff_yadif_init_x86(YADIFContext *yadif)
     int bit_depth = (!yadif->csp) ? 8
                                   : yadif->csp->comp[0].depth;
 
+    yadif->req_align = 8;
     if (bit_depth >= 15) {
         if (EXTERNAL_SSE2(cpu_flags))
             yadif->filter_line = ff_yadif_filter_line_16bit_sse2;
diff --git a/libavfilter/yadif.h b/libavfilter/yadif.h
index c928911b35..b81f2fc1d9 100644
--- a/libavfilter/yadif.h
+++ b/libavfilter/yadif.h
@@ -66,11 +66,13 @@ typedef struct YADIFContext {
     /**
      * Required alignment for filter_line
      */
+    int req_align;
     void (*filter_line)(void *dst,
                         void *prev, void *cur, void *next,
                         int w, int prefs, int mrefs, int parity, int mode);
     void (*filter_edges)(void *dst, void *prev, void *cur, void *next,
-                         int w, int prefs, int mrefs, int parity, int mode);
+                         int w, int prefs, int mrefs, int parity, int mode,
+                         int alignment);
 
     const AVPixFmtDescriptor *csp;
     int eof;
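
For reference, the edge arithmetic that this patch parameterizes can be checked in isolation. The following is only a sketch and not part of the patch: SKETCH_MAX is a hypothetical stand-in for libavutil's FFMAX, the helper names edge_8bit, edge_16bit and slice_edge are made up for illustration, and the alignment is assumed to be in bytes, which is what the division by 2 in the 16-bit path suggests.

```c
#include <assert.h>

/* Hypothetical stand-in for libavutil's FFMAX so the sketch builds alone. */
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Right-edge width, in pixels, used by filter_edges() for 8-bit content.
 * Mirrors FFMAX(alignment - 1, 3) from the patch. */
static int edge_8bit(int alignment)
{
    return SKETCH_MAX(alignment - 1, 3);
}

/* Same for 16-bit content, where one pixel occupies two bytes.
 * Mirrors FFMAX(alignment / 2 - 1, 3) from the patch. */
static int edge_16bit(int alignment)
{
    return SKETCH_MAX(alignment / 2 - 1, 3);
}

/* Width that filter_slice() subtracts before calling filter_line.
 * Mirrors 3 + req_align / df - 1, where df is the bytes per sample.
 * The division is integer division, as in the patch. */
static int slice_edge(int req_align, int df)
{
    assert(df == 1 || df == 2);
    return 3 + req_align / df - 1;
}
```

Under these assumptions, edge_8bit(8) is 7, the same value the old MAX_ALIGN of 8 produced, edge_8bit(1) is 3, and edge_8bit(32) would be 31. A hypothetical 32-byte implementation would therefore widen the scalar tail only when it is actually selected, which matches the motivation given in the commit message.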