From patchwork Thu Feb 17 10:03:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Kelly
X-Patchwork-Id: 34356
Date: Thu, 17 Feb 2022 11:03:52 +0100
Message-Id: <20220217100352.1112032-1-alankelly@google.com>
X-Mailer: git-send-email 2.35.1.265.g69c8d7142f-goog
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH v2 2/5] libswscale: Re-factor ff_shuffle_filter_coefficients.
Cc: Alan Kelly

Make the code more readable and follow the style guide.
---
 libswscale/utils.c | 66 +++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/libswscale/utils.c b/libswscale/utils.c
index 344c87dfdf..7c8e1bbdde 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -278,42 +278,48 @@ static const FormatEntry format_entries[] = {
     [AV_PIX_FMT_P416LE] = { 1, 1 },
 };
 
-int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos, int filterSize, int16_t *filter, int dstW){
+int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
+                                   int filterSize, int16_t *filter,
+                                   int dstW)
+{
 #if ARCH_X86_64
-    int i, j, k, l;
+    int i, j, k;
     int cpu_flags = av_get_cpu_flags();
+    // avx2 hscale filter processes 16 pixel blocks.
+    if (!filter || dstW % 16 != 0)
+        return 0;
     if (EXTERNAL_AVX2_FAST(cpu_flags) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER)) {
-        if ((c->srcBpc == 8) && (c->dstBpc <= 14)){
-            if (dstW % 16 == 0){
-                if (filter != NULL){
-                    for (i = 0; i < dstW; i += 8){
-                        FFSWAP(int, filterPos[i + 2], filterPos[i+4]);
-                        FFSWAP(int, filterPos[i + 3], filterPos[i+5]);
-                    }
-                    if (filterSize > 4){
-                        int16_t *tmp2 = av_malloc(dstW * filterSize * 2);
-                        if (!tmp2)
-                            return AVERROR(ENOMEM);
-                        memcpy(tmp2, filter, dstW * filterSize * 2);
-                        for (i = 0; i < dstW; i += 16){//pixel
-                            for (k = 0; k < filterSize / 4; ++k){//fcoeff
-                                for (j = 0; j < 16; ++j){//inner pixel
-                                    for (l = 0; l < 4; ++l){//coeff
-                                        int from = i * filterSize + j * filterSize + k * 4 + l;
-                                        int to = (i) * filterSize + j * 4 + l + k * 64;
-                                        filter[to] = tmp2[from];
-                                    }
-                                }
-                            }
-                        }
-                        av_free(tmp2);
-                    }
-                }
-            }
+        if ((c->srcBpc == 8) && (c->dstBpc <= 14)) {
+            int16_t *filterCopy = NULL;
+            if (filterSize > 4) {
+                if (!FF_ALLOC_TYPED_ARRAY(filterCopy, dstW * filterSize))
+                    return AVERROR(ENOMEM);
+                memcpy(filterCopy, filter, dstW * filterSize * sizeof(int16_t));
+            }
+            // Do not swap filterPos for pixels which won't be processed by
+            // the main loop.
+            for (i = 0; i + 8 <= dstW; i += 8) {
+                FFSWAP(int, filterPos[i + 2], filterPos[i + 4]);
+                FFSWAP(int, filterPos[i + 3], filterPos[i + 5]);
+            }
+            if (filterSize > 4) {
+                // 16 pixels are processed at a time.
+                for (i = 0; i + 16 <= dstW; i += 16) {
+                    // 4 filter coeffs are processed at a time.
+                    for (k = 0; k + 4 <= filterSize; k += 4) {
+                        for (j = 0; j < 16; ++j) {
+                            int from = (i + j) * filterSize + k;
+                            int to = i * filterSize + j * 4 + k * 16;
+                            memcpy(&filter[to], &filterCopy[from], 4 * sizeof(int16_t));
+                        }
+                    }
+                }
+            }
+            av_free(filterCopy);
         }
     }
-    return 0;
 #endif
+    return 0;
 }
 
 int sws_isSupportedInput(enum AVPixelFormat pix_fmt)
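
For review purposes, the index mapping used by the new loops can be tried
outside of libswscale. The following is only a minimal sketch under stated
assumptions: shuffle_coeffs() and swap_filter_pos() are hypothetical helper
names that do not exist in FFmpeg, plain malloc()/free() stand in for
FF_ALLOC_TYPED_ARRAY()/av_free(), a manual swap stands in for FFSWAP(), and
the CPU-flag and bit-depth checks are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an FFmpeg function): swaps positions 2<->4 and
 * 3<->5 inside every complete group of 8 pixels, as the patched loop does. */
static void swap_filter_pos(int *filterPos, int dstW)
{
    for (int i = 0; i + 8 <= dstW; i += 8) {
        int t;
        t = filterPos[i + 2]; filterPos[i + 2] = filterPos[i + 4]; filterPos[i + 4] = t;
        t = filterPos[i + 3]; filterPos[i + 3] = filterPos[i + 5]; filterPos[i + 5] = t;
    }
}

/* Hypothetical helper (not an FFmpeg function): reorders coefficients with
 * the same from/to indices as the patch. Returns 0 on success or when there
 * is nothing to do, -1 if the temporary copy cannot be allocated. */
static int shuffle_coeffs(int16_t *filter, int filterSize, int dstW)
{
    int16_t *copy;
    size_t count;

    /* Same early-outs as the patch: blocks of 16 pixels, more than 4 taps. */
    if (!filter || dstW % 16 != 0 || filterSize <= 4)
        return 0;

    count = (size_t)dstW * (size_t)filterSize;
    copy  = malloc(count * sizeof(*copy));
    if (!copy)
        return -1;
    memcpy(copy, filter, count * sizeof(*copy));

    for (int i = 0; i + 16 <= dstW; i += 16) {            /* block of 16 pixels */
        for (int k = 0; k + 4 <= filterSize; k += 4) {    /* group of 4 coeffs  */
            for (int j = 0; j < 16; ++j) {                /* pixel in the block */
                int from = (i + j) * filterSize + k;
                int to   = i * filterSize + j * 4 + k * 16;
                memcpy(&filter[to], &copy[from], 4 * sizeof(*copy));
            }
        }
    }
    free(copy);
    return 0;
}
```

With filterSize = 8 and dstW = 16, coefficient c of pixel p moves from index
p * 8 + c to index (c / 4) * 64 + p * 4 + c % 4, so each group of four
coefficients is stored for all 16 pixels of a block before the next group
starts. This is the same layout the removed four-level loop produced with
"j * 4 + l + k * 64", where k counted groups rather than coefficients.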