From patchwork Wed Feb 9 09:09:45 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Kelly
X-Patchwork-Id: 34205
Date: Wed, 9 Feb 2022 10:09:45 +0100
Message-Id: <20220209090945.3450752-1-alankelly@google.com>
X-Mailer: git-send-email 2.35.0.263.gb82422642f-goog
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH 1/5] libswscale: Re-factor ff_shuffle_filter_coefficients.
Cc: Alan Kelly

Make the code more readable and follow the style guide.
---
 libswscale/utils.c | 64 +++++++++++++++++++++++++++-------------------
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/libswscale/utils.c b/libswscale/utils.c
index c5ea8853d5..1d919e863a 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -278,39 +278,49 @@ static const FormatEntry format_entries[] = {
     [AV_PIX_FMT_P416LE]      = { 1, 1 },
 };
 
-void ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos, int filterSize, int16_t *filter, int dstW){
+void ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
+                                    int filterSize, int16_t *filter,
+                                    int dstW)
+{
 #if ARCH_X86_64
-    int i, j, k, l;
+    int i, j, k;
     int cpu_flags = av_get_cpu_flags();
+    // avx2 hscale filter processes 16 pixel blocks.
+    if (!filter || dstW % 16 != 0)
+        return;
     if (EXTERNAL_AVX2_FAST(cpu_flags) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER)) {
-        if ((c->srcBpc == 8) && (c->dstBpc <= 14)){
-            if (dstW % 16 == 0){
-                if (filter != NULL){
-                    for (i = 0; i < dstW; i += 8){
-                        FFSWAP(int, filterPos[i + 2], filterPos[i+4]);
-                        FFSWAP(int, filterPos[i + 3], filterPos[i+5]);
-                    }
-                    if (filterSize > 4){
-                        int16_t *tmp2 = av_malloc(dstW * filterSize * 2);
-                        memcpy(tmp2, filter, dstW * filterSize * 2);
-                        for (i = 0; i < dstW; i += 16){//pixel
-                            for (k = 0; k < filterSize / 4; ++k){//fcoeff
-                                for (j = 0; j < 16; ++j){//inner pixel
-                                    for (l = 0; l < 4; ++l){//coeff
-                                        int from = i * filterSize + j * filterSize + k * 4 + l;
-                                        int to = (i) * filterSize + j * 4 + l + k * 64;
-                                        filter[to] = tmp2[from];
-                                    }
-                                }
-                            }
-                        }
-                        av_free(tmp2);
-                    }
-                }
-            }
+        if ((c->srcBpc == 8) && (c->dstBpc <= 14)) {
+            int16_t *filterCopy = NULL;
+            if (filterSize > 4) {
+                if (!FF_ALLOC_TYPED_ARRAY(filterCopy, dstW * filterSize))
+                    return;
+                memcpy(filterCopy, filter, dstW * filterSize * sizeof(int16_t));
+            }
+            // Do not swap filterPos for pixels which won't be processed by
+            // the main loop.
+            for (i = 0; i + 8 <= dstW; i += 8) {
+                FFSWAP(int, filterPos[i + 2], filterPos[i + 4]);
+                FFSWAP(int, filterPos[i + 3], filterPos[i + 5]);
+            }
+            if (filterSize > 4) {
+                // 16 pixels are processed at a time.
+                for (i = 0; i + 16 <= dstW; i += 16) {
+                    // 4 filter coeffs are processed at a time.
+                    for (k = 0; k + 4 <= filterSize; k += 4) {
+                        for (j = 0; j < 16; ++j) {
+                            int from = (i + j) * filterSize + k;
+                            int to = i * filterSize + j * 4 + k * 16;
+                            memcpy(&filter[to], &filterCopy[from], 4 * sizeof(int16_t));
+                        }
+                    }
+                }
+            }
+            if (filterCopy)
+                av_free(filterCopy);
         }
     }
 #endif
+    return;
 }
 
 int sws_isSupportedInput(enum AVPixelFormat pix_fmt)
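
Reviewer note (not part of the patch): the patch replaces the innermost per-coefficient loop with a 4-element memcpy and changes k from a group index to a coefficient index, so "k * 64" becomes "k * 16". The following is a minimal standalone sketch for checking that both index formulations produce the same layout. The helper names shuffle_nested, shuffle_memcpy and shuffles_agree are hypothetical and do not exist in libswscale; plain malloc/free stand in for the FFmpeg allocators, and dstW is assumed to be a multiple of 16, matching the early return in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Pre-patch formulation: four nested loops, one coefficient per iteration.
 * k counts groups of 4 coefficients. */
static void shuffle_nested(int16_t *filter, const int16_t *copy,
                           int dstW, int filterSize)
{
    for (int i = 0; i < dstW; i += 16)
        for (int k = 0; k < filterSize / 4; ++k)
            for (int j = 0; j < 16; ++j)
                for (int l = 0; l < 4; ++l) {
                    int from = i * filterSize + j * filterSize + k * 4 + l;
                    int to   = i * filterSize + j * 4 + l + k * 64;
                    filter[to] = copy[from];
                }
}

/* Post-patch formulation: k counts coefficients and advances by 4,
 * and each group of 4 coefficients is moved with one memcpy. */
static void shuffle_memcpy(int16_t *filter, const int16_t *copy,
                           int dstW, int filterSize)
{
    for (int i = 0; i + 16 <= dstW; i += 16)
        for (int k = 0; k + 4 <= filterSize; k += 4)
            for (int j = 0; j < 16; ++j) {
                int from = (i + j) * filterSize + k;
                int to   = i * filterSize + j * 4 + k * 16;
                memcpy(&filter[to], &copy[from], 4 * sizeof(int16_t));
            }
}

/* Hypothetical check helper: returns 1 if both formulations give identical
 * output, 0 on mismatch, -1 if an allocation fails. */
static int shuffles_agree(int dstW, int filterSize)
{
    assert(dstW > 0 && dstW % 16 == 0 && filterSize > 4);

    size_t n = (size_t)dstW * (size_t)filterSize;
    int16_t *src = malloc(n * sizeof(*src));
    int16_t *a   = malloc(n * sizeof(*a));
    int16_t *b   = malloc(n * sizeof(*b));
    int ok = -1;

    if (src && a && b) {
        /* Distinct values per slot so any misplaced element is visible. */
        for (size_t x = 0; x < n; x++)
            src[x] = (int16_t)x;
        memcpy(a, src, n * sizeof(*a));
        memcpy(b, src, n * sizeof(*b));
        shuffle_nested(a, src, dstW, filterSize);
        shuffle_memcpy(b, src, dstW, filterSize);
        ok = memcmp(a, b, n * sizeof(*a)) == 0;
    }
    free(src);
    free(a);
    free(b);
    return ok;
}
```

Calling shuffles_agree(16, 8) or shuffles_agree(32, 12) from a small test driver exercises one and two 16-pixel blocks respectively. A filterSize that is not a multiple of 4 leaves the trailing coefficients untouched in both versions.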