From: Nelson Gomez
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 24 Apr 2020 09:31:40 -0700
Subject: [FFmpeg-devel] [PATCH v2 0/3] swscale: add AVX2 version of yuv2nv12cX

v2:
 - Addressed comments James left on v1
 - Cleaned up how dither gets read to avoid using stack space

v1: http://ffmpeg.org/pipermail/ffmpeg-devel/2020-April/261313.html

This patchset aims to optimize yuv2nv12cX_c for Intel/AMD chips by adding
an AVX2 implementation of it. To support this change, the typedef
declaration for yuv2interleavedX_fn has been changed to pass two
additional parameters, chrDither8 and dstFormat, instead of a pointer to
the entire SwsContext.

Output is bit-identical to the C implementation.

Patchset validated on an Intel Xeon W-2133, a Core i7-8650U, and an AMD
Ryzen 1700. Passes fate tests; this patch is exercised by
fate-filter-pixdesc-nv{12,21,24,42}.

Benchmarks measured on the W-2133. Common flags, with the -cpuflags and
-threads options shown for each result added on top:

  -benchmark -i /dev/shm/benchmark.mp4 -pix_fmt nv42 -f null -

Benchmark material is a yuv420p file:
http://linux.microsoft.com/~negomez/ffmpeg/yuv420p-benchmark.mp4

Results:

* Single-threaded conversion: +95% fps

  -cpuflags -avx2 -threads 1:
    frame= 9959 fps=114 q=-0.0 Lsize=N/A time=00:05:32.29 bitrate=N/A speed=3.79x
    bench: utime=87.648s stime=0.060s rtime=87.709s
    bench: maxrss=35020kB

  -cpuflags all -threads 1:
    frame= 9959 fps=222 q=-0.0 Lsize=N/A time=00:05:32.29 bitrate=N/A speed=7.39x
    bench: utime=44.900s stime=0.040s rtime=44.941s
    bench: maxrss=33048kB

* Multi-threaded conversion: +197% fps

  -cpuflags -avx2:
    frame= 9959 fps=159 q=-0.0 Lsize=N/A time=00:05:32.29 bitrate=N/A speed=5.3x
    bench: utime=90.381s stime=0.430s rtime=62.663s
    bench: maxrss=77420kB

  -cpuflags all:
    frame= 9959 fps=473 q=-0.0 Lsize=N/A time=00:05:32.29 bitrate=N/A speed=15.8x
    bench: utime=48.625s stime=0.459s rtime=21.058s
    bench: maxrss=78500kB

---
Nelson Gomez (3):
  swscale: make yuv2interleavedX more asm-friendly
  swscale/x86/output: add AVX2 version of yuv2nv12cX
  swscale: cosmetic fixes

 libswscale/output.c           |  25 +++---
 libswscale/swscale_internal.h |   8 +-
 libswscale/vscale.c           |   2
 libswscale/x86/output.asm     | 124 +++++++++++++++++++++++++++++++-
 libswscale/x86/swscale.c      |  24 ++++++
 5 files changed, 166 insertions(+), 17 deletions(-)
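
As a rough illustration of why the narrower typedef is enough, here is a scalar sketch approximating the per-pixel work of the C reference yuv2nv12cX_c under the new calling convention: an 8-entry chroma dither table plus a choice derived from the output format, and no SwsContext. It is reconstructed for illustration, not copied from the patch. The names nv12_chroma_interleave and clip_uint8, and the swap_uv flag standing in for the NV21/NV42 check on dstFormat, are hypothetical so that it builds without FFmpeg headers. The 12-bit filter coefficients and 15-bit intermediate samples (hence the final shift by 19) are assumptions about swscale's internal precision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp to [0, 255]; plays the role of av_clip_uint8() from libavutil. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical stand-in for a yuv2interleavedX-style kernel.
 * chr_dither corresponds to chrDither8 (8 entries); swap_uv models the
 * dstFormat check that decides U/V order (NV12/NV24 vs NV21/NV42).
 * Writes dst_w interleaved chroma pairs into dest (2 * dst_w bytes).
 */
static void nv12_chroma_interleave(bool swap_uv, const uint8_t *chr_dither,
                                   const int16_t *chr_filter, int filter_size,
                                   const int16_t **u_src, const int16_t **v_src,
                                   uint8_t *dest, int dst_w)
{
    assert(filter_size > 0);
    for (int i = 0; i < dst_w; i++) {
        /* Dither is pre-scaled by 2^12; V reads the table rotated by 3. */
        int u = chr_dither[i & 7] << 12;
        int v = chr_dither[(i + 3) & 7] << 12;
        /* Vertical filter: weighted sum over filter_size input lines. */
        for (int j = 0; j < filter_size; j++) {
            u += u_src[j][i] * chr_filter[j];
            v += v_src[j][i] * chr_filter[j];
        }
        /* Assumed 15-bit samples times 12-bit weights: >> 19 yields 8 bits. */
        dest[2 * i]     = clip_uint8((swap_uv ? v : u) >> 19);
        dest[2 * i + 1] = clip_uint8((swap_uv ? u : v) >> 19);
    }
}
```

An AVX2 version has to reproduce exactly this arithmetic, including the rotated dither for V and the truncation in the final shift, for its output to remain bit-identical to the C path.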
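
The headline percentages follow from the fps figures in the logs: 114 to 222 fps single-threaded and 159 to 473 fps multi-threaded. A minimal check, where fps_gain_percent is a hypothetical helper:

```c
#include <assert.h>

/*
 * Percent increase in fps from fps_before to fps_after, rounded to the
 * nearest whole percent. Assumes a speedup (fps_after >= fps_before), so
 * the gain is non-negative and adding 0.5 before truncating rounds it.
 */
static int fps_gain_percent(double fps_before, double fps_after)
{
    assert(fps_before > 0.0 && fps_after >= fps_before);
    double gain = (fps_after / fps_before - 1.0) * 100.0;
    return (int)(gain + 0.5);
}
```

So +95% is roughly a 1.95x speedup and +197% roughly 2.97x. Since the logged fps values are whole numbers, the percentages are approximate.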