From patchwork Fri Mar 12 15:14:52 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Patrick Ecord
X-Patchwork-Id: 26355
From: Patrick Ecord
Message-Id: <98591987-1670-4F65-9163-1F553A0408B8@gmail.com>
Date: Fri, 12 Mar 2021 09:14:52 -0600
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH] CUDA - make it work for multiple GPU architectures
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Hello,

My friend was running into issues trying to compile ffmpeg with CUDA support, so I tried to replicate the issue on my machine with my 1070.

I started by following Nvidia’s guide for compiling with CUDA support - https://docs.nvidia.com/video-technologies/video-codec-sdk/ffmpeg-with-nvidia-gpu/
It uses the wrong flag (`--enable-cuda-sdk` instead of `--enable-cuda-nvcc`), but I got that figured out.

Then when I ran ./configure with the right flag I got `nvcc fatal : Unsupported gpu architecture 'compute_30'`.

I Googled that and found this GitHub issue, where one person suggested changing the `nvccflags_default` flags and said "I went with 75 because I'm on Turing architecture":
https://github.com/NVIDIA/cuda-samples/issues/46

I started looking around for which flags I would want to use and found this webpage that lists which cards are supported by which CUDA versions:
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

I had just installed nvcc from Nvidia’s site and it came with CUDA 11. The page has a section with flags for CUDA 11 with compatibility for "V100 and T4 Turing cards, but also support newer RTX 3080 and other Ampere cards". Also, according to that person’s site, a lot of the older cards got dropped with CUDA 8, 9, 10 and now 11. These flags should cover Maxwell and up:

```
-arch=sm_52 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_75,code=sm_75 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_86,code=compute_86
```

I tried that and ran configure, and it failed with "Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures". So I removed the `-ptx` flag, and then I was able to run configure, make and make install without any errors.

I tested by converting Big Buck Bunny and it played fine:

    ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i ./Big_Buck_Bunny_1080_10s_30MB.mp4 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

Other stuff - I am not really a CUDA expert, so I am not sure if this is the "correct" way; let me know if there is a better way of doing it. I haven't tried timing it to see if there is a slowdown from supporting multiple architectures and not using the -ptx flag.

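For reference, the `-gencode` list above follows a regular pattern: one native-code entry per architecture, plus one PTX entry for the newest one. So it could be generated instead of hard-coded. Below is a minimal sketch in POSIX shell. `gencode_flags` is a hypothetical helper made up for illustration; it is not part of FFmpeg's configure, and it assumes the compute capabilities are passed in ascending order.

```shell
#!/bin/sh
# Hypothetical helper (not part of FFmpeg's configure): build nvcc -gencode
# flags for a list of compute capabilities given in ascending order.
gencode_flags() {
    flags=""
    newest=""
    for cc in "$@"; do
        # Native code (SASS) for each listed architecture
        flags="$flags -gencode=arch=compute_${cc},code=sm_${cc}"
        newest=$cc
    done
    if [ -n "$newest" ]; then
        # PTX for the newest architecture, so later GPUs can JIT-compile it
        flags="$flags -gencode=arch=compute_${newest},code=compute_${newest}"
    fi
    # Strip the leading space before printing
    printf '%s\n' "${flags# }"
}

# Same architecture set as in the patch
gencode_flags 52 60 61 70 75 80 86
```

The patch additionally passes `-arch=sm_52` in front of the list; the sketch leaves that out and only produces the `-gencode` entries.
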
I saw there were also flags for clang. I haven't tried messing with that yet, but my understanding is that you can pass the flag multiple times: "You can pass --cuda-gpu-arch multiple times to compile for multiple archs." - https://llvm.org/docs/CompileCudaWithLLVM.html

I wanted to send what I had and see what you all think.

Thanks
---
 configure | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index d11942fced..d9e4eff592 100755
--- a/configure
+++ b/configure
@@ -4344,7 +4344,7 @@ fi
 
 if enabled cuda_nvcc; then
     nvcc_default="nvcc"
-    nvccflags_default="-gencode arch=compute_30,code=sm_30 -O2"
+    nvccflags_default="-arch=sm_52 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86"
 else
     nvcc_default="clang"
     nvccflags_default="--cuda-gpu-arch=sm_30 -O2"
@@ -6240,7 +6240,7 @@ else
 fi
 
 if enabled cuda_nvcc; then
-    nvccflags="$nvccflags -ptx"
+    nvccflags="$nvccflags"
 else
     nvccflags="$nvccflags -S -nocudalib -nocudainc --cuda-device-only -Wno-c++11-narrowing -include ${source_link}/compat/cuda/cuda_runtime.h"
     check_nvcc cuda_llvm