From patchwork Fri Mar 12 15:14:52 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Patrick Ecord <pecord@gmail.com>
X-Patchwork-Id: 26355
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by ffaux.localdomain (Postfix) with ESMTP id EBCE344B3E7
	for <patchwork@ffaux-bg.ffmpeg.org>; Fri, 12 Mar 2021 17:23:20 +0200 (EET)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C046768AF96;
	Fri, 12 Mar 2021 17:23:20 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-oi1-f170.google.com (mail-oi1-f170.google.com
 [209.85.167.170])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 079EF68AE4A
 for <ffmpeg-devel@ffmpeg.org>; Fri, 12 Mar 2021 17:23:14 +0200 (EET)
Received: by mail-oi1-f170.google.com with SMTP id o22so17462509oic.3
 for <ffmpeg-devel@ffmpeg.org>; Fri, 12 Mar 2021 07:23:13 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=from:content-transfer-encoding:mime-version:subject:message-id:date
 :to; bh=nKldOA0cYVrjzBFA1QBXGq4kMDmoDIO+HUd4/MoBcaA=;
 b=lIu98z7Hcso7m0/sYIn7J6VVe7sw1cGHgVa9ExjfTELOX3tN5SVI2rxz9j7mLcPZqN
 mdbu/pctqsrtJLn8QaD2oEWSGtr0O61PaRg41q0UIx05dmdmNlv3Y/vK1s3okZCLxc7s
 pfcWLOKLJSlA30i3viJLXGT/r4KnoZsIrxAoOPY/Hi4eUeFLjxl5rX6MBVjIQd4ETXC2
 jEO9K7bkXTvl7kjD0VCLdLDuo1IB5L5wEM+3NoxwFpkV3geBlEEJVpoCyc0WiSlSfbHN
 fIKB8P6bYrYrRMJgCk9J19Us3Mtufeo5nexvLz8GKc7O95Mn+E6cy7AG3cusKT4wYtmh
 wvNA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:content-transfer-encoding:mime-version
 :subject:message-id:date:to;
 bh=nKldOA0cYVrjzBFA1QBXGq4kMDmoDIO+HUd4/MoBcaA=;
 b=T0GDYFBNN+9xWZi5XmJzud4BrE3awO2PxEGt6qUo1kNaBT80FdmxMDscvKQuVs1rYh
 +OVB37AUKIuv0fWTNh8iPW3gD9eM4jcMde7RfLzBXxig12myRgo0hCxgu8EONg+5CMP3
 StjBAs+vGqrY5x4VrwO0/UFpy7Hi2kL2Hg+Fc2uR4hwBmzaLs315FCMQ6W5zfiFBA12O
 QZIeELsuMGAyhoQ2YGhZrskRhKKS/rv230NTuyliCFHPIDBX3ubCfGYe0+wnbw3AxIUZ
 kxS0bonCmFXSPdHserykR5MWE0bRIRMjbIET/ZNLuu0qHM0N8GuUVmglY04j+2JbNRUT
 VAog==
X-Gm-Message-State: AOAM532gtUxqwJHy5w9eE4h1UTQbtYPgqwZ1N5mk9tQsapbXHggxjsU4
 v6xe+KThL/djoor48t1JhzbSZ0ZLmYZH3Q==
X-Google-Smtp-Source: 
 ABdhPJwkTTmQHlj7kOHpsyLLRsR9FZynNpsiUCF9yEEtpnUuti43TFR5x0LQSWopHH772IbxUEqSsA==
X-Received: by 2002:aca:4a12:: with SMTP id x18mr10038903oia.8.1615562094722;
 Fri, 12 Mar 2021 07:14:54 -0800 (PST)
Received: from [192.168.2.3] ([136.35.50.248])
 by smtp.gmail.com with ESMTPSA id t5sm510766oog.20.2021.03.12.07.14.53
 for <ffmpeg-devel@ffmpeg.org>
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 12 Mar 2021 07:14:54 -0800 (PST)
From: Patrick Ecord <pecord@gmail.com>
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.60.0.2.21\))
Message-Id: <98591987-1670-4F65-9163-1F553A0408B8@gmail.com>
Date: Fri, 12 Mar 2021 09:14:52 -0600
To: ffmpeg-devel@ffmpeg.org
X-Mailer: Apple Mail (2.3654.60.0.2.21)
Subject: [FFmpeg-devel] [PATCH] CUDA - make it work for multiple GPU
	architectures
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Hello, 

My friend was running into issues trying to compile ffmpeg with CUDA support, so I tried to replicate the issue on a machine with my 1070.

I started by following Nvidia's guide for compiling with CUDA support:
https://docs.nvidia.com/video-technologies/video-codec-sdk/ffmpeg-with-nvidia-gpu/

It uses the wrong flag (`--enable-cuda-sdk` instead of `--enable-cuda-nvcc`), but I got that figured out.

Then, when I ran ./configure with the right flag, I got `nvcc fatal : Unsupported gpu architecture 'compute_30'`.

I googled that and found this GitHub issue, where one person suggested changing the `nvccflags_default` flags and said: "I went with 75 because I'm on Turing architecture"
https://github.com/NVIDIA/cuda-samples/issues/46 

I started looking for which flags I would want to use and found this webpage, which lists which cards are supported by which CUDA versions.
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

I had just installed nvcc from Nvidia's site, and it came with CUDA 11. The page has a section with flags for CUDA 11 with compatibility for "V100 and T4 Turing cards, but also support newer RTX 3080 and other Ampere cards".

Also, according to that site, a lot of the older cards were dropped over CUDA 8, 9, 10 and now 11. These flags should cover Maxwell and up:

```
-arch=sm_52 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_75,code=sm_75 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_86,code=compute_86
```
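
As a rough sketch of how a list like the one above could be generated instead of typed by hand, the following shell function emits one `-gencode` entry per compute capability plus a final PTX entry for the newest one. The function name `gen_nvcc_flags` is hypothetical and not part of FFmpeg's configure, and the leading `-arch=sm_52` from the list above is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper (not part of FFmpeg's configure): build nvcc -gencode
# flags from a list of compute capabilities passed as arguments.
gen_nvcc_flags() {
    flags=""
    last=""
    for cc in "$@"; do
        # One binary (SASS) target per architecture.
        flags="$flags -gencode=arch=compute_${cc},code=sm_${cc}"
        last=$cc
    done
    # Also embed PTX for the last architecture given, so newer GPUs can JIT it.
    if [ -n "$last" ]; then
        flags="$flags -gencode=arch=compute_${last},code=compute_${last}"
    fi
    # Strip the leading space before printing.
    echo "${flags# }"
}

# Architectures taken from the flag list above.
gen_nvcc_flags 52 60 61 70 75 80 86
```

The architectures are expected in ascending order, since the PTX entry is generated for whichever one comes last.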

I tried that and ran configure, and it failed with "Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures".

So I removed the `-ptx` flag, and then I was able to run configure, make, and make install without any errors.
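
One possible alternative to dropping `-ptx` unconditionally, shown only as an untested sketch, would be to keep it whenever the flags name a single target, since the nvcc error is specific to multiple architectures. The helper name `count_gencode` is hypothetical, and it ignores a bare `-arch=` option, which is an assumption about how such a check might be scoped.

```shell
#!/bin/sh
# Hypothetical helper: count how many -gencode options a flag string contains.
count_gencode() {
    n=0
    # Intentionally unquoted so the flag string is split into words.
    for word in $1; do
        case "$word" in
            -gencode*|--generate-code*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Example with a single target: -ptx is kept.
nvccflags="-gencode arch=compute_61,code=sm_61 -O2"
if [ "$(count_gencode "$nvccflags")" -le 1 ]; then
    nvccflags="$nvccflags -ptx"
fi
echo "$nvccflags"
```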

I tested by converting Big Buck Bunny, and the output played fine:
ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i ./Big_Buck_Bunny_1080_10s_30MB.mp4 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

Other notes:
I am not really a CUDA expert, so I am not sure whether this is the "correct" way. Let me know if there is a better way of doing it.
I haven't timed it to see whether there is a slowdown from supporting multiple architectures and not using the -ptx flag.
I saw there are also flags for clang. I haven't tried changing those yet, but my understanding is that you can pass the flag multiple times:
"You can pass --cuda-gpu-arch multiple times to compile for multiple archs." - https://llvm.org/docs/CompileCudaWithLLVM.html
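
A minimal sketch of what the clang side might look like, assuming clang accepts one `--cuda-gpu-arch` per target as the LLVM documentation quoted above says. The function name `gen_clang_cuda_flags` is hypothetical. Whether a given clang release recognizes the newer `sm_` values, and whether the `-S --cuda-device-only` options already used in configure work with more than one arch, is untested here.

```shell
#!/bin/sh
# Hypothetical helper (not part of FFmpeg's configure): build repeated
# --cuda-gpu-arch options for clang from a list of compute capabilities.
gen_clang_cuda_flags() {
    flags=""
    for cc in "$@"; do
        flags="$flags --cuda-gpu-arch=sm_${cc}"
    done
    # Strip the leading space before printing.
    echo "${flags# }"
}

gen_clang_cuda_flags 52 61 75
```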

I wanted to send what I had and see what you all think.
Thanks
---
configure | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index d11942fced..d9e4eff592 100755
--- a/configure
+++ b/configure
@@ -4344,7 +4344,7 @@ fi

if enabled cuda_nvcc; then
    nvcc_default="nvcc"
-    nvccflags_default="-gencode arch=compute_30,code=sm_30 -O2"
+    nvccflags_default="-arch=sm_52 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86"
else
    nvcc_default="clang"
    nvccflags_default="--cuda-gpu-arch=sm_30 -O2"
@@ -6240,7 +6240,7 @@ else
fi

if enabled cuda_nvcc; then
-    nvccflags="$nvccflags -ptx"
+    nvccflags="$nvccflags"
else
    nvccflags="$nvccflags -S -nocudalib -nocudainc --cuda-device-only -Wno-c++11-narrowing -include ${source_link}/compat/cuda/cuda_runtime.h"
    check_nvcc cuda_llvm