
[FFmpeg-devel] CUDA - make it work for multiple GPU architectures

Message ID 98591987-1670-4F65-9163-1F553A0408B8@gmail.com
State New
Series [FFmpeg-devel] CUDA - make it work for multiple GPU architectures

Checks

Context Check Description
andriy/configure warning Failed to apply patch

Commit Message

Patrick Ecord March 12, 2021, 3:14 p.m. UTC
Hello, 

My friend was running into issues trying to compile ffmpeg with CUDA support, so I tried to replicate the issue on my machine with my 1070.

Started by following Nvidia’s guide for compiling with CUDA support -
https://docs.nvidia.com/video-technologies/video-codec-sdk/ffmpeg-with-nvidia-gpu/

It uses the wrong flag (`--enable-cuda-sdk` instead of `--enable-cuda-nvcc`), but I got that figured out.

Then when I tried to run ./configure with the right flag I got `nvcc fatal : Unsupported gpu architecture 'compute_30'`

I Googled that and found this GitHub issue, where one person suggested changing the `nvccflags_default` flags and said: "I went with 75 because I'm on Turing architecture"
https://github.com/NVIDIA/cuda-samples/issues/46 

Started looking around for what flags I would want to use and I found this webpage that listed what cards were supported by which CUDA versions.
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

I had just installed nvcc from Nvidia's site and it came with CUDA 11. That page has a section with flags for CUDA 11 with compatibility for "V100 and T4 Turing cards, but also support newer RTX 3080 and other Ampere cards".

Also, according to that person's site, a lot of the older cards were dropped across CUDA 8, 9, 10 and now 11. These flags should cover Maxwell and up:

```
-arch=sm_52 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_75,code=sm_75 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_86,code=compute_86
```
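
The list above follows a pattern: one `code=sm_XX` entry per architecture, plus a final `code=compute_XX` entry for the newest one so that PTX is also included for that architecture. A minimal sketch of generating the `-gencode` part from a list of architecture numbers is below. The helper name `gencode_flags` is hypothetical and is not part of configure.

```shell
#!/bin/sh
# Hypothetical helper (not part of FFmpeg's configure): emit one
# code=sm_XX target per listed architecture, plus a code=compute_XX
# (PTX) target for the last one given.
gencode_flags() {
    flags=""
    last=""
    for sm in "$@"; do
        flags="$flags -gencode=arch=compute_${sm},code=sm_${sm}"
        last="$sm"
    done
    if [ -n "$last" ]; then
        flags="$flags -gencode=arch=compute_${last},code=compute_${last}"
    fi
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Reproduce the -gencode entries from the list above.
gencode_flags 52 60 61 70 75 80 86
```

The architecture numbers passed in are the ones from the list above; which of them a given nvcc accepts depends on the installed CUDA version.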

I put those flags into `nvccflags_default` and ran configure, and it failed with "Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures".

So I removed the `-ptx` flag and I was able to run configure and make and make install without any errors.

Tested by converting Big Buck Bunny and it played fine.
ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i ./Big_Buck_Bunny_1080_10s_30MB.mp4 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

Other notes:
I am not a CUDA expert, so I am not sure if this is the "correct" way. Let me know if there is a better way of doing it.
I haven't timed it to see if there is a slowdown from supporting multiple architectures and not using the `-ptx` flag.
I saw there are also flags for clang. I haven't tried changing those yet, but my understanding is that you can pass the flag multiple times:
"You can pass --cuda-gpu-arch multiple times to compile for multiple archs." - https://llvm.org/docs/CompileCudaWithLLVM.html

Wanted to send what I had and see what you all think, 
Thanks

---
configure | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Timo Rothenpieler March 12, 2021, 5:15 p.m. UTC | #1
On 12.03.2021 16:14, Patrick Ecord wrote:
> Tried that and ran configure and it failed with Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures"
> 
> So I removed the `-ptx` flag and I was able to run configure and make and make install without any errors.

FFmpeg embeds the PTX assembly code. Removing the -ptx option WILL 
break any and all CUDA filters.
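
One way to keep `-ptx` and still get past the `compute_30` error would be to target a single, newer architecture instead of several. The sketch below assumes configure's `--nvccflags` override is used and that `--enable-nonfree` is needed alongside `--enable-cuda-nvcc`. The helper name `single_arch_nvccflags` is hypothetical. It only prints the configure command so the flags can be checked first.

```shell
#!/bin/sh
# Hypothetical helper: build nvcc flags for exactly one architecture.
# -ptx rejects multiple targets, so only one -gencode pair is emitted.
single_arch_nvccflags() {
    sm="$1"   # architecture number, e.g. 52 for Maxwell
    printf '%s' "-gencode arch=compute_${sm},code=sm_${sm} -O2"
}

# Print (do not run) a configure invocation using those flags.
# Assumption: --enable-nonfree is required together with --enable-cuda-nvcc.
printf '%s\n' "./configure --enable-nonfree --enable-cuda-nvcc --nvccflags=\"$(single_arch_nvccflags 52)\""
```

PTX built for an older virtual architecture can be JIT-compiled by the driver for newer GPUs, so a single low target such as `compute_52` should still cover Maxwell and up.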

> Tested by converting Big Buck Bunny and it played fine.
> ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i ./Big_Buck_Bunny_1080_10s_30MB.mp4 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

This only works because you are not actually using any CUDA filters.
There is zero need for nvcc support for that commandline.
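
A test that does depend on the embedded PTX would need a CUDA filter in the graph. The sketch below assumes `scale_cuda` as the filter and an arbitrary 1280:720 target size. The helper name `has_filter` is hypothetical. It checks whether the local build lists the filter and prints a command to try, without running the conversion itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filter list on stdin names the filter.
has_filter() {
    grep -qw -- "$1"
}

# Only probe when an ffmpeg binary is on PATH.
if command -v ffmpeg >/dev/null 2>&1; then
    if ffmpeg -hide_banner -filters 2>/dev/null | has_filter scale_cuda; then
        echo "scale_cuda is available; a command that exercises it:"
        echo "ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i ./Big_Buck_Bunny_1080_10s_30MB.mp4 -vf scale_cuda=1280:720 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4"
    else
        echo "scale_cuda is not listed by this build"
    fi
fi
```

The printed command is the one from the original test with `-vf scale_cuda=1280:720` added, so the decoded frames pass through a CUDA filter before reaching the encoder.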

On top of that, just use clang and don't bother with the Nvidia SDK 
unless you are developing filters.

Patch

diff --git a/configure b/configure
index d11942fced..d9e4eff592 100755
--- a/configure
+++ b/configure
@@ -4344,7 +4344,7 @@  fi

if enabled cuda_nvcc; then
    nvcc_default="nvcc"
-    nvccflags_default="-gencode arch=compute_30,code=sm_30 -O2"
+    nvccflags_default="-arch=sm_52 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86"
else
    nvcc_default="clang"
    nvccflags_default="--cuda-gpu-arch=sm_30 -O2"
@@ -6240,7 +6240,7 @@  else
fi

if enabled cuda_nvcc; then
-    nvccflags="$nvccflags -ptx"
+    nvccflags="$nvccflags"
else
    nvccflags="$nvccflags -S -nocudalib -nocudainc --cuda-device-only -Wno-c++11-narrowing -include ${source_link}/compat/cuda/cuda_runtime.h"
    check_nvcc cuda_llvm