From: averne
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 30 May 2024 21:43:02 +0200
Subject: [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend
X-Patchwork-Id: 35136

Hi all,

This patch series implements a hardware decoding backend for NVIDIA
Tegra devices, notably the Nintendo Switch. It was primarily written
for HorizonOS (the Nintendo Switch OS), but also supports NVIDIA's
Linux4Tegra (L4T) distribution.

As for hardware, all Tegras from the X1 (T210) onwards should be
supported, although the series does not implement features that were
added in later revisions of the multimedia engines (e.g. 12-bit HEVC).
However, since I only own T210 devices (Switch and Jetson Nano), I was
not able to verify this.

The backend is essentially a userspace NVDEC driver: due to the OS
design of the Switch, we cannot link to NVIDIA's system libraries.
It relies on the (sparse) hardware documentation released by NVIDIA
here:
https://github.com/NVIDIA/open-gpu-doc/tree/master/classes/video

It supports all codecs available in hardware (MPEG1/2/4, VC1, H264,
HEVC, VP8, VP9 and JPEG), with dynamic frequency scaling and
hardware-accelerated frame transfer.

At the moment I'm submitting the series with some NVIDIA headers pulled
from various sources, but I think they would be better placed in
nv-codec-headers. Let me know what you prefer.

The code was tested for memory bugs and leaks with Valgrind and ASan
on L4T.

Some quick performance testing (decoding with -f null -) showed results
in line with official software. The comparison was made against the
nvv4l2 backend that was posted here a while ago:
https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2020-June/263759.html
Note that the comparison is skewed: frame transfer cannot be disabled
in NVIDIA's backend, so its figures include that overhead.
- HEVC Main 10 @ 4K (~80Mbps): nvtegra 79fps, nvv4l2 66fps
- HEVC Main 10 @ 1080p (~5Mbps): nvtegra 402fps, nvv4l2 229fps
- H264 @ 1080p (~3Mbps): nvtegra 286fps, nvv4l2 260fps

Several homebrew applications have been using this backend for some
time, with no bugs reported.
As far as I'm aware, this is the complete list:
- NXMP, a media player based on mpv: https://github.com/proconsule/nxmp
- WiliWili, a bilibili client: https://github.com/xfangfang/wiliwili
- Switchfin, a Jellyfin client: https://github.com/dragonflylee/switchfin
- Moonlight-Switch, a Moonlight client: https://github.com/XITRIX/Moonlight-Switch
- chiaki: https://git.sr.ht/~kkwong/chiaki/
- My own media player, unreleased at this time

Nintendo Switch support assumes a working devkitA64 homebrew
environment; setup instructions can be found here:
https://devkitpro.org/wiki/devkitPro_pacman
The hwaccel can then be configured with, e.g.:
```
source /opt/devkitpro/switchvars.sh && ./configure \
    --cross-prefix=aarch64-none-elf- --enable-cross-compile \
    --arch=aarch64 --cpu=cortex-a57 --target-os=horizon \
    --enable-pic --enable-gpl --enable-nvtegra
```

Note that NVDEC usage on discrete GPUs is very similar. As far as I
know, the main difference is that the engine is driven through the
GPFIFO block (the same block that manages the 3D engine) instead of
host1x.

Thank you for your consideration.
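
For reference, the decode-only measurement mentioned earlier (decoding
with -f null -) could be reproduced along the following lines. This is
only a sketch under assumptions: selecting the backend with
"-hwaccel nvtegra" is inferred from the --enable-nvtegra configure
switch and is not confirmed in this letter, sample.mkv is a placeholder
input, and bench_cmd is a hypothetical helper that merely prints the
command line.

```shell
#!/bin/sh
# Sketch of the "-f null -" decode benchmark.
# Assumptions: the hwaccel name "nvtegra" and the input file name.
bench_cmd() {
    # $1: input file (paths containing spaces are not handled here)
    # -benchmark reports timing at exit; "-f null -" discards decoded frames
    echo "ffmpeg -benchmark -hwaccel nvtegra -i $1 -f null -"
}

# Print the command for inspection instead of running it directly.
bench_cmd sample.mkv
```

Piping the printed line to sh runs the benchmark on a build that has
the hwaccel enabled.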

averne (16):
  avutil/buffer: add helper to allocate aligned memory
  configure,avutil: add support for HorizonOS
  avutil: add ioctl definitions for tegra devices
  avutil: add hardware definitions for NVDEC, NVJPG and VIC
  avutil: add common code for nvtegra
  avutil: add nvtegra hwcontext
  hwcontext_nvtegra: add dynamic frequency scaling routines
  nvtegra: add common hardware decoding code
  nvtegra: add mpeg1/2 hardware decoding
  nvtegra: add mpeg4 hardware decoding
  nvtegra: add vc1 hardware decoding
  nvtegra: add h264 hardware decoding
  nvtegra: add hevc hardware decoding
  nvtegra: add vp8 hardware decoding
  nvtegra: add vp9 hardware decoding
  nvtegra: add mjpeg hardware decoding

 configure                      |   30 +
 libavcodec/Makefile            |   11 +
 libavcodec/h263dec.c           |    6 +
 libavcodec/h264_slice.c        |    6 +-
 libavcodec/h264dec.c           |    3 +
 libavcodec/hevcdec.c           |   17 +-
 libavcodec/hevcdec.h           |    2 +
 libavcodec/hwaccels.h          |   10 +
 libavcodec/hwconfig.h          |    2 +
 libavcodec/mjpegdec.c          |    6 +
 libavcodec/mpeg12dec.c         |   12 +
 libavcodec/mpeg4videodec.c     |    3 +
 libavcodec/nvtegra_decode.c    |  517 +++++++++
 libavcodec/nvtegra_decode.h    |   94 ++
 libavcodec/nvtegra_h264.c      |  506 +++++++++
 libavcodec/nvtegra_hevc.c      |  633 +++++++++++
 libavcodec/nvtegra_mjpeg.c     |  336 ++++++
 libavcodec/nvtegra_mpeg12.c    |  319 ++++++
 libavcodec/nvtegra_mpeg4.c     |  344 ++++++
 libavcodec/nvtegra_vc1.c       |  455 ++++++++
 libavcodec/nvtegra_vp8.c       |  334 ++++++
 libavcodec/nvtegra_vp9.c       |  665 ++++++++++++
 libavcodec/vc1dec.c            |    9 +
 libavcodec/vp8.c               |    6 +
 libavcodec/vp9.c               |   10 +-
 libavutil/Makefile             |    9 +
 libavutil/buffer.c             |   31 +
 libavutil/buffer.h             |    7 +
 libavutil/clb0b6.h             |  303 ++++++
 libavutil/clc5b0.h             |  436 ++++++++
 libavutil/cle7d0.h             |  129 +++
 libavutil/cpu.c                |    7 +
 libavutil/hwcontext.c          |    4 +
 libavutil/hwcontext.h          |    1 +
 libavutil/hwcontext_internal.h |    1 +
 libavutil/hwcontext_nvtegra.c  | 1045 ++++++++++++++++++
 libavutil/hwcontext_nvtegra.h  |   92 ++
 libavutil/nvdec_drv.h          | 1858 ++++++++++++++++++++++++++++++++
 libavutil/nvhost_ioctl.h       |  511 +++++++++
 libavutil/nvjpg_drv.h          |  189 ++++
 libavutil/nvmap_ioctl.h        |  451 ++++++++
 libavutil/nvtegra.c            | 1035 ++++++++++++++++++
 libavutil/nvtegra.h            |  258 +++++
 libavutil/nvtegra_host1x.h     |   94 ++
 libavutil/pixdesc.c            |    4 +
 libavutil/pixfmt.h             |    8 +
 libavutil/vic_drv.h            |  279 +++++
 47 files changed, 11085 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/nvtegra_decode.c
 create mode 100644 libavcodec/nvtegra_decode.h
 create mode 100644 libavcodec/nvtegra_h264.c
 create mode 100644 libavcodec/nvtegra_hevc.c
 create mode 100644 libavcodec/nvtegra_mjpeg.c
 create mode 100644 libavcodec/nvtegra_mpeg12.c
 create mode 100644 libavcodec/nvtegra_mpeg4.c
 create mode 100644 libavcodec/nvtegra_vc1.c
 create mode 100644 libavcodec/nvtegra_vp8.c
 create mode 100644 libavcodec/nvtegra_vp9.c
 create mode 100644 libavutil/clb0b6.h
 create mode 100644 libavutil/clc5b0.h
 create mode 100644 libavutil/cle7d0.h
 create mode 100644 libavutil/hwcontext_nvtegra.c
 create mode 100644 libavutil/hwcontext_nvtegra.h
 create mode 100644 libavutil/nvdec_drv.h
 create mode 100644 libavutil/nvhost_ioctl.h
 create mode 100644 libavutil/nvjpg_drv.h
 create mode 100644 libavutil/nvmap_ioctl.h
 create mode 100644 libavutil/nvtegra.c
 create mode 100644 libavutil/nvtegra.h
 create mode 100644 libavutil/nvtegra_host1x.h
 create mode 100644 libavutil/vic_drv.h

---
2.45.1