From patchwork Thu May 30 19:43:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49424 From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:03 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: 
FFmpeg development discussions and patches Cc: averne Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: e1CgaUn7VAwP This is useful e.g. for memory-mapped buffers that need page-aligned memory when dealing with hardware devices. Signed-off-by: averne --- libavutil/buffer.c | 31 +++++++++++++++++++++++++++++++ libavutil/buffer.h | 7 +++++++ 2 files changed, 38 insertions(+) diff --git a/libavutil/buffer.c b/libavutil/buffer.c index e4562a79b1..b8e357f540 100644 --- a/libavutil/buffer.c +++ b/libavutil/buffer.c @@ -16,9 +16,14 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ +#include "config.h" + #include #include #include +#if HAVE_MALLOC_H +#include +#endif #include "avassert.h" #include "buffer_internal.h" @@ -100,6 +105,32 @@ AVBufferRef *av_buffer_allocz(size_t size) return ret; } +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align) +{ + AVBufferRef *ret = NULL; + uint8_t *data = NULL; + +#if HAVE_POSIX_MEMALIGN + if (posix_memalign((void **)&data, align, size)) + return NULL; +#elif HAVE_ALIGNED_MALLOC + data = aligned_alloc(align, size); +#elif HAVE_MEMALIGN + data = memalign(align, size); +#else + return NULL; +#endif + + if (!data) + return NULL; + + ret = av_buffer_create(data, size, av_buffer_default_free, NULL, 0); + if (!ret) + av_freep(&data); + + return ret; +} + AVBufferRef *av_buffer_ref(const AVBufferRef *buf) { AVBufferRef *ret = av_mallocz(sizeof(*ret)); diff --git a/libavutil/buffer.h b/libavutil/buffer.h index e1ef5b7f07..8422ec3453 100644 --- a/libavutil/buffer.h +++ b/libavutil/buffer.h @@ -107,6 +107,13 @@ AVBufferRef *av_buffer_alloc(size_t size); */ AVBufferRef *av_buffer_allocz(size_t size); +/** + * Allocate an AVBuffer of the given size and alignment. 
+ * + * @return an AVBufferRef of given size or NULL when out of memory + */ +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align); + /** * Always treat the buffer as read-only, even when it has only one * reference. From patchwork Thu May 30 19:43:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49412 From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:04 +0200 Message-ID: <2071da3c6620fd5ca9dd769a467f248796a51f67.1717083799.git.averne381@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: 
References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: averne Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: XgF1LJP1JvQr HorizonOS (HOS) is the operating system of the Nintendo Switch. This patch enables integration with the homebrew toolchain developed by the devkitPro team. Its two main components are devkitA64 (common toolchain for aarch64 targets) and libnx (library implementing interaction with the HOS kernel and system daemons, termed sysmodules). Signed-off-by: averne --- configure | 8 ++++++++ libavutil/cpu.c | 7 +++++++ 2 files changed, 15 insertions(+) diff --git a/configure b/configure index 96b181fd21..09fb2aed1b 100755 --- a/configure +++ b/configure @@ -5967,6 +5967,10 @@ case $target_os in ;; minix) ;; + horizon) + enable section_data_rel_ro + add_extralibs -lnx + ;; none) ;; *) @@ -7710,6 +7714,10 @@ haiku) disable memalign fi ;; +horizon) + disable sysctl + disable sysctlbyname + ;; esac flatten_extralibs(){ diff --git a/libavutil/cpu.c b/libavutil/cpu.c index 9ac2f01c20..6a77df5e34 100644 --- a/libavutil/cpu.c +++ b/libavutil/cpu.c @@ -48,6 +48,9 @@ #if HAVE_UNISTD_H #include #endif +#ifdef __SWITCH__ +#include +#endif static atomic_int cpu_flags = -1; static atomic_int cpu_count = -1; @@ -247,6 +250,10 @@ int av_cpu_count(void) #elif HAVE_WINRT GetNativeSystemInfo(&sysinfo); nb_cpus = sysinfo.dwNumberOfProcessors; +#elif defined(__SWITCH__) + u64 core_mask = 0; + Result rc = svcGetInfo(&core_mask, InfoType_CoreMask, CUR_PROCESS_HANDLE, 0); + nb_cpus = R_SUCCEEDED(rc) ? 
av_popcount64(core_mask) : 3; #endif if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed)) From patchwork Thu May 30 19:43:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49413 From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:05 +0200 Message-ID: <6402cc5a782fcbc2e3fdd91056b73dbfdb88351b.1717083799.git.averne381@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices 
X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: averne Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: cXWk1bwd0us7 These files are taken with minimal modifications from NVIDIA's Linux4Tegra (L4T) tree. nvmap enables management of memory-mapped buffers for hardware devices. nvhost enables interaction with different hardware modules (multimedia engines, display engine, ...) through a common block, host1x. Signed-off-by: averne --- libavutil/Makefile | 2 + libavutil/nvhost_ioctl.h | 511 +++++++++++++++++++++++++++++++++++++++ libavutil/nvmap_ioctl.h | 451 ++++++++++++++++++++++++++++++++++ 3 files changed, 964 insertions(+) create mode 100644 libavutil/nvhost_ioctl.h create mode 100644 libavutil/nvmap_ioctl.h diff --git a/libavutil/Makefile b/libavutil/Makefile index 6e6fa8d800..9c112bc58a 100644 --- a/libavutil/Makefile +++ b/libavutil/Makefile @@ -52,6 +52,8 @@ HEADERS = adler32.h \ hwcontext_videotoolbox.h \ hwcontext_vdpau.h \ hwcontext_vulkan.h \ + nvhost_ioctl.h \ + nvmap_ioctl.h \ iamf.h \ imgutils.h \ intfloat.h \ diff --git a/libavutil/nvhost_ioctl.h b/libavutil/nvhost_ioctl.h new file mode 100644 index 0000000000..b0bf3e3ae6 --- /dev/null +++ b/libavutil/nvhost_ioctl.h @@ -0,0 +1,511 @@ +/* + * include/uapi/linux/nvhost_ioctl.h + * + * Tegra graphics host driver + * + * Copyright (c) 2016-2020, NVIDIA CORPORATION. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#ifndef AVUTIL_NVHOST_IOCTL_H +#define AVUTIL_NVHOST_IOCTL_H + +#ifndef __SWITCH__ +# include +# include +#else +# include + +# define _IO _NV_IO +# define _IOR _NV_IOR +# define _IOW _NV_IOW +# define _IOWR _NV_IOWR + +# define _IOC_DIR _NV_IOC_DIR +# define _IOC_TYPE _NV_IOC_TYPE +# define _IOC_NR _NV_IOC_NR +# define _IOC_SIZE _NV_IOC_SIZE +#endif + +#define __user + +#define NVHOST_INVALID_SYNCPOINT 0xFFFFFFFF +#define NVHOST_NO_TIMEOUT (-1) +#define NVHOST_NO_CONTEXT 0x0 +#define NVHOST_IOCTL_MAGIC 'H' +#define NVHOST_PRIORITY_LOW 50 +#define NVHOST_PRIORITY_MEDIUM 100 +#define NVHOST_PRIORITY_HIGH 150 + +#define NVHOST_TIMEOUT_FLAG_DISABLE_DUMP 0 + +#define NVHOST_SUBMIT_VERSION_V0 0x0 +#define NVHOST_SUBMIT_VERSION_V1 0x1 +#define NVHOST_SUBMIT_VERSION_V2 0x2 +#define NVHOST_SUBMIT_VERSION_MAX_SUPPORTED NVHOST_SUBMIT_VERSION_V2 + +struct nvhost_cmdbuf { + uint32_t mem; + uint32_t offset; + uint32_t words; +} __attribute__((packed)); + +struct nvhost_cmdbuf_ext { + int32_t pre_fence; + uint32_t reserved; +}; + +struct nvhost_reloc { + uint32_t cmdbuf_mem; + uint32_t cmdbuf_offset; + uint32_t target; + uint32_t target_offset; +}; + +struct nvhost_reloc_shift { + uint32_t shift; +} __attribute__((packed)); + +#define NVHOST_RELOC_TYPE_DEFAULT 0 +#define NVHOST_RELOC_TYPE_PITCH_LINEAR 1 +#define NVHOST_RELOC_TYPE_BLOCK_LINEAR 2 +#define NVHOST_RELOC_TYPE_NVLINK 3 +struct nvhost_reloc_type { + uint32_t reloc_type; + uint32_t padding; +}; + +struct nvhost_waitchk { + uint32_t mem; + uint32_t 
offset; + uint32_t syncpt_id; + uint32_t thresh; +}; + +struct nvhost_syncpt_incr { + uint32_t syncpt_id; + uint32_t syncpt_incrs; +}; + +struct nvhost_get_param_args { + uint32_t value; +} __attribute__((packed)); + +struct nvhost_get_param_arg { + uint32_t param; + uint32_t value; +}; + +struct nvhost_get_client_managed_syncpt_arg { + uint64_t name; + uint32_t param; + uint32_t value; +}; + +struct nvhost_free_client_managed_syncpt_arg { + uint32_t param; + uint32_t value; +}; + +struct nvhost_channel_open_args { + int32_t channel_fd; +}; + +struct nvhost_set_syncpt_name_args { + uint64_t name; + uint32_t syncpt_id; + uint32_t padding; +}; + +struct nvhost_set_nvmap_fd_args { + uint32_t fd; +} __attribute__((packed)); + +enum nvhost_clk_attr { + NVHOST_CLOCK = 0, + NVHOST_BW, + NVHOST_PIXELRATE, + NVHOST_BW_KHZ, +}; + +/* + * moduleid[15:0] => module id + * moduleid[24:31] => nvhost_clk_attr + */ +#define NVHOST_MODULE_ID_BIT_POS 0 +#define NVHOST_MODULE_ID_BIT_WIDTH 16 +#define NVHOST_CLOCK_ATTR_BIT_POS 24 +#define NVHOST_CLOCK_ATTR_BIT_WIDTH 8 +struct nvhost_clk_rate_args { + uint32_t rate; + uint32_t moduleid; +}; + +struct nvhost_set_timeout_args { + uint32_t timeout; +} __attribute__((packed)); + +struct nvhost_set_timeout_ex_args { + uint32_t timeout; + uint32_t flags; +}; + +struct nvhost_set_priority_args { + uint32_t priority; +} __attribute__((packed)); + +struct nvhost_set_error_notifier { + uint64_t offset; + uint64_t size; + uint32_t mem; + uint32_t padding; +}; + +struct nvhost32_ctrl_module_regrdwr_args { + uint32_t id; + uint32_t num_offsets; + uint32_t block_size; + uint32_t offsets; + uint32_t values; + uint32_t write; +}; + +struct nvhost_ctrl_module_regrdwr_args { + uint32_t id; + uint32_t num_offsets; + uint32_t block_size; + uint32_t write; + uint64_t offsets; + uint64_t values; +}; + +struct nvhost32_submit_args { + uint32_t submit_version; + uint32_t num_syncpt_incrs; + uint32_t num_cmdbufs; + uint32_t num_relocs; + uint32_t num_waitchks; 
+ uint32_t timeout; + uint32_t syncpt_incrs; + uint32_t cmdbufs; + uint32_t relocs; + uint32_t reloc_shifts; + uint32_t waitchks; + uint32_t waitbases; + uint32_t class_ids; + + uint32_t pad[2]; /* future expansion */ + + uint32_t fences; + uint32_t fence; /* Return value */ +} __attribute__((packed)); + +#define NVHOST_SUBMIT_FLAG_SYNC_FENCE_FD 0 +#define NVHOST_SUBMIT_MAX_NUM_SYNCPT_INCRS 10 + +struct nvhost_submit_args { + uint32_t submit_version; + uint32_t num_syncpt_incrs; + uint32_t num_cmdbufs; + uint32_t num_relocs; + uint32_t num_waitchks; + uint32_t timeout; + uint32_t flags; + uint32_t fence; /* Return value */ + uint64_t syncpt_incrs; + uint64_t cmdbuf_exts; + + uint32_t checksum_methods; + uint32_t checksum_falcon_methods; + + uint64_t pad[1]; /* future expansion */ + + uint64_t reloc_types; + uint64_t cmdbufs; + uint64_t relocs; + uint64_t reloc_shifts; + uint64_t waitchks; + uint64_t waitbases; + uint64_t class_ids; + uint64_t fences; +}; + +struct nvhost_set_ctxswitch_args { + uint32_t num_cmdbufs_save; + uint32_t num_save_incrs; + uint32_t save_incrs; + uint32_t save_waitbases; + uint32_t cmdbuf_save; + uint32_t num_cmdbufs_restore; + uint32_t num_restore_incrs; + uint32_t restore_incrs; + uint32_t restore_waitbases; + uint32_t cmdbuf_restore; + uint32_t num_relocs; + uint32_t relocs; + uint32_t reloc_shifts; + + uint32_t pad; +}; + +struct nvhost_channel_buffer { + uint32_t dmabuf_fd; /* in */ + uint32_t reserved0; /* reserved, must be 0 */ + uint64_t reserved1[2]; /* reserved, must be 0 */ + uint64_t address; /* out, device view to the buffer */ +}; + +struct nvhost_channel_unmap_buffer_args { + uint32_t num_buffers; /* in, number of buffers to unmap */ + uint32_t reserved; /* reserved, must be 0 */ + uint64_t table_address; /* pointer to beginning of buffer */ +}; + +struct nvhost_channel_map_buffer_args { + uint32_t num_buffers; /* in, number of buffers to map */ + uint32_t reserved; /* reserved, must be 0 */ + uint64_t table_address; /* 
pointer to beginning of buffer */ +}; + +#define NVHOST_IOCTL_CHANNEL_GET_SYNCPOINTS \ + _IOR(NVHOST_IOCTL_MAGIC, 2, struct nvhost_get_param_args) +#define NVHOST_IOCTL_CHANNEL_GET_WAITBASES \ + _IOR(NVHOST_IOCTL_MAGIC, 3, struct nvhost_get_param_args) +#define NVHOST_IOCTL_CHANNEL_GET_MODMUTEXES \ + _IOR(NVHOST_IOCTL_MAGIC, 4, struct nvhost_get_param_args) +#define NVHOST_IOCTL_CHANNEL_SET_NVMAP_FD \ + _IOW(NVHOST_IOCTL_MAGIC, 5, struct nvhost_set_nvmap_fd_args) +#define NVHOST_IOCTL_CHANNEL_NULL_KICKOFF \ + _IOR(NVHOST_IOCTL_MAGIC, 6, struct nvhost_get_param_args) +#define NVHOST_IOCTL_CHANNEL_GET_CLK_RATE \ + _IOWR(NVHOST_IOCTL_MAGIC, 9, struct nvhost_clk_rate_args) +#define NVHOST_IOCTL_CHANNEL_SET_CLK_RATE \ + _IOW(NVHOST_IOCTL_MAGIC, 10, struct nvhost_clk_rate_args) +#define NVHOST_IOCTL_CHANNEL_SET_TIMEOUT \ + _IOW(NVHOST_IOCTL_MAGIC, 11, struct nvhost_set_timeout_args) +#define NVHOST_IOCTL_CHANNEL_GET_TIMEDOUT \ + _IOR(NVHOST_IOCTL_MAGIC, 12, struct nvhost_get_param_args) +#define NVHOST_IOCTL_CHANNEL_SET_PRIORITY \ + _IOW(NVHOST_IOCTL_MAGIC, 13, struct nvhost_set_priority_args) +#define NVHOST32_IOCTL_CHANNEL_MODULE_REGRDWR \ + _IOWR(NVHOST_IOCTL_MAGIC, 14, struct nvhost32_ctrl_module_regrdwr_args) +#define NVHOST32_IOCTL_CHANNEL_SUBMIT \ + _IOWR(NVHOST_IOCTL_MAGIC, 15, struct nvhost32_submit_args) +#define NVHOST_IOCTL_CHANNEL_GET_SYNCPOINT \ + _IOWR(NVHOST_IOCTL_MAGIC, 16, struct nvhost_get_param_arg) +#define NVHOST_IOCTL_CHANNEL_GET_WAITBASE \ + _IOWR(NVHOST_IOCTL_MAGIC, 17, struct nvhost_get_param_arg) +#define NVHOST_IOCTL_CHANNEL_SET_TIMEOUT_EX \ + _IOWR(NVHOST_IOCTL_MAGIC, 18, struct nvhost_set_timeout_ex_args) +#define NVHOST_IOCTL_CHANNEL_GET_CLIENT_MANAGED_SYNCPOINT \ + _IOWR(NVHOST_IOCTL_MAGIC, 19, struct nvhost_get_client_managed_syncpt_arg) +#define NVHOST_IOCTL_CHANNEL_FREE_CLIENT_MANAGED_SYNCPOINT \ + _IOWR(NVHOST_IOCTL_MAGIC, 20, struct nvhost_free_client_managed_syncpt_arg) +#define NVHOST_IOCTL_CHANNEL_GET_MODMUTEX \ + 
_IOWR(NVHOST_IOCTL_MAGIC, 23, struct nvhost_get_param_arg) +#define NVHOST_IOCTL_CHANNEL_SET_CTXSWITCH \ + _IOWR(NVHOST_IOCTL_MAGIC, 25, struct nvhost_set_ctxswitch_args) + +/* ioctls added for 64bit compatibility */ +#define NVHOST_IOCTL_CHANNEL_SUBMIT \ + _IOWR(NVHOST_IOCTL_MAGIC, 26, struct nvhost_submit_args) +#define NVHOST_IOCTL_CHANNEL_MODULE_REGRDWR \ + _IOWR(NVHOST_IOCTL_MAGIC, 27, struct nvhost_ctrl_module_regrdwr_args) + +#define NVHOST_IOCTL_CHANNEL_MAP_BUFFER \ + _IOWR(NVHOST_IOCTL_MAGIC, 28, struct nvhost_channel_map_buffer_args) +#define NVHOST_IOCTL_CHANNEL_UNMAP_BUFFER \ + _IOWR(NVHOST_IOCTL_MAGIC, 29, struct nvhost_channel_unmap_buffer_args) + +#define NVHOST_IOCTL_CHANNEL_SET_SYNCPOINT_NAME \ + _IOW(NVHOST_IOCTL_MAGIC, 30, struct nvhost_set_syncpt_name_args) + +#define NVHOST_IOCTL_CHANNEL_SET_ERROR_NOTIFIER \ + _IOWR(NVHOST_IOCTL_MAGIC, 111, struct nvhost_set_error_notifier) +#define NVHOST_IOCTL_CHANNEL_OPEN \ + _IOR(NVHOST_IOCTL_MAGIC, 112, struct nvhost_channel_open_args) + +#define NVHOST_IOCTL_CHANNEL_LAST \ + _IOC_NR(NVHOST_IOCTL_CHANNEL_OPEN) +#define NVHOST_IOCTL_CHANNEL_MAX_ARG_SIZE sizeof(struct nvhost_submit_args) + +struct nvhost_ctrl_syncpt_read_args { + uint32_t id; + uint32_t value; +}; + +struct nvhost_ctrl_syncpt_incr_args { + uint32_t id; +} __attribute__((packed)); + +struct nvhost_ctrl_syncpt_wait_args { + uint32_t id; + uint32_t thresh; + int32_t timeout; +} __attribute__((packed)); + +struct nvhost_ctrl_syncpt_waitex_args { + uint32_t id; + uint32_t thresh; + int32_t timeout; + uint32_t value; +}; + +struct nvhost_ctrl_syncpt_waitmex_args { + uint32_t id; + uint32_t thresh; + int32_t timeout; + uint32_t value; + uint32_t tv_sec; + uint32_t tv_nsec; + uint32_t clock_id; + uint32_t reserved; +}; + +struct nvhost_ctrl_sync_fence_info { + uint32_t id; + uint32_t thresh; +}; + +struct nvhost32_ctrl_sync_fence_create_args { + uint32_t num_pts; + uint64_t pts; /* struct nvhost_ctrl_sync_fence_info* */ + uint64_t name; /* const 
char* */ + int32_t fence_fd; /* fd of new fence */ +}; + +struct nvhost_ctrl_sync_fence_create_args { + uint32_t num_pts; + int32_t fence_fd; /* fd of new fence */ + uint64_t pts; /* struct nvhost_ctrl_sync_fence_info* */ + uint64_t name; /* const char* */ +}; + +struct nvhost_ctrl_sync_fence_name_args { + uint64_t name; /* const char* for name */ + int32_t fence_fd; /* fd of fence */ +}; + +struct nvhost_ctrl_module_mutex_args { + uint32_t id; + uint32_t lock; +}; + +enum nvhost_module_id { + NVHOST_MODULE_NONE = -1, + NVHOST_MODULE_DISPLAY_A = 0, + NVHOST_MODULE_DISPLAY_B, + NVHOST_MODULE_VI, + NVHOST_MODULE_ISP, + NVHOST_MODULE_MPE, + NVHOST_MODULE_MSENC, + NVHOST_MODULE_TSEC, + NVHOST_MODULE_GPU, + NVHOST_MODULE_VIC, + NVHOST_MODULE_NVDEC, + NVHOST_MODULE_NVJPG, + NVHOST_MODULE_VII2C, + NVHOST_MODULE_NVENC1, + NVHOST_MODULE_NVDEC1, + NVHOST_MODULE_NVCSI, + NVHOST_MODULE_TSECB = (1<<16) | NVHOST_MODULE_TSEC, +}; + +struct nvhost_characteristics { +#define NVHOST_CHARACTERISTICS_GFILTER (1 << 0) +#define NVHOST_CHARACTERISTICS_RESOURCE_PER_CHANNEL_INSTANCE (1 << 1) +#define NVHOST_CHARACTERISTICS_SUPPORT_PREFENCES (1 << 2) + uint64_t flags; + + uint32_t num_mlocks; + uint32_t num_syncpts; + + uint32_t syncpts_base; + uint32_t syncpts_limit; + + uint32_t num_hw_pts; + uint32_t padding; +}; + +struct nvhost_ctrl_get_characteristics { + uint64_t nvhost_characteristics_buf_size; + uint64_t nvhost_characteristics_buf_addr; +}; + +struct nvhost_ctrl_check_module_support_args { + uint32_t module_id; + uint32_t value; +}; + +struct nvhost_ctrl_poll_fd_create_args { + int32_t fd; + uint32_t padding; +}; + +struct nvhost_ctrl_poll_fd_trigger_event_args { + int32_t fd; + uint32_t id; + uint32_t thresh; + uint32_t padding; +}; + +#define NVHOST_IOCTL_CTRL_SYNCPT_READ \ + _IOWR(NVHOST_IOCTL_MAGIC, 1, struct nvhost_ctrl_syncpt_read_args) +#define NVHOST_IOCTL_CTRL_SYNCPT_INCR \ + _IOW(NVHOST_IOCTL_MAGIC, 2, struct nvhost_ctrl_syncpt_incr_args) +#define 
NVHOST_IOCTL_CTRL_SYNCPT_WAIT \ + _IOW(NVHOST_IOCTL_MAGIC, 3, struct nvhost_ctrl_syncpt_wait_args) + +#define NVHOST_IOCTL_CTRL_MODULE_MUTEX \ + _IOWR(NVHOST_IOCTL_MAGIC, 4, struct nvhost_ctrl_module_mutex_args) +#define NVHOST32_IOCTL_CTRL_MODULE_REGRDWR \ + _IOWR(NVHOST_IOCTL_MAGIC, 5, struct nvhost32_ctrl_module_regrdwr_args) + +#define NVHOST_IOCTL_CTRL_SYNCPT_WAITEX \ + _IOWR(NVHOST_IOCTL_MAGIC, 6, struct nvhost_ctrl_syncpt_waitex_args) + +#define NVHOST_IOCTL_CTRL_GET_VERSION \ + _IOR(NVHOST_IOCTL_MAGIC, 7, struct nvhost_get_param_args) + +#define NVHOST_IOCTL_CTRL_SYNCPT_READ_MAX \ + _IOWR(NVHOST_IOCTL_MAGIC, 8, struct nvhost_ctrl_syncpt_read_args) + +#define NVHOST_IOCTL_CTRL_SYNCPT_WAITMEX \ + _IOWR(NVHOST_IOCTL_MAGIC, 9, struct nvhost_ctrl_syncpt_waitmex_args) + +#define NVHOST32_IOCTL_CTRL_SYNC_FENCE_CREATE \ + _IOWR(NVHOST_IOCTL_MAGIC, 10, struct nvhost32_ctrl_sync_fence_create_args) +#define NVHOST_IOCTL_CTRL_SYNC_FENCE_CREATE \ + _IOWR(NVHOST_IOCTL_MAGIC, 11, struct nvhost_ctrl_sync_fence_create_args) +#define NVHOST_IOCTL_CTRL_MODULE_REGRDWR \ + _IOWR(NVHOST_IOCTL_MAGIC, 12, struct nvhost_ctrl_module_regrdwr_args) +#define NVHOST_IOCTL_CTRL_SYNC_FENCE_SET_NAME \ + _IOWR(NVHOST_IOCTL_MAGIC, 13, struct nvhost_ctrl_sync_fence_name_args) +#define NVHOST_IOCTL_CTRL_GET_CHARACTERISTICS \ + _IOWR(NVHOST_IOCTL_MAGIC, 14, struct nvhost_ctrl_get_characteristics) +#define NVHOST_IOCTL_CTRL_CHECK_MODULE_SUPPORT \ + _IOWR(NVHOST_IOCTL_MAGIC, 15, struct nvhost_ctrl_check_module_support_args) +#define NVHOST_IOCTL_CTRL_POLL_FD_CREATE \ + _IOR(NVHOST_IOCTL_MAGIC, 16, struct nvhost_ctrl_poll_fd_create_args) +#define NVHOST_IOCTL_CTRL_POLL_FD_TRIGGER_EVENT \ + _IOW(NVHOST_IOCTL_MAGIC, 17, struct nvhost_ctrl_poll_fd_trigger_event_args) + +#define NVHOST_IOCTL_CTRL_LAST \ + _IOC_NR(NVHOST_IOCTL_CTRL_POLL_FD_TRIGGER_EVENT) +#define NVHOST_IOCTL_CTRL_MAX_ARG_SIZE \ + sizeof(struct nvhost_ctrl_syncpt_waitmex_args) + +#endif /* AVUTIL_NVHOST_IOCTL_H */ diff --git 
a/libavutil/nvmap_ioctl.h b/libavutil/nvmap_ioctl.h new file mode 100644 index 0000000000..55e0bea4dc --- /dev/null +++ b/libavutil/nvmap_ioctl.h @@ -0,0 +1,451 @@ +/* + * include/uapi/linux/nvmap.h + * + * structure declarations for nvmem and nvmap user-space ioctls + * + * Copyright (c) 2009-2020, NVIDIA CORPORATION. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#ifndef __SWITCH__ +#include +#include +#else +# include + +# define _IO _NV_IO +# define _IOR _NV_IOR +# define _IOW _NV_IOW +# define _IOWR _NV_IOWR + +# define _IOC_DIR _NV_IOC_DIR +# define _IOC_TYPE _NV_IOC_TYPE +# define _IOC_NR _NV_IOC_NR +# define _IOC_SIZE _NV_IOC_SIZE +#endif + +#ifndef AVUTIL_NVMAP_IOCTL_H +#define AVUTIL_NVMAP_IOCTL_H + +/* + * From linux-headers nvidia/include/linux/nvmap.h + */ +#define NVMAP_HEAP_IOVMM (1ul<<30) + +/* common carveout heaps */ +#define NVMAP_HEAP_CARVEOUT_IRAM (1ul<<29) +#define NVMAP_HEAP_CARVEOUT_VPR (1ul<<28) +#define NVMAP_HEAP_CARVEOUT_TSEC (1ul<<27) +#define NVMAP_HEAP_CARVEOUT_VIDMEM (1ul<<26) +#define NVMAP_HEAP_CARVEOUT_IVM (1ul<<1) +#define NVMAP_HEAP_CARVEOUT_GENERIC (1ul<<0) + +#define NVMAP_HEAP_CARVEOUT_MASK (NVMAP_HEAP_IOVMM - 1) + +/* allocation flags */ +#define NVMAP_HANDLE_UNCACHEABLE (0x0ul << 0) +#define NVMAP_HANDLE_WRITE_COMBINE (0x1ul << 0) +#define NVMAP_HANDLE_INNER_CACHEABLE (0x2ul << 0) +#define NVMAP_HANDLE_CACHEABLE (0x3ul << 0) +#define NVMAP_HANDLE_CACHE_FLAG (0x3ul << 0) + +#define NVMAP_HANDLE_SECURE (0x1ul << 2) +#define NVMAP_HANDLE_KIND_SPECIFIED (0x1ul << 3) +#define 
NVMAP_HANDLE_COMPR_SPECIFIED (0x1ul << 4) +#define NVMAP_HANDLE_ZEROED_PAGES (0x1ul << 5) +#define NVMAP_HANDLE_PHYS_CONTIG (0x1ul << 6) +#define NVMAP_HANDLE_CACHE_SYNC (0x1ul << 7) +#define NVMAP_HANDLE_CACHE_SYNC_AT_RESERVE (0x1ul << 8) +#define NVMAP_HANDLE_RO (0x1ul << 9) + +/* + * DOC: NvMap Userspace API + * + * create a client by opening /dev/nvmap + * most operations handled via following ioctls + * + */ +enum { + NVMAP_HANDLE_PARAM_SIZE = 1, + NVMAP_HANDLE_PARAM_ALIGNMENT, + NVMAP_HANDLE_PARAM_BASE, + NVMAP_HANDLE_PARAM_HEAP, + NVMAP_HANDLE_PARAM_KIND, + NVMAP_HANDLE_PARAM_COMPR, /* ignored, to be removed */ +}; + +enum { + NVMAP_CACHE_OP_WB = 0, + NVMAP_CACHE_OP_INV, + NVMAP_CACHE_OP_WB_INV, +}; + +enum { + NVMAP_PAGES_UNRESERVE = 0, + NVMAP_PAGES_RESERVE, + NVMAP_INSERT_PAGES_ON_UNRESERVE, + NVMAP_PAGES_PROT_AND_CLEAN, +}; + +#define NVMAP_ELEM_SIZE_U64 (1 << 31) + +struct nvmap_create_handle { + union { + struct { + union { + /* size will be overwritten */ + uint32_t size; /* CreateHandle */ + int32_t fd; /* DmaBufFd or FromFd */ + }; + uint32_t handle; /* returns nvmap handle */ + }; + struct { + /* one is input parameter, and other is output parameter + * since its a union please note that input parameter + * will be overwritten once ioctl returns + */ + union { + uint64_t ivm_id; /* CreateHandle from ivm*/ + int32_t ivm_handle; /* Get ivm_id from handle */ + }; + }; + struct { + union { + /* size64 will be overwritten */ + uint64_t size64; /* CreateHandle */ + uint32_t handle64; /* returns nvmap handle */ + }; + }; + }; +}; + +struct nvmap_create_handle_from_va { + uint64_t va; /* FromVA*/ + uint32_t size; /* non-zero for partial memory VMA. zero for end of VMA */ + uint32_t flags; /* wb/wc/uc/iwb, tag etc. 
*/ + union { + uint32_t handle; /* returns nvmap handle */ + uint64_t size64; /* used when size is 0 */ + }; +}; + +struct nvmap_gup_test { + uint64_t va; /* FromVA*/ + uint32_t handle; /* returns nvmap handle */ + uint32_t result; /* result=1 for pass, result=-err for failure */ +}; + +struct nvmap_alloc_handle { + uint32_t handle; /* nvmap handle */ + uint32_t heap_mask; /* heaps to allocate from */ + uint32_t flags; /* wb/wc/uc/iwb etc. */ + uint32_t align; /* min alignment necessary */ +}; + +struct nvmap_alloc_ivm_handle { + uint32_t handle; /* nvmap handle */ + uint32_t heap_mask; /* heaps to allocate from */ + uint32_t flags; /* wb/wc/uc/iwb etc. */ + uint32_t align; /* min alignment necessary */ + uint32_t peer; /* peer with whom handle must be shared. Used + * only for NVMAP_HEAP_CARVEOUT_IVM + */ +}; + +struct nvmap_alloc_kind_handle { + uint32_t handle; /* nvmap handle */ + uint32_t heap_mask; + uint32_t flags; + uint32_t align; + uint8_t kind; + uint8_t comp_tags; +}; + +struct nvmap_map_caller { + uint32_t handle; /* nvmap handle */ + uint32_t offset; /* offset into hmem; should be page-aligned */ + uint32_t length; /* number of bytes to map */ + uint32_t flags; /* maps as wb/iwb etc. */ + unsigned long addr; /* user pointer */ +}; + +#ifdef CONFIG_COMPAT +struct nvmap_map_caller_32 { + uint32_t handle; /* nvmap handle */ + uint32_t offset; /* offset into hmem; should be page-aligned */ + uint32_t length; /* number of bytes to map */ + uint32_t flags; /* maps as wb/iwb etc. 
*/ + uint32_t addr; /* user pointer*/ +}; +#endif + +struct nvmap_rw_handle { + unsigned long addr; /* user pointer*/ + uint32_t handle; /* nvmap handle */ + uint32_t offset; /* offset into hmem */ + uint32_t elem_size; /* individual atom size */ + uint32_t hmem_stride; /* delta in bytes between atoms in hmem */ + uint32_t user_stride; /* delta in bytes between atoms in user */ + uint32_t count; /* number of atoms to copy */ +}; + +struct nvmap_rw_handle_64 { + unsigned long addr; /* user pointer*/ + uint32_t handle; /* nvmap handle */ + uint64_t offset; /* offset into hmem */ + uint64_t elem_size; /* individual atom size */ + uint64_t hmem_stride; /* delta in bytes between atoms in hmem */ + uint64_t user_stride; /* delta in bytes between atoms in user */ + uint64_t count; /* number of atoms to copy */ +}; + +#ifdef CONFIG_COMPAT +struct nvmap_rw_handle_32 { + uint32_t addr; /* user pointer */ + uint32_t handle; /* nvmap handle */ + uint32_t offset; /* offset into hmem */ + uint32_t elem_size; /* individual atom size */ + uint32_t hmem_stride; /* delta in bytes between atoms in hmem */ + uint32_t user_stride; /* delta in bytes between atoms in user */ + uint32_t count; /* number of atoms to copy */ +}; +#endif + +struct nvmap_pin_handle { + uint32_t *handles; /* array of handles to pin/unpin */ + unsigned long *addr; /* array of addresses to return */ + uint32_t count; /* number of entries in handles */ +}; + +#ifdef CONFIG_COMPAT +struct nvmap_pin_handle_32 { + uint32_t handles; /* array of handles to pin/unpin */ + uint32_t addr; /* array of addresses to return */ + uint32_t count; /* number of entries in handles */ +}; +#endif + +struct nvmap_handle_param { + uint32_t handle; /* nvmap handle */ + uint32_t param; /* size/align/base/heap etc. */ + unsigned long result; /* returns requested info*/ +}; + +#ifdef CONFIG_COMPAT +struct nvmap_handle_param_32 { + uint32_t handle; /* nvmap handle */ + uint32_t param; /* size/align/base/heap etc. 
*/ + uint32_t result; /* returns requested info*/ +}; +#endif + +struct nvmap_cache_op { + unsigned long addr; /* user pointer*/ + uint32_t handle; /* nvmap handle */ + uint32_t len; /* bytes to flush */ + int32_t op; /* wb/wb_inv/inv */ +}; + +struct nvmap_cache_op_64 { + unsigned long addr; /* user pointer*/ + uint32_t handle; /* nvmap handle */ + uint64_t len; /* bytes to flush */ + int32_t op; /* wb/wb_inv/inv */ +}; + +#ifdef CONFIG_COMPAT +struct nvmap_cache_op_32 { + uint32_t addr; /* user pointer*/ + uint32_t handle; /* nvmap handle */ + uint32_t len; /* bytes to flush */ + int32_t op; /* wb/wb_inv/inv */ +}; +#endif + +struct nvmap_cache_op_list { + uint64_t handles; /* Ptr to u32 type array, holding handles */ + uint64_t offsets; /* Ptr to u32 type array, holding offsets + * into handle mem */ + uint64_t sizes; /* Ptr to u32 type array, holding sizes of memory + * regions within each handle */ + uint32_t nr; /* Number of handles */ + int32_t op; /* wb/wb_inv/inv */ +}; + +struct nvmap_debugfs_handles_header { + uint8_t version; +}; + +struct nvmap_debugfs_handles_entry { + uint64_t base; + uint64_t size; + uint32_t flags; + uint32_t share_count; + uint64_t mapped_size; +}; + +struct nvmap_set_tag_label { + uint32_t tag; + uint32_t len; /* in: label length + out: number of characters copied */ + uint64_t addr; /* in: pointer to label or NULL to remove */ +}; + +struct nvmap_available_heaps { + uint64_t heaps; /* heaps bitmask */ +}; + +struct nvmap_heap_size { + uint32_t heap; + uint64_t size; +}; + +/** + * Struct used while querying heap parameters + */ +struct nvmap_query_heap_params { + uint32_t heap_mask; + uint32_t flags; + uint8_t contig; + uint64_t total; + uint64_t free; + uint64_t largest_free_block; +}; + +struct nvmap_handle_parameters { + uint8_t contig; + uint32_t import_id; + uint32_t handle; + uint32_t heap_number; + uint32_t access_flags; + uint64_t heap; + uint64_t align; + uint64_t coherency; + uint64_t size; +}; + +#define 
NVMAP_IOC_MAGIC 'N' + +/* Creates a new memory handle. On input, the argument is the size of the new + * handle; on return, the argument is the name of the new handle + */ +#define NVMAP_IOC_CREATE _IOWR(NVMAP_IOC_MAGIC, 0, struct nvmap_create_handle) +#define NVMAP_IOC_CREATE_64 \ + _IOWR(NVMAP_IOC_MAGIC, 1, struct nvmap_create_handle) +#define NVMAP_IOC_FROM_ID _IOWR(NVMAP_IOC_MAGIC, 2, struct nvmap_create_handle) + +/* Actually allocates memory for the specified handle */ +#define NVMAP_IOC_ALLOC _IOW(NVMAP_IOC_MAGIC, 3, struct nvmap_alloc_handle) + +/* Frees a memory handle, unpinning any pinned pages and unmapping any mappings + */ +#define NVMAP_IOC_FREE _IO(NVMAP_IOC_MAGIC, 4) + +/* Maps the region of the specified handle into a user-provided virtual address + * that was previously created via an mmap syscall on this fd */ +#define NVMAP_IOC_MMAP _IOWR(NVMAP_IOC_MAGIC, 5, struct nvmap_map_caller) +#ifdef CONFIG_COMPAT +#define NVMAP_IOC_MMAP_32 _IOWR(NVMAP_IOC_MAGIC, 5, struct nvmap_map_caller_32) +#endif + +/* Reads/writes data (possibly strided) from a user-provided buffer into the + * hmem at the specified offset */ +#define NVMAP_IOC_WRITE _IOW(NVMAP_IOC_MAGIC, 6, struct nvmap_rw_handle) +#define NVMAP_IOC_READ _IOW(NVMAP_IOC_MAGIC, 7, struct nvmap_rw_handle) +#ifdef CONFIG_COMPAT +#define NVMAP_IOC_WRITE_32 _IOW(NVMAP_IOC_MAGIC, 6, struct nvmap_rw_handle_32) +#define NVMAP_IOC_READ_32 _IOW(NVMAP_IOC_MAGIC, 7, struct nvmap_rw_handle_32) +#endif +#define NVMAP_IOC_WRITE_64 \ + _IOW(NVMAP_IOC_MAGIC, 6, struct nvmap_rw_handle_64) +#define NVMAP_IOC_READ_64 \ + _IOW(NVMAP_IOC_MAGIC, 7, struct nvmap_rw_handle_64) + +#define NVMAP_IOC_PARAM _IOWR(NVMAP_IOC_MAGIC, 8, struct nvmap_handle_param) +#ifdef CONFIG_COMPAT +#define NVMAP_IOC_PARAM_32 _IOWR(NVMAP_IOC_MAGIC, 8, struct nvmap_handle_param_32) +#endif + +/* Pins a list of memory handles into IO-addressable memory (either IOVMM + * space or physical memory, depending on the allocation), and returns the + * 
address. Handles may be pinned recursively. */ +#define NVMAP_IOC_PIN_MULT _IOWR(NVMAP_IOC_MAGIC, 10, struct nvmap_pin_handle) +#define NVMAP_IOC_UNPIN_MULT _IOW(NVMAP_IOC_MAGIC, 11, struct nvmap_pin_handle) +#ifdef CONFIG_COMPAT +#define NVMAP_IOC_PIN_MULT_32 _IOWR(NVMAP_IOC_MAGIC, 10, struct nvmap_pin_handle_32) +#define NVMAP_IOC_UNPIN_MULT_32 _IOW(NVMAP_IOC_MAGIC, 11, struct nvmap_pin_handle_32) +#endif + +#define NVMAP_IOC_CACHE _IOW(NVMAP_IOC_MAGIC, 12, struct nvmap_cache_op) +#define NVMAP_IOC_CACHE_64 _IOW(NVMAP_IOC_MAGIC, 12, struct nvmap_cache_op_64) +#ifdef CONFIG_COMPAT +#define NVMAP_IOC_CACHE_32 _IOW(NVMAP_IOC_MAGIC, 12, struct nvmap_cache_op_32) +#endif + +/* Returns a global ID usable to allow a remote process to create a handle + * reference to the same handle */ +#define NVMAP_IOC_GET_ID _IOWR(NVMAP_IOC_MAGIC, 13, struct nvmap_create_handle) + +/* Returns a dma-buf fd usable to allow a remote process to create a handle + * reference to the same handle */ +#define NVMAP_IOC_SHARE _IOWR(NVMAP_IOC_MAGIC, 14, struct nvmap_create_handle) + +/* Returns a file id that allows a remote process to create a handle + * reference to the same handle */ +#define NVMAP_IOC_GET_FD _IOWR(NVMAP_IOC_MAGIC, 15, struct nvmap_create_handle) + +/* Create a new memory handle from file id passed */ +#define NVMAP_IOC_FROM_FD _IOWR(NVMAP_IOC_MAGIC, 16, struct nvmap_create_handle) + +/* Perform cache maintenance on a list of handles. */ +#define NVMAP_IOC_CACHE_LIST _IOW(NVMAP_IOC_MAGIC, 17, \ + struct nvmap_cache_op_list) +/* Perform reserve operation on a list of handles. 
*/ +#define NVMAP_IOC_RESERVE _IOW(NVMAP_IOC_MAGIC, 18, \ + struct nvmap_cache_op_list) + +#define NVMAP_IOC_FROM_IVC_ID _IOWR(NVMAP_IOC_MAGIC, 19, struct nvmap_create_handle) +#define NVMAP_IOC_GET_IVC_ID _IOWR(NVMAP_IOC_MAGIC, 20, struct nvmap_create_handle) +#define NVMAP_IOC_GET_IVM_HEAPS _IOR(NVMAP_IOC_MAGIC, 21, unsigned int) + +/* Create a new memory handle from VA passed */ +#define NVMAP_IOC_FROM_VA _IOWR(NVMAP_IOC_MAGIC, 22, struct nvmap_create_handle_from_va) + +#define NVMAP_IOC_GUP_TEST _IOWR(NVMAP_IOC_MAGIC, 23, struct nvmap_gup_test) + +/* Define a label for allocation tag */ +#define NVMAP_IOC_SET_TAG_LABEL _IOW(NVMAP_IOC_MAGIC, 24, struct nvmap_set_tag_label) + +#define NVMAP_IOC_GET_AVAILABLE_HEAPS \ + _IOR(NVMAP_IOC_MAGIC, 25, struct nvmap_available_heaps) + +#define NVMAP_IOC_GET_HEAP_SIZE \ + _IOR(NVMAP_IOC_MAGIC, 26, struct nvmap_heap_size) + +#define NVMAP_IOC_PARAMETERS \ + _IOR(NVMAP_IOC_MAGIC, 27, struct nvmap_handle_parameters) +/* START of T124 IOCTLS */ +/* Actually allocates memory for the specified handle, with kind */ +#define NVMAP_IOC_ALLOC_KIND _IOW(NVMAP_IOC_MAGIC, 100, struct nvmap_alloc_kind_handle) + +/* Actually allocates memory from IVM heaps */ +#define NVMAP_IOC_ALLOC_IVM _IOW(NVMAP_IOC_MAGIC, 101, struct nvmap_alloc_ivm_handle) + +/* Allocate separate memory for VPR */ +#define NVMAP_IOC_VPR_FLOOR_SIZE _IOW(NVMAP_IOC_MAGIC, 102, uint32_t) + +/* Get heap parameters such as total and free size */ +#define NVMAP_IOC_QUERY_HEAP_PARAMS _IOR(NVMAP_IOC_MAGIC, 105, \ + struct nvmap_query_heap_params) + +#define NVMAP_IOC_MAXNR (_IOC_NR(NVMAP_IOC_QUERY_HEAP_PARAMS)) + +#endif /* AVUTIL_NVMAP_IOCTL_H */ From patchwork Thu May 30 19:43:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49414 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9183:0:b0:460:55fa:d5ed with SMTP id s3csp67149vqg; Thu, 30 May 2024 12:44:41 -0700 
(PDT) X-Forwarded-Encrypted: i=2; AJvYcCVzlsK48kU890GRGOSbH3QgRyzoX2FQp9weBjEa5ht48nAuykzMQI8aHRU6d4SEjrOj0GYYFjdfY2tMCSqbfIamKeNbqHipnjnKow== X-Google-Smtp-Source: AGHT+IEJUuiJrM6ugN2NA5Wg45YyTgEytm2LhIosoHQO/Jfq5Pr0DyIPXGlrq3b4Icr8BdIAhahs X-Received: by 2002:a5d:5908:0:b0:354:e0e8:33ea with SMTP id ffacd0b85a97d-35dc02bd9e6mr2164802f8f.66.1717098280816; Thu, 30 May 2024 12:44:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1717098280; cv=none; d=google.com; s=arc-20160816; b=ZnwK7VAjjbQrzxOoCjqSHARkGD/j8s1xqS3lKERXHSBQOIr+DF4EqUvWz9TJNZIlzj vLYYaTdhDxDqV34qP3cyghNbgRFLkl2dtO3jDK6uLUGzZS+rsLp6Ml1pb8vbpjmixkw1 Yqt3W+QK4hPjjd2JYJ7sTmQI8B97HchyGxWRSjR/VNx3fA643ORP1+AuTPAO3j8gmJs8 QKDe+RnAbV53QUxJ6a+OsSakaiS1ZpxDbUFJigYHm9d8r+LaHu0ohkYrsv5w/CQS3/Fh hzX/rGHG1SjvUkRgwPhxSv6ZFKmsnPgOBQr1FzaLuu1TUiFR0Am8FoXtCDCtuSN8bSYS RI1w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=mGhayAXgTH9AYvPR6gk/G8wZreWbsmTYNRHP+xY1p9w=; fh=o4ZBG0WnuIFUfokYFX1900fRPFIkFoDCXPv5+z2b8Jo=; b=lPFdKEUJpoomrRdf9VMz9F/IRH2P0bqADpVVCdEfGsTkbZ9wn5kfBGhx1+NMVB3ent woyxaAK/pVDh0R0NNMN10r/lLmmH6dCNC0VxhoNqSlHsarH3iv0pSG55hAMvs2uqdS27 /PVJVEnqLpAqd+ow32ZRDeAc5l6wsFGtQ0TlXUco/Qx2uwouDqIHiuXeY9xW1Hi0xvFp GaCIq8kYNTVY/aXnbc983JzcH0vf91k6y9HITwDgR9gdQQn+XXM2gpRRfLcZ/VOGOi1L dSyCxxyxf33yMhMwlakJKc29/UyM5Qtxj+XnRsuy3MQNr/tOjoAPyL7fYWyZWeI236eg KrYA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=KfKdvf8x; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com 
Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a67eaf651c0si8591966b.743.2024.05.30.12.44.40; Thu, 30 May 2024 12:44:40 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=KfKdvf8x; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 7FB4268D522; Thu, 30 May 2024 22:44:34 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wr1-f45.google.com (mail-wr1-f45.google.com [209.85.221.45]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 1333468D512 for ; Thu, 30 May 2024 22:44:32 +0300 (EEST) Received: by mail-wr1-f45.google.com with SMTP id ffacd0b85a97d-35dc1d8867eso1095850f8f.0 for ; Thu, 30 May 2024 12:44:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717098272; x=1717703072; darn=ffmpeg.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=YbS5v9hXal8aqbFns4odH7nQWhI/L2dBHmnKkLh6S24=; b=KfKdvf8x6G+d0M5q4uGGKdCkjA2JRg0NR1bDBh0uWiMvP+qlmzxObt5kQZxSkzYJxM ApuH0yAOGeeS+6aVYAVMFrlfpxTY6FpQb5A+VBPdhWqipwPcT/PfblrDojJU+dcpm69K QLff14PEFXRKoMYxSLXIcwLMVg1qiCKpcaddfj7qkHXlNobTQo8AE0ZDNIO20U2OGmlJ yKC962xzz/QQlfjujDruSz2UumTgqR/j3qPB6Ic9rWOrk+WTJZX6pbEOieS66hRzs6Ur ku5M9+xuUxr2+uyWhtm5BNrUXdZRHJ/545/OgK01a8RxIef8dMj92/Ir/ZeJDXfijcC/ fxsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717098272; x=1717703072; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YbS5v9hXal8aqbFns4odH7nQWhI/L2dBHmnKkLh6S24=; b=cuCoceSoiPhQQGqt49TWNmYcTacuN/vpxX2Mp2tHuIJ3yG4Iwx6dMzwYjgSXU9mm2q Huv3T4x6O5l84eFw3to+RkbKZoBD+I/ETGQ+b3FBeaMT0f72Pzgck//EsEYIFGNf3cU1 nqEIAVSz1GinVmbJs/yjlq17yU3UniVhhXZWDBYKqM+dwBexTKYDdAJ361yZ7cxi0Dmw aH+g8qfDzXxrZWUZrsf9Bu+0YTp0mn0Mi35oQ8ypX6Zv7Yl5kOfGiiGAfI61JyEp2vR9 MY7ZEUSyZw5zkIwx+9MOFbrtTUARElwHYTF2+XaZPb0t+9YuLInsU447R302Tj4OGYbx LmKQ== X-Gm-Message-State: AOJu0YwA3Xp0Bs/GIjvDVqDfqK2qHfImCJOjSNpDGwJ2kZUtvcrMHUi8 EiArkp6uAs9FW6LXUVJMc7MQdM/w+nzUSD0Rq6H4MYDZRkuk2PeWibMf2g== X-Received: by 2002:a5d:6a85:0:b0:354:f1bd:3c1f with SMTP id ffacd0b85a97d-35dc00c6e82mr2210757f8f.55.1717098271535; Thu, 30 May 2024 12:44:31 -0700 (PDT) Received: from fractale.lan ([2001:861:5102:3290:f88d:fc8b:a14:3fcb]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-35dd04c0de3sm225126f8f.9.2024.05.30.12.43.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 May 2024 12:44:31 -0700 (PDT) From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:06 +0200 Message-ID: <60ec1bcddd512652c74798236ce7f5f5178ae20c.1717083799.git.averne381@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 04/16] avutil: add hardware definitions for NVDEC, NVJPG and VIC X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: averne Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: t4Tn9A4YIUBT These files are taken with minimal modification from nvidia's open-gpu-doc project, except VIC-related files 
which were written following documentation from the Tegra technical reference manual. NVDEC and NVJPG are nvidia's fixed-function hardware for video decoding and jpeg coding, respectively. VIC (Video Image Compositor) is a hardware engine for image post-processing, including scaling, deinterlacing, color mapping and basic compositing. Signed-off-by: averne --- libavutil/clb0b6.h | 303 +++++++ libavutil/clc5b0.h | 436 ++++++++++ libavutil/cle7d0.h | 129 +++ libavutil/nvdec_drv.h | 1858 +++++++++++++++++++++++++++++++++++++++++ libavutil/nvjpg_drv.h | 189 +++++ libavutil/vic_drv.h | 279 +++++++ 6 files changed, 3194 insertions(+) create mode 100644 libavutil/clb0b6.h create mode 100644 libavutil/clc5b0.h create mode 100644 libavutil/cle7d0.h create mode 100644 libavutil/nvdec_drv.h create mode 100644 libavutil/nvjpg_drv.h create mode 100644 libavutil/vic_drv.h diff --git a/libavutil/clb0b6.h b/libavutil/clb0b6.h new file mode 100644 index 0000000000..ee81ebc9d8 --- /dev/null +++ b/libavutil/clb0b6.h @@ -0,0 +1,303 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#ifndef AVUTIL_CLB0B6_H +#define AVUTIL_CLB0B6_H + +#ifdef __cplusplus +extern "C" { +#endif + +#define NVB0B6_VIDEO_COMPOSITOR (0x0000B0B6) + +#define NVB0B6_VIDEO_COMPOSITOR_NOP (0x00000100) +#define NVB0B6_VIDEO_COMPOSITOR_NOP_PARAMETER 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_PM_TRIGGER (0x00000140) +#define NVB0B6_VIDEO_COMPOSITOR_PM_TRIGGER_V 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_APPLICATION_ID (0x00000200) +#define NVB0B6_VIDEO_COMPOSITOR_SET_APPLICATION_ID_ID 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_APPLICATION_ID_ID_COMPOSITOR (0x00000000) +#define NVB0B6_VIDEO_COMPOSITOR_SET_WATCHDOG_TIMER (0x00000204) +#define NVB0B6_VIDEO_COMPOSITOR_SET_WATCHDOG_TIMER_TIMER 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_A (0x00000240) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_A_UPPER 7:0 +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_B (0x00000244) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_B_LOWER 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_C (0x00000248) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_C_PAYLOAD 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA (0x0000024C) +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA_OFFSET 27:0 +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA_CTX_VALID 31:28 +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH (0x00000250) +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RESTORE 0:0 +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RESTORE_FALSE (0x00000000) +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RESTORE_TRUE (0x00000001) +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RST_NOTIFY 1:1 +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RST_NOTIFY_FALSE (0x00000000) +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RST_NOTIFY_TRUE (0x00000001) +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RESERVED 7:2 +#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_ASID 23:8 +#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE (0x00000300) +#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY 0:0 +#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_DISABLE (0x00000000) +#define 
NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_ENABLE (0x00000001) +#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_ON 1:1 +#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_ON_END (0x00000000) +#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_ON_BEGIN (0x00000001) +#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_AWAKEN 8:8 +#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_AWAKEN_DISABLE (0x00000000) +#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_AWAKEN_ENABLE (0x00000001) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D (0x00000304) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_STRUCTURE_SIZE 0:0 +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_STRUCTURE_SIZE_ONE (0x00000000) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_STRUCTURE_SIZE_FOUR (0x00000001) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_AWAKEN_ENABLE 8:8 +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_AWAKEN_ENABLE_FALSE (0x00000000) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_AWAKEN_ENABLE_TRUE (0x00000001) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION 17:16 +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION_RELEASE (0x00000000) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION_RESERVED0 (0x00000001) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION_RESERVED1 (0x00000002) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION_TRAP (0x00000003) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_FLUSH_DISABLE 21:21 +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_FLUSH_DISABLE_FALSE (0x00000000) +#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_FLUSH_DISABLE_TRUE (0x00000001) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(b) (0x00000400 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_CHROMA_U_OFFSET(b) (0x00000404 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_CHROMA_U_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_CHROMA_V_OFFSET(b) (0x00000408 + (b)*0x00000060) +#define 
NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_CHROMA_V_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_LUMA_OFFSET(b) (0x0000040C + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_LUMA_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_CHROMA_U_OFFSET(b) (0x00000410 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_CHROMA_U_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_CHROMA_V_OFFSET(b) (0x00000414 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_CHROMA_V_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_LUMA_OFFSET(b) (0x00000418 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_LUMA_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_CHROMA_U_OFFSET(b) (0x0000041C + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_CHROMA_U_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_CHROMA_V_OFFSET(b) (0x00000420 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_CHROMA_V_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_LUMA_OFFSET(b) (0x00000424 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_LUMA_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_CHROMA_U_OFFSET(b) (0x00000428 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_CHROMA_U_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_CHROMA_V_OFFSET(b) (0x0000042C + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_CHROMA_V_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_LUMA_OFFSET(b) (0x00000430 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_LUMA_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_CHROMA_U_OFFSET(b) (0x00000434 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_CHROMA_U_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_CHROMA_V_OFFSET(b) (0x00000438 + (b)*0x00000060) +#define 
NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_CHROMA_V_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_LUMA_OFFSET(b) (0x0000043C + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_LUMA_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_CHROMA_U_OFFSET(b) (0x00000440 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_CHROMA_U_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_CHROMA_V_OFFSET(b) (0x00000444 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_CHROMA_V_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_LUMA_OFFSET(b) (0x00000448 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_LUMA_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_CHROMA_U_OFFSET(b) (0x0000044C + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_CHROMA_U_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_CHROMA_V_OFFSET(b) (0x00000450 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_CHROMA_V_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_LUMA_OFFSET(b) (0x00000454 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_LUMA_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_CHROMA_U_OFFSET(b) (0x00000458 + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_CHROMA_U_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_CHROMA_V_OFFSET(b) (0x0000045C + (b)*0x00000060) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_CHROMA_V_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_PICTURE_INDEX (0x00000700) +#define NVB0B6_VIDEO_COMPOSITOR_SET_PICTURE_INDEX_INDEX 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS (0x00000704) +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS_GPTIMER_ON 0:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS_DEBUG_MODE 4:4 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS_FALCON_CONTROL 8:8 +#define 
NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS_CONFIG_STRUCT_SIZE 31:16 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONFIG_STRUCT_OFFSET (0x00000708) +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONFIG_STRUCT_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_FILTER_STRUCT_OFFSET (0x0000070C) +#define NVB0B6_VIDEO_COMPOSITOR_SET_FILTER_STRUCT_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_PALETTE_OFFSET (0x00000710) +#define NVB0B6_VIDEO_COMPOSITOR_SET_PALETTE_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_HIST_OFFSET (0x00000714) +#define NVB0B6_VIDEO_COMPOSITOR_SET_HIST_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID (0x00000718) +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_FCE_UCODE 3:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_CONFIG 7:4 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_PALETTE 11:8 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_OUTPUT 15:12 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_HIST 19:16 +#define NVB0B6_VIDEO_COMPOSITOR_SET_FCE_UCODE_SIZE (0x0000071C) +#define NVB0B6_VIDEO_COMPOSITOR_SET_FCE_UCODE_SIZE_FCE_SZ 15:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET (0x00000720) +#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_CHROMA_U_OFFSET (0x00000724) +#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_CHROMA_U_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_CHROMA_V_OFFSET (0x00000728) +#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_CHROMA_V_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_FCE_UCODE_OFFSET (0x0000072C) +#define NVB0B6_VIDEO_COMPOSITOR_SET_FCE_UCODE_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_STRUCT_OFFSET (0x00000730) +#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_STRUCT_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE (0x00000734) +#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_INTF_PART_ASEL 3:0 +#define 
NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_INTF_PART_BSEL 7:4 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_INTF_PART_CSEL 11:8 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_INTF_PART_DSEL 15:12 +#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_CRC_MODE 16:16 +#define NVB0B6_VIDEO_COMPOSITOR_SET_STATUS_OFFSET (0x00000738) +#define NVB0B6_VIDEO_COMPOSITOR_SET_STATUS_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID(b) (0x00000740 + (b)*0x00000004) +#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC0 3:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC1 7:4 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC2 11:8 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC3 15:12 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC4 19:16 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC5 23:20 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC6 27:24 +#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC7 31:28 +#define NVB0B6_VIDEO_COMPOSITOR_SET_HISTORY_BUFFER_OFFSET(b) (0x00000780 + (b)*0x00000004) +#define NVB0B6_VIDEO_COMPOSITOR_SET_HISTORY_BUFFER_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_SET_COMP_TAG_BUFFER_OFFSET(b) (0x000007C0 + (b)*0x00000004) +#define NVB0B6_VIDEO_COMPOSITOR_SET_COMP_TAG_BUFFER_OFFSET_OFFSET 31:0 +#define NVB0B6_VIDEO_COMPOSITOR_PM_TRIGGER_END (0x00001114) +#define NVB0B6_VIDEO_COMPOSITOR_PM_TRIGGER_END_V 31:0 + +#define NVB0B6_DXVAHD_FRAME_FORMAT_PROGRESSIVE 0 +#define NVB0B6_DXVAHD_FRAME_FORMAT_INTERLACED_TOP_FIELD_FIRST 1 +#define NVB0B6_DXVAHD_FRAME_FORMAT_INTERLACED_BOTTOM_FIELD_FIRST 2 +#define NVB0B6_DXVAHD_FRAME_FORMAT_TOP_FIELD 3 +#define NVB0B6_DXVAHD_FRAME_FORMAT_BOTTOM_FIELD 4 +#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_PROGRESSIVE 5 +#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_INTERLACED_TOP_FIELD_FIRST 6 +#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_INTERLACED_BOTTOM_FIELD_FIRST 7 +#define 
NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_TOP_FIELD 8 +#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_BOTTOM_FIELD 9 +#define NVB0B6_DXVAHD_FRAME_FORMAT_TOP_FIELD_CHROMA_BOTTOM 10 +#define NVB0B6_DXVAHD_FRAME_FORMAT_BOTTOM_FIELD_CHROMA_TOP 11 +#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_TOP_FIELD_CHROMA_BOTTOM 12 +#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_BOTTOM_FIELD_CHROMA_TOP 13 + +#define NVB0B6_T_A8 0 +#define NVB0B6_T_L8 1 +#define NVB0B6_T_A4L4 2 +#define NVB0B6_T_L4A4 3 +#define NVB0B6_T_R8 4 +#define NVB0B6_T_A8L8 5 +#define NVB0B6_T_L8A8 6 +#define NVB0B6_T_R8G8 7 +#define NVB0B6_T_G8R8 8 +#define NVB0B6_T_B5G6R5 9 +#define NVB0B6_T_R5G6B5 10 +#define NVB0B6_T_B6G5R5 11 +#define NVB0B6_T_R5G5B6 12 +#define NVB0B6_T_A1B5G5R5 13 +#define NVB0B6_T_A1R5G5B5 14 +#define NVB0B6_T_B5G5R5A1 15 +#define NVB0B6_T_R5G5B5A1 16 +#define NVB0B6_T_A5B5G5R1 17 +#define NVB0B6_T_A5R1G5B5 18 +#define NVB0B6_T_B5G5R1A5 19 +#define NVB0B6_T_R1G5B5A5 20 +#define NVB0B6_T_X1B5G5R5 21 +#define NVB0B6_T_X1R5G5B5 22 +#define NVB0B6_T_B5G5R5X1 23 +#define NVB0B6_T_R5G5B5X1 24 +#define NVB0B6_T_A4B4G4R4 25 +#define NVB0B6_T_A4R4G4B4 26 +#define NVB0B6_T_B4G4R4A4 27 +#define NVB0B6_T_R4G4B4A4 28 +#define NVB0B6_T_B8_G8_R8 29 +#define NVB0B6_T_R8_G8_B8 30 +#define NVB0B6_T_A8B8G8R8 31 +#define NVB0B6_T_A8R8G8B8 32 +#define NVB0B6_T_B8G8R8A8 33 +#define NVB0B6_T_R8G8B8A8 34 +#define NVB0B6_T_X8B8G8R8 35 +#define NVB0B6_T_X8R8G8B8 36 +#define NVB0B6_T_B8G8R8X8 37 +#define NVB0B6_T_R8G8B8X8 38 +#define NVB0B6_T_A2B10G10R10 39 +#define NVB0B6_T_A2R10G10B10 40 +#define NVB0B6_T_B10G10R10A2 41 +#define NVB0B6_T_R10G10B10A2 42 +#define NVB0B6_T_A4P4 43 +#define NVB0B6_T_P4A4 44 +#define NVB0B6_T_P8A8 45 +#define NVB0B6_T_A8P8 46 +#define NVB0B6_T_P8 47 +#define NVB0B6_T_P1 48 +#define NVB0B6_T_U8V8 49 +#define NVB0B6_T_V8U8 50 +#define NVB0B6_T_A8Y8U8V8 51 +#define NVB0B6_T_V8U8Y8A8 52 +#define NVB0B6_T_Y8_U8_V8 53 +#define NVB0B6_T_Y8_V8_U8 54 +#define NVB0B6_T_U8_V8_Y8 55 +#define NVB0B6_T_V8_U8_Y8 56 
+#define NVB0B6_T_Y8_U8__Y8_V8 57 +#define NVB0B6_T_Y8_V8__Y8_U8 58 +#define NVB0B6_T_U8_Y8__V8_Y8 59 +#define NVB0B6_T_V8_Y8__U8_Y8 60 +#define NVB0B6_T_Y8___U8V8_N444 61 +#define NVB0B6_T_Y8___V8U8_N444 62 +#define NVB0B6_T_Y8___U8V8_N422 63 +#define NVB0B6_T_Y8___V8U8_N422 64 +#define NVB0B6_T_Y8___U8V8_N422R 65 +#define NVB0B6_T_Y8___V8U8_N422R 66 +#define NVB0B6_T_Y8___U8V8_N420 67 +#define NVB0B6_T_Y8___V8U8_N420 68 +#define NVB0B6_T_Y8___U8___V8_N444 69 +#define NVB0B6_T_Y8___U8___V8_N422 70 +#define NVB0B6_T_Y8___U8___V8_N422R 71 +#define NVB0B6_T_Y8___U8___V8_N420 72 +#define NVB0B6_T_U8 73 +#define NVB0B6_T_V8 74 + +#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_OPAQUE 0 +#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_BACKGROUND 1 +#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_DESTINATION 2 +#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_SOURCE_STREAM 3 +#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_COMPOSITED 4 +#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_SOURCE_ALPHA 5 + +#define NVB0B6_BLK_KIND_PITCH 0 +#define NVB0B6_BLK_KIND_GENERIC_16Bx2 1 +#define NVB0B6_BLK_KIND_BL_NAIVE 2 +#define NVB0B6_BLK_KIND_BL_KEPLER_XBAR_RAW 3 +#define NVB0B6_BLK_KIND_VP2_TILED 15 + +#define NVB0B6_FILTER_LENGTH_1TAP 0 +#define NVB0B6_FILTER_LENGTH_2TAP 1 +#define NVB0B6_FILTER_LENGTH_5TAP 2 +#define NVB0B6_FILTER_LENGTH_10TAP 3 + +#define NVB0B6_FILTER_TYPE_NORMAL 0 +#define NVB0B6_FILTER_TYPE_NOISE 1 +#define NVB0B6_FILTER_TYPE_DETAIL 2 + +#ifdef __cplusplus +}; /* extern "C" */ +#endif +#endif /* AVUTIL_CLB0B6_H */ diff --git a/libavutil/clc5b0.h b/libavutil/clc5b0.h new file mode 100644 index 0000000000..f7957bf46a --- /dev/null +++ b/libavutil/clc5b0.h @@ -0,0 +1,436 @@ +/******************************************************************************* + Copyright (c) 1993-2020, NVIDIA CORPORATION. All rights reserved. 
+
+ Permission is hereby granted, free of charge, to any person obtaining a
+ copy of this software and associated documentation files (the "Software"),
+ to deal in the Software without restriction, including without limitation
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ and/or sell copies of the Software, and to permit persons to whom the
+ Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ DEALINGS IN THE SOFTWARE.
+ +*******************************************************************************/ + +#ifndef AVUTIL_CLC5B0_H +#define AVUTIL_CLC5B0_H + +#ifdef __cplusplus +extern "C" { +#endif + +#define NVC5B0_VIDEO_DECODER (0x0000C5B0) + +#define NVC5B0_NOP (0x00000100) +#define NVC5B0_NOP_PARAMETER 31:0 +#define NVC5B0_SET_APPLICATION_ID (0x00000200) +#define NVC5B0_SET_APPLICATION_ID_ID 31:0 +#define NVC5B0_SET_APPLICATION_ID_ID_MPEG12 (0x00000001) +#define NVC5B0_SET_APPLICATION_ID_ID_VC1 (0x00000002) +#define NVC5B0_SET_APPLICATION_ID_ID_H264 (0x00000003) +#define NVC5B0_SET_APPLICATION_ID_ID_MPEG4 (0x00000004) +#define NVC5B0_SET_APPLICATION_ID_ID_VP8 (0x00000005) +#define NVC5B0_SET_APPLICATION_ID_ID_HEVC (0x00000007) +#define NVC5B0_SET_APPLICATION_ID_ID_VP9 (0x00000009) +#define NVC5B0_SET_APPLICATION_ID_ID_HEVC_PARSER (0x0000000C) +#define NVC5B0_SET_WATCHDOG_TIMER (0x00000204) +#define NVC5B0_SET_WATCHDOG_TIMER_TIMER 31:0 +#define NVC5B0_SEMAPHORE_A (0x00000240) +#define NVC5B0_SEMAPHORE_A_UPPER 7:0 +#define NVC5B0_SEMAPHORE_B (0x00000244) +#define NVC5B0_SEMAPHORE_B_LOWER 31:0 +#define NVC5B0_SEMAPHORE_C (0x00000248) +#define NVC5B0_SEMAPHORE_C_PAYLOAD 31:0 +#define NVC5B0_CTX_SAVE_AREA (0x0000024C) +#define NVC5B0_CTX_SAVE_AREA_OFFSET 31:0 +#define NVC5B0_CTX_SWITCH (0x00000250) +#define NVC5B0_CTX_SWITCH_OP 1:0 +#define NVC5B0_CTX_SWITCH_OP_CTX_UPDATE (0x00000000) +#define NVC5B0_CTX_SWITCH_OP_CTX_SAVE (0x00000001) +#define NVC5B0_CTX_SWITCH_OP_CTX_RESTORE (0x00000002) +#define NVC5B0_CTX_SWITCH_OP_CTX_FORCERESTORE (0x00000003) +#define NVC5B0_CTX_SWITCH_CTXID_VALID 2:2 +#define NVC5B0_CTX_SWITCH_CTXID_VALID_FALSE (0x00000000) +#define NVC5B0_CTX_SWITCH_CTXID_VALID_TRUE (0x00000001) +#define NVC5B0_CTX_SWITCH_RESERVED0 7:3 +#define NVC5B0_CTX_SWITCH_CTX_ID 23:8 +#define NVC5B0_CTX_SWITCH_RESERVED1 31:24 +#define NVC5B0_EXECUTE (0x00000300) +#define NVC5B0_EXECUTE_NOTIFY 0:0 +#define NVC5B0_EXECUTE_NOTIFY_DISABLE (0x00000000) +#define NVC5B0_EXECUTE_NOTIFY_ENABLE 
(0x00000001) +#define NVC5B0_EXECUTE_NOTIFY_ON 1:1 +#define NVC5B0_EXECUTE_NOTIFY_ON_END (0x00000000) +#define NVC5B0_EXECUTE_NOTIFY_ON_BEGIN (0x00000001) +#define NVC5B0_EXECUTE_AWAKEN 8:8 +#define NVC5B0_EXECUTE_AWAKEN_DISABLE (0x00000000) +#define NVC5B0_EXECUTE_AWAKEN_ENABLE (0x00000001) +#define NVC5B0_SEMAPHORE_D (0x00000304) +#define NVC5B0_SEMAPHORE_D_STRUCTURE_SIZE 0:0 +#define NVC5B0_SEMAPHORE_D_STRUCTURE_SIZE_ONE (0x00000000) +#define NVC5B0_SEMAPHORE_D_STRUCTURE_SIZE_FOUR (0x00000001) +#define NVC5B0_SEMAPHORE_D_AWAKEN_ENABLE 8:8 +#define NVC5B0_SEMAPHORE_D_AWAKEN_ENABLE_FALSE (0x00000000) +#define NVC5B0_SEMAPHORE_D_AWAKEN_ENABLE_TRUE (0x00000001) +#define NVC5B0_SEMAPHORE_D_OPERATION 17:16 +#define NVC5B0_SEMAPHORE_D_OPERATION_RELEASE (0x00000000) +#define NVC5B0_SEMAPHORE_D_OPERATION_RESERVED0 (0x00000001) +#define NVC5B0_SEMAPHORE_D_OPERATION_RESERVED1 (0x00000002) +#define NVC5B0_SEMAPHORE_D_OPERATION_TRAP (0x00000003) +#define NVC5B0_SEMAPHORE_D_FLUSH_DISABLE 21:21 +#define NVC5B0_SEMAPHORE_D_FLUSH_DISABLE_FALSE (0x00000000) +#define NVC5B0_SEMAPHORE_D_FLUSH_DISABLE_TRUE (0x00000001) +#define NVC5B0_SET_CONTROL_PARAMS (0x00000400) +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE 3:0 +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG1 (0x00000000) +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG2 (0x00000001) +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_VC1 (0x00000002) +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_H264 (0x00000003) +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG4 (0x00000004) +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_DIVX3 (0x00000004) +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_VP8 (0x00000005) +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_HEVC (0x00000007) +#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_VP9 (0x00000009) +#define NVC5B0_SET_CONTROL_PARAMS_GPTIMER_ON 4:4 +#define NVC5B0_SET_CONTROL_PARAMS_RET_ERROR 5:5 +#define NVC5B0_SET_CONTROL_PARAMS_ERR_CONCEAL_ON 6:6 +#define NVC5B0_SET_CONTROL_PARAMS_ERROR_FRM_IDX 
12:7 +#define NVC5B0_SET_CONTROL_PARAMS_MBTIMER_ON 13:13 +#define NVC5B0_SET_CONTROL_PARAMS_EC_INTRA_FRAME_USING_PSLC 14:14 +#define NVC5B0_SET_CONTROL_PARAMS_ALL_INTRA_FRAME 17:17 +#define NVC5B0_SET_CONTROL_PARAMS_RESERVED 31:18 +#define NVC5B0_SET_DRV_PIC_SETUP_OFFSET (0x00000404) +#define NVC5B0_SET_DRV_PIC_SETUP_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_IN_BUF_BASE_OFFSET (0x00000408) +#define NVC5B0_SET_IN_BUF_BASE_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_INDEX (0x0000040C) +#define NVC5B0_SET_PICTURE_INDEX_INDEX 31:0 +#define NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET (0x00000410) +#define NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_COLOC_DATA_OFFSET (0x00000414) +#define NVC5B0_SET_COLOC_DATA_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_HISTORY_OFFSET (0x00000418) +#define NVC5B0_SET_HISTORY_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_DISPLAY_BUF_SIZE (0x0000041C) +#define NVC5B0_SET_DISPLAY_BUF_SIZE_SIZE 31:0 +#define NVC5B0_SET_HISTOGRAM_OFFSET (0x00000420) +#define NVC5B0_SET_HISTOGRAM_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_NVDEC_STATUS_OFFSET (0x00000424) +#define NVC5B0_SET_NVDEC_STATUS_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_DISPLAY_BUF_LUMA_OFFSET (0x00000428) +#define NVC5B0_SET_DISPLAY_BUF_LUMA_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_DISPLAY_BUF_CHROMA_OFFSET (0x0000042C) +#define NVC5B0_SET_DISPLAY_BUF_CHROMA_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET0 (0x00000430) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET0_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET1 (0x00000434) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET1_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET2 (0x00000438) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET2_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET3 (0x0000043C) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET3_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET4 (0x00000440) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET4_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET5 (0x00000444) +#define 
NVC5B0_SET_PICTURE_LUMA_OFFSET5_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET6 (0x00000448) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET6_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET7 (0x0000044C) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET7_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET8 (0x00000450) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET8_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET9 (0x00000454) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET9_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET10 (0x00000458) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET10_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET11 (0x0000045C) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET11_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET12 (0x00000460) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET12_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET13 (0x00000464) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET13_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET14 (0x00000468) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET14_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET15 (0x0000046C) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET15_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_LUMA_OFFSET16 (0x00000470) +#define NVC5B0_SET_PICTURE_LUMA_OFFSET16_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET0 (0x00000474) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET0_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET1 (0x00000478) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET1_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET2 (0x0000047C) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET2_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET3 (0x00000480) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET3_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET4 (0x00000484) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET4_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET5 (0x00000488) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET5_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET6 (0x0000048C) +#define 
NVC5B0_SET_PICTURE_CHROMA_OFFSET6_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET7 (0x00000490) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET7_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET8 (0x00000494) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET8_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET9 (0x00000498) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET9_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET10 (0x0000049C) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET10_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET11 (0x000004A0) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET11_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET12 (0x000004A4) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET12_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET13 (0x000004A8) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET13_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET14 (0x000004AC) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET14_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET15 (0x000004B0) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET15_OFFSET 31:0 +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET16 (0x000004B4) +#define NVC5B0_SET_PICTURE_CHROMA_OFFSET16_OFFSET 31:0 +#define NVC5B0_SET_PIC_SCRATCH_BUF_OFFSET (0x000004B8) +#define NVC5B0_SET_PIC_SCRATCH_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_SET_EXTERNAL_MVBUFFER_OFFSET (0x000004BC) +#define NVC5B0_SET_EXTERNAL_MVBUFFER_OFFSET_OFFSET 31:0 +#define NVC5B0_H264_SET_MBHIST_BUF_OFFSET (0x00000500) +#define NVC5B0_H264_SET_MBHIST_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_VP8_SET_PROB_DATA_OFFSET (0x00000540) +#define NVC5B0_VP8_SET_PROB_DATA_OFFSET_OFFSET 31:0 +#define NVC5B0_VP8_SET_HEADER_PARTITION_BUF_BASE_OFFSET (0x00000544) +#define NVC5B0_VP8_SET_HEADER_PARTITION_BUF_BASE_OFFSET_OFFSET 31:0 +#define NVC5B0_HEVC_SET_SCALING_LIST_OFFSET (0x00000580) +#define NVC5B0_HEVC_SET_SCALING_LIST_OFFSET_OFFSET 31:0 +#define NVC5B0_HEVC_SET_TILE_SIZES_OFFSET (0x00000584) +#define NVC5B0_HEVC_SET_TILE_SIZES_OFFSET_OFFSET 31:0 +#define 
NVC5B0_HEVC_SET_FILTER_BUFFER_OFFSET (0x00000588) +#define NVC5B0_HEVC_SET_FILTER_BUFFER_OFFSET_OFFSET 31:0 +#define NVC5B0_HEVC_SET_SAO_BUFFER_OFFSET (0x0000058C) +#define NVC5B0_HEVC_SET_SAO_BUFFER_OFFSET_OFFSET 31:0 +#define NVC5B0_HEVC_SET_SLICE_INFO_BUFFER_OFFSET (0x00000590) +#define NVC5B0_HEVC_SET_SLICE_INFO_BUFFER_OFFSET_OFFSET 31:0 +#define NVC5B0_HEVC_SET_SLICE_GROUP_INDEX (0x00000594) +#define NVC5B0_HEVC_SET_SLICE_GROUP_INDEX_OFFSET 31:0 +#define NVC5B0_VP9_SET_PROB_TAB_BUF_OFFSET (0x000005C0) +#define NVC5B0_VP9_SET_PROB_TAB_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_VP9_SET_CTX_COUNTER_BUF_OFFSET (0x000005C4) +#define NVC5B0_VP9_SET_CTX_COUNTER_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_VP9_SET_SEGMENT_READ_BUF_OFFSET (0x000005C8) +#define NVC5B0_VP9_SET_SEGMENT_READ_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_VP9_SET_SEGMENT_WRITE_BUF_OFFSET (0x000005CC) +#define NVC5B0_VP9_SET_SEGMENT_WRITE_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_VP9_SET_TILE_SIZE_BUF_OFFSET (0x000005D0) +#define NVC5B0_VP9_SET_TILE_SIZE_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_VP9_SET_COL_MVWRITE_BUF_OFFSET (0x000005D4) +#define NVC5B0_VP9_SET_COL_MVWRITE_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_VP9_SET_COL_MVREAD_BUF_OFFSET (0x000005D8) +#define NVC5B0_VP9_SET_COL_MVREAD_BUF_OFFSET_OFFSET 31:0 +#define NVC5B0_VP9_SET_FILTER_BUFFER_OFFSET (0x000005DC) +#define NVC5B0_VP9_SET_FILTER_BUFFER_OFFSET_OFFSET 31:0 + +#define NVC5B0_ERROR_NONE (0x00000000) +#define NVC5B0_OS_ERROR_EXECUTE_INSUFFICIENT_DATA (0x00000001) +#define NVC5B0_OS_ERROR_SEMAPHORE_INSUFFICIENT_DATA (0x00000002) +#define NVC5B0_OS_ERROR_INVALID_METHOD (0x00000003) +#define NVC5B0_OS_ERROR_INVALID_DMA_PAGE (0x00000004) +#define NVC5B0_OS_ERROR_UNHANDLED_INTERRUPT (0x00000005) +#define NVC5B0_OS_ERROR_EXCEPTION (0x00000006) +#define NVC5B0_OS_ERROR_INVALID_CTXSW_REQUEST (0x00000007) +#define NVC5B0_OS_ERROR_APPLICATION (0x00000008) +#define NVC5B0_OS_ERROR_SW_BREAKPT (0x00000009) +#define NVC5B0_OS_INTERRUPT_EXECUTE_AWAKEN (0x00000100) 
+#define NVC5B0_OS_INTERRUPT_BACKEND_SEMAPHORE_AWAKEN (0x00000200) +#define NVC5B0_OS_INTERRUPT_CTX_ERROR_FBIF (0x00000300) +#define NVC5B0_OS_INTERRUPT_LIMIT_VIOLATION (0x00000400) +#define NVC5B0_OS_INTERRUPT_LIMIT_AND_FBIF_CTX_ERROR (0x00000500) +#define NVC5B0_OS_INTERRUPT_HALT_ENGINE (0x00000600) +#define NVC5B0_OS_INTERRUPT_TRAP_NONSTALL (0x00000700) +#define NVC5B0_H264_VLD_ERR_SEQ_DATA_INCONSISTENT (0x00004001) +#define NVC5B0_H264_VLD_ERR_PIC_DATA_INCONSISTENT (0x00004002) +#define NVC5B0_H264_VLD_ERR_SLC_DATA_BUF_ADDR_OUT_OF_BOUNDS (0x00004100) +#define NVC5B0_H264_VLD_ERR_BITSTREAM_ERROR (0x00004101) +#define NVC5B0_H264_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID (0x000041F8) +#define NVC5B0_H264_VLD_ERR_SLC_HDR_OUT_SIZE_NOT_MULT256 (0x00004200) +#define NVC5B0_H264_VLD_ERR_SLC_DATA_OUT_SIZE_NOT_MULT256 (0x00004201) +#define NVC5B0_H264_VLD_ERR_CTX_DMA_ID_FLOW_CTRL_INVALID (0x00004203) +#define NVC5B0_H264_VLD_ERR_CTX_DMA_ID_SLC_HDR_OUT_INVALID (0x00004204) +#define NVC5B0_H264_VLD_ERR_SLC_HDR_OUT_BUF_TOO_SMALL (0x00004205) +#define NVC5B0_H264_VLD_ERR_SLC_HDR_OUT_BUF_ALREADY_VALID (0x00004206) +#define NVC5B0_H264_VLD_ERR_SLC_DATA_OUT_BUF_TOO_SMALL (0x00004207) +#define NVC5B0_H264_VLD_ERR_DATA_BUF_CNT_TOO_SMALL (0x00004208) +#define NVC5B0_H264_VLD_ERR_BITSTREAM_EMPTY (0x00004209) +#define NVC5B0_H264_VLD_ERR_FRAME_WIDTH_TOO_LARGE (0x0000420A) +#define NVC5B0_H264_VLD_ERR_FRAME_HEIGHT_TOO_LARGE (0x0000420B) +#define NVC5B0_H264_VLD_ERR_HIST_BUF_TOO_SMALL (0x00004300) +#define NVC5B0_VC1_VLD_ERR_PIC_DATA_BUF_ADDR_OUT_OF_BOUND (0x00005100) +#define NVC5B0_VC1_VLD_ERR_BITSTREAM_ERROR (0x00005101) +#define NVC5B0_VC1_VLD_ERR_PIC_HDR_OUT_SIZE_NOT_MULT256 (0x00005200) +#define NVC5B0_VC1_VLD_ERR_PIC_DATA_OUT_SIZE_NOT_MULT256 (0x00005201) +#define NVC5B0_VC1_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID (0x00005202) +#define NVC5B0_VC1_VLD_ERR_CTX_DMA_ID_FLOW_CTRL_INVALID (0x00005203) +#define NVC5B0_VC1_VLD_ERR_CTX_DMA_ID_PIC_HDR_OUT_INVALID (0x00005204) +#define 
NVC5B0_VC1_VLD_ERR_SLC_HDR_OUT_BUF_TOO_SMALL (0x00005205) +#define NVC5B0_VC1_VLD_ERR_PIC_HDR_OUT_BUF_ALREADY_VALID (0x00005206) +#define NVC5B0_VC1_VLD_ERR_PIC_DATA_OUT_BUF_TOO_SMALL (0x00005207) +#define NVC5B0_VC1_VLD_ERR_DATA_INFO_IN_BUF_TOO_SMALL (0x00005208) +#define NVC5B0_VC1_VLD_ERR_BITSTREAM_EMPTY (0x00005209) +#define NVC5B0_VC1_VLD_ERR_FRAME_WIDTH_TOO_LARGE (0x0000520A) +#define NVC5B0_VC1_VLD_ERR_FRAME_HEIGHT_TOO_LARGE (0x0000520B) +#define NVC5B0_VC1_VLD_ERR_PIC_DATA_OUT_BUF_FULL_TIME_OUT (0x00005300) +#define NVC5B0_MPEG12_VLD_ERR_SLC_DATA_BUF_ADDR_OUT_OF_BOUNDS (0x00006100) +#define NVC5B0_MPEG12_VLD_ERR_BITSTREAM_ERROR (0x00006101) +#define NVC5B0_MPEG12_VLD_ERR_SLC_DATA_OUT_SIZE_NOT_MULT256 (0x00006200) +#define NVC5B0_MPEG12_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID (0x00006201) +#define NVC5B0_MPEG12_VLD_ERR_CTX_DMA_ID_FLOW_CTRL_INVALID (0x00006202) +#define NVC5B0_MPEG12_VLD_ERR_SLC_DATA_OUT_BUF_TOO_SMALL (0x00006203) +#define NVC5B0_MPEG12_VLD_ERR_DATA_INFO_IN_BUF_TOO_SMALL (0x00006204) +#define NVC5B0_MPEG12_VLD_ERR_BITSTREAM_EMPTY (0x00006205) +#define NVC5B0_MPEG12_VLD_ERR_INVALID_PIC_STRUCTURE (0x00006206) +#define NVC5B0_MPEG12_VLD_ERR_INVALID_PIC_CODING_TYPE (0x00006207) +#define NVC5B0_MPEG12_VLD_ERR_FRAME_WIDTH_TOO_LARGE (0x00006208) +#define NVC5B0_MPEG12_VLD_ERR_FRAME_HEIGHT_TOO_LARGE (0x00006209) +#define NVC5B0_MPEG12_VLD_ERR_SLC_DATA_OUT_BUF_FULL_TIME_OUT (0x00006300) +#define NVC5B0_CMN_VLD_ERR_PDEC_RETURNED_ERROR (0x00007101) +#define NVC5B0_CMN_VLD_ERR_EDOB_FLUSH_TIME_OUT (0x00007102) +#define NVC5B0_CMN_VLD_ERR_EDOB_REWIND_TIME_OUT (0x00007103) +#define NVC5B0_CMN_VLD_ERR_VLD_WD_TIME_OUT (0x00007104) +#define NVC5B0_CMN_VLD_ERR_NUM_SLICES_ZERO (0x00007105) +#define NVC5B0_MPEG4_VLD_ERR_PIC_DATA_BUF_ADDR_OUT_OF_BOUND (0x00008100) +#define NVC5B0_MPEG4_VLD_ERR_BITSTREAM_ERROR (0x00008101) +#define NVC5B0_MPEG4_VLD_ERR_PIC_HDR_OUT_SIZE_NOT_MULT256 (0x00008200) +#define NVC5B0_MPEG4_VLD_ERR_PIC_DATA_OUT_SIZE_NOT_MULT256 (0x00008201) 
+#define NVC5B0_MPEG4_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID (0x00008202) +#define NVC5B0_MPEG4_VLD_ERR_CTX_DMA_ID_FLOW_CTRL_INVALID (0x00008203) +#define NVC5B0_MPEG4_VLD_ERR_CTX_DMA_ID_PIC_HDR_OUT_INVALID (0x00008204) +#define NVC5B0_MPEG4_VLD_ERR_SLC_HDR_OUT_BUF_TOO_SMALL (0x00008205) +#define NVC5B0_MPEG4_VLD_ERR_PIC_HDR_OUT_BUF_ALREADY_VALID (0x00008206) +#define NVC5B0_MPEG4_VLD_ERR_PIC_DATA_OUT_BUF_TOO_SMALL (0x00008207) +#define NVC5B0_MPEG4_VLD_ERR_DATA_INFO_IN_BUF_TOO_SMALL (0x00008208) +#define NVC5B0_MPEG4_VLD_ERR_BITSTREAM_EMPTY (0x00008209) +#define NVC5B0_MPEG4_VLD_ERR_FRAME_WIDTH_TOO_LARGE (0x0000820A) +#define NVC5B0_MPEG4_VLD_ERR_FRAME_HEIGHT_TOO_LARGE (0x0000820B) +#define NVC5B0_MPEG4_VLD_ERR_PIC_DATA_OUT_BUF_FULL_TIME_OUT (0x00051E01) +#define NVC5B0_DEC_ERROR_MPEG12_APPTIMER_EXPIRED (0xDEC10001) +#define NVC5B0_DEC_ERROR_MPEG12_MVTIMER_EXPIRED (0xDEC10002) +#define NVC5B0_DEC_ERROR_MPEG12_INVALID_TOKEN (0xDEC10003) +#define NVC5B0_DEC_ERROR_MPEG12_SLICEDATA_MISSING (0xDEC10004) +#define NVC5B0_DEC_ERROR_MPEG12_HWERR_INTERRUPT (0xDEC10005) +#define NVC5B0_DEC_ERROR_MPEG12_DETECTED_VLD_FAILURE (0xDEC10006) +#define NVC5B0_DEC_ERROR_MPEG12_PICTURE_INIT (0xDEC10100) +#define NVC5B0_DEC_ERROR_MPEG12_STATEMACHINE_FAILURE (0xDEC10101) +#define NVC5B0_DEC_ERROR_MPEG12_INVALID_CTXID_PIC (0xDEC10901) +#define NVC5B0_DEC_ERROR_MPEG12_INVALID_CTXID_UCODE (0xDEC10902) +#define NVC5B0_DEC_ERROR_MPEG12_INVALID_CTXID_FC (0xDEC10903) +#define NVC5B0_DEC_ERROR_MPEG12_INVALID_CTXID_SLH (0xDEC10904) +#define NVC5B0_DEC_ERROR_MPEG12_INVALID_UCODE_SIZE (0xDEC10905) +#define NVC5B0_DEC_ERROR_MPEG12_INVALID_SLICE_COUNT (0xDEC10906) +#define NVC5B0_DEC_ERROR_VC1_APPTIMER_EXPIRED (0xDEC20001) +#define NVC5B0_DEC_ERROR_VC1_MVTIMER_EXPIRED (0xDEC20002) +#define NVC5B0_DEC_ERROR_VC1_INVALID_TOKEN (0xDEC20003) +#define NVC5B0_DEC_ERROR_VC1_SLICEDATA_MISSING (0xDEC20004) +#define NVC5B0_DEC_ERROR_VC1_HWERR_INTERRUPT (0xDEC20005) +#define 
NVC5B0_DEC_ERROR_VC1_DETECTED_VLD_FAILURE (0xDEC20006) +#define NVC5B0_DEC_ERROR_VC1_TIMEOUT_POLLING_FOR_DATA (0xDEC20007) +#define NVC5B0_DEC_ERROR_VC1_PDEC_PIC_END_UNALIGNED (0xDEC20008) +#define NVC5B0_DEC_ERROR_VC1_WDTIMER_EXPIRED (0xDEC20009) +#define NVC5B0_DEC_ERROR_VC1_ERRINTSTART (0xDEC20010) +#define NVC5B0_DEC_ERROR_VC1_IQT_ERRINT (0xDEC20011) +#define NVC5B0_DEC_ERROR_VC1_MC_ERRINT (0xDEC20012) +#define NVC5B0_DEC_ERROR_VC1_MC_IQT_ERRINT (0xDEC20013) +#define NVC5B0_DEC_ERROR_VC1_REC_ERRINT (0xDEC20014) +#define NVC5B0_DEC_ERROR_VC1_REC_IQT_ERRINT (0xDEC20015) +#define NVC5B0_DEC_ERROR_VC1_REC_MC_ERRINT (0xDEC20016) +#define NVC5B0_DEC_ERROR_VC1_REC_MC_IQT_ERRINT (0xDEC20017) +#define NVC5B0_DEC_ERROR_VC1_DBF_ERRINT (0xDEC20018) +#define NVC5B0_DEC_ERROR_VC1_DBF_IQT_ERRINT (0xDEC20019) +#define NVC5B0_DEC_ERROR_VC1_DBF_MC_ERRINT (0xDEC2001A) +#define NVC5B0_DEC_ERROR_VC1_DBF_MC_IQT_ERRINT (0xDEC2001B) +#define NVC5B0_DEC_ERROR_VC1_DBF_REC_ERRINT (0xDEC2001C) +#define NVC5B0_DEC_ERROR_VC1_DBF_REC_IQT_ERRINT (0xDEC2001D) +#define NVC5B0_DEC_ERROR_VC1_DBF_REC_MC_ERRINT (0xDEC2001E) +#define NVC5B0_DEC_ERROR_VC1_DBF_REC_MC_IQT_ERRINT (0xDEC2001F) +#define NVC5B0_DEC_ERROR_VC1_PICTURE_INIT (0xDEC20100) +#define NVC5B0_DEC_ERROR_VC1_STATEMACHINE_FAILURE (0xDEC20101) +#define NVC5B0_DEC_ERROR_VC1_INVALID_CTXID_PIC (0xDEC20901) +#define NVC5B0_DEC_ERROR_VC1_INVALID_CTXID_UCODE (0xDEC20902) +#define NVC5B0_DEC_ERROR_VC1_INVALID_CTXID_FC (0xDEC20903) +#define NVC5B0_DEC_ERROR_VC1_INVAILD_CTXID_SLH (0xDEC20904) +#define NVC5B0_DEC_ERROR_VC1_INVALID_UCODE_SIZE (0xDEC20905) +#define NVC5B0_DEC_ERROR_VC1_INVALID_SLICE_COUNT (0xDEC20906) +#define NVC5B0_DEC_ERROR_H264_APPTIMER_EXPIRED (0xDEC30001) +#define NVC5B0_DEC_ERROR_H264_MVTIMER_EXPIRED (0xDEC30002) +#define NVC5B0_DEC_ERROR_H264_INVALID_TOKEN (0xDEC30003) +#define NVC5B0_DEC_ERROR_H264_SLICEDATA_MISSING (0xDEC30004) +#define NVC5B0_DEC_ERROR_H264_HWERR_INTERRUPT (0xDEC30005) +#define 
NVC5B0_DEC_ERROR_H264_DETECTED_VLD_FAILURE (0xDEC30006) +#define NVC5B0_DEC_ERROR_H264_ERRINTSTART (0xDEC30010) +#define NVC5B0_DEC_ERROR_H264_IQT_ERRINT (0xDEC30011) +#define NVC5B0_DEC_ERROR_H264_MC_ERRINT (0xDEC30012) +#define NVC5B0_DEC_ERROR_H264_MC_IQT_ERRINT (0xDEC30013) +#define NVC5B0_DEC_ERROR_H264_REC_ERRINT (0xDEC30014) +#define NVC5B0_DEC_ERROR_H264_REC_IQT_ERRINT (0xDEC30015) +#define NVC5B0_DEC_ERROR_H264_REC_MC_ERRINT (0xDEC30016) +#define NVC5B0_DEC_ERROR_H264_REC_MC_IQT_ERRINT (0xDEC30017) +#define NVC5B0_DEC_ERROR_H264_DBF_ERRINT (0xDEC30018) +#define NVC5B0_DEC_ERROR_H264_DBF_IQT_ERRINT (0xDEC30019) +#define NVC5B0_DEC_ERROR_H264_DBF_MC_ERRINT (0xDEC3001A) +#define NVC5B0_DEC_ERROR_H264_DBF_MC_IQT_ERRINT (0xDEC3001B) +#define NVC5B0_DEC_ERROR_H264_DBF_REC_ERRINT (0xDEC3001C) +#define NVC5B0_DEC_ERROR_H264_DBF_REC_IQT_ERRINT (0xDEC3001D) +#define NVC5B0_DEC_ERROR_H264_DBF_REC_MC_ERRINT (0xDEC3001E) +#define NVC5B0_DEC_ERROR_H264_DBF_REC_MC_IQT_ERRINT (0xDEC3001F) +#define NVC5B0_DEC_ERROR_H264_PICTURE_INIT (0xDEC30100) +#define NVC5B0_DEC_ERROR_H264_STATEMACHINE_FAILURE (0xDEC30101) +#define NVC5B0_DEC_ERROR_H264_INVALID_CTXID_PIC (0xDEC30901) +#define NVC5B0_DEC_ERROR_H264_INVALID_CTXID_UCODE (0xDEC30902) +#define NVC5B0_DEC_ERROR_H264_INVALID_CTXID_FC (0xDEC30903) +#define NVC5B0_DEC_ERROR_H264_INVALID_CTXID_SLH (0xDEC30904) +#define NVC5B0_DEC_ERROR_H264_INVALID_UCODE_SIZE (0xDEC30905) +#define NVC5B0_DEC_ERROR_H264_INVALID_SLICE_COUNT (0xDEC30906) +#define NVC5B0_DEC_ERROR_MPEG4_APPTIMER_EXPIRED (0xDEC40001) +#define NVC5B0_DEC_ERROR_MPEG4_MVTIMER_EXPIRED (0xDEC40002) +#define NVC5B0_DEC_ERROR_MPEG4_INVALID_TOKEN (0xDEC40003) +#define NVC5B0_DEC_ERROR_MPEG4_SLICEDATA_MISSING (0xDEC40004) +#define NVC5B0_DEC_ERROR_MPEG4_HWERR_INTERRUPT (0xDEC40005) +#define NVC5B0_DEC_ERROR_MPEG4_DETECTED_VLD_FAILURE (0xDEC40006) +#define NVC5B0_DEC_ERROR_MPEG4_TIMEOUT_POLLING_FOR_DATA (0xDEC40007) +#define NVC5B0_DEC_ERROR_MPEG4_PDEC_PIC_END_UNALIGNED 
(0xDEC40008) +#define NVC5B0_DEC_ERROR_MPEG4_WDTIMER_EXPIRED (0xDEC40009) +#define NVC5B0_DEC_ERROR_MPEG4_ERRINTSTART (0xDEC40010) +#define NVC5B0_DEC_ERROR_MPEG4_IQT_ERRINT (0xDEC40011) +#define NVC5B0_DEC_ERROR_MPEG4_MC_ERRINT (0xDEC40012) +#define NVC5B0_DEC_ERROR_MPEG4_MC_IQT_ERRINT (0xDEC40013) +#define NVC5B0_DEC_ERROR_MPEG4_REC_ERRINT (0xDEC40014) +#define NVC5B0_DEC_ERROR_MPEG4_REC_IQT_ERRINT (0xDEC40015) +#define NVC5B0_DEC_ERROR_MPEG4_REC_MC_ERRINT (0xDEC40016) +#define NVC5B0_DEC_ERROR_MPEG4_REC_MC_IQT_ERRINT (0xDEC40017) +#define NVC5B0_DEC_ERROR_MPEG4_DBF_ERRINT (0xDEC40018) +#define NVC5B0_DEC_ERROR_MPEG4_DBF_IQT_ERRINT (0xDEC40019) +#define NVC5B0_DEC_ERROR_MPEG4_DBF_MC_ERRINT (0xDEC4001A) +#define NVC5B0_DEC_ERROR_MPEG4_DBF_MC_IQT_ERRINT (0xDEC4001B) +#define NVC5B0_DEC_ERROR_MPEG4_DBF_REC_ERRINT (0xDEC4001C) +#define NVC5B0_DEC_ERROR_MPEG4_DBF_REC_IQT_ERRINT (0xDEC4001D) +#define NVC5B0_DEC_ERROR_MPEG4_DBF_REC_MC_ERRINT (0xDEC4001E) +#define NVC5B0_DEC_ERROR_MPEG4_DBF_REC_MC_IQT_ERRINT (0xDEC4001F) +#define NVC5B0_DEC_ERROR_MPEG4_PICTURE_INIT (0xDEC40100) +#define NVC5B0_DEC_ERROR_MPEG4_STATEMACHINE_FAILURE (0xDEC40101) +#define NVC5B0_DEC_ERROR_MPEG4_INVALID_CTXID_PIC (0xDEC40901) +#define NVC5B0_DEC_ERROR_MPEG4_INVALID_CTXID_UCODE (0xDEC40902) +#define NVC5B0_DEC_ERROR_MPEG4_INVALID_CTXID_FC (0xDEC40903) +#define NVC5B0_DEC_ERROR_MPEG4_INVALID_CTXID_SLH (0xDEC40904) +#define NVC5B0_DEC_ERROR_MPEG4_INVALID_UCODE_SIZE (0xDEC40905) +#define NVC5B0_DEC_ERROR_MPEG4_INVALID_SLICE_COUNT (0xDEC40906) + +#ifdef __cplusplus +}; /* extern "C" */ +#endif +#endif /* AVUTIL_CLC5B0_H */ diff --git a/libavutil/cle7d0.h b/libavutil/cle7d0.h new file mode 100644 index 0000000000..f17e67036f --- /dev/null +++ b/libavutil/cle7d0.h @@ -0,0 +1,129 @@ +/******************************************************************************* + Copyright (c) 1993-2020, NVIDIA CORPORATION. All rights reserved. 
+ + Permission is hereby granted, free of charge, to any person obtaining a + copy of this software and associated documentation files (the "Software"), + to deal in the Software without restriction, including without limitation + the rights to use, copy, modify, merge, publish, distribute, sublicense, + and/or sell copies of the Software, and to permit persons to whom the + Software is furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + DEALINGS IN THE SOFTWARE. 
+ +*******************************************************************************/ + +#ifndef AVUTIL_CLE7D0_H +#define AVUTIL_CLE7D0_H + +#ifdef __cplusplus +extern "C" { +#endif + +#define NVE7D0_VIDEO_NVJPG (0x0000E7D0) + +#define NVE7D0_NOP (0x00000100) +#define NVE7D0_NOP_PARAMETER 31:0 +#define NVE7D0_SET_APPLICATION_ID (0x00000200) +#define NVE7D0_SET_APPLICATION_ID_ID 31:0 +#define NVE7D0_SET_APPLICATION_ID_ID_NVJPG_DECODER (0x00000001) +#define NVE7D0_SET_APPLICATION_ID_ID_NVJPG_ENCODER (0x00000002) +#define NVE7D0_SET_WATCHDOG_TIMER (0x00000204) +#define NVE7D0_SET_WATCHDOG_TIMER_TIMER 31:0 +#define NVE7D0_SEMAPHORE_A (0x00000240) +#define NVE7D0_SEMAPHORE_A_UPPER 7:0 +#define NVE7D0_SEMAPHORE_B (0x00000244) +#define NVE7D0_SEMAPHORE_B_LOWER 31:0 +#define NVE7D0_SEMAPHORE_C (0x00000248) +#define NVE7D0_SEMAPHORE_C_PAYLOAD 31:0 +#define NVE7D0_CTX_SAVE_AREA (0x0000024C) +#define NVE7D0_CTX_SAVE_AREA_OFFSET 27:0 +#define NVE7D0_CTX_SAVE_AREA_CTX_VALID 31:28 +#define NVE7D0_CTX_SWITCH (0x00000250) +#define NVE7D0_CTX_SWITCH_RESTORE 0:0 +#define NVE7D0_CTX_SWITCH_RESTORE_FALSE (0x00000000) +#define NVE7D0_CTX_SWITCH_RESTORE_TRUE (0x00000001) +#define NVE7D0_CTX_SWITCH_RST_NOTIFY 1:1 +#define NVE7D0_CTX_SWITCH_RST_NOTIFY_FALSE (0x00000000) +#define NVE7D0_CTX_SWITCH_RST_NOTIFY_TRUE (0x00000001) +#define NVE7D0_CTX_SWITCH_RESERVED 7:2 +#define NVE7D0_CTX_SWITCH_ASID 23:8 +#define NVE7D0_EXECUTE (0x00000300) +#define NVE7D0_EXECUTE_NOTIFY 0:0 +#define NVE7D0_EXECUTE_NOTIFY_DISABLE (0x00000000) +#define NVE7D0_EXECUTE_NOTIFY_ENABLE (0x00000001) +#define NVE7D0_EXECUTE_NOTIFY_ON 1:1 +#define NVE7D0_EXECUTE_NOTIFY_ON_END (0x00000000) +#define NVE7D0_EXECUTE_NOTIFY_ON_BEGIN (0x00000001) +#define NVE7D0_EXECUTE_AWAKEN 8:8 +#define NVE7D0_EXECUTE_AWAKEN_DISABLE (0x00000000) +#define NVE7D0_EXECUTE_AWAKEN_ENABLE (0x00000001) +#define NVE7D0_SEMAPHORE_D (0x00000304) +#define NVE7D0_SEMAPHORE_D_STRUCTURE_SIZE 0:0 +#define NVE7D0_SEMAPHORE_D_STRUCTURE_SIZE_ONE 
(0x00000000) +#define NVE7D0_SEMAPHORE_D_STRUCTURE_SIZE_FOUR (0x00000001) +#define NVE7D0_SEMAPHORE_D_AWAKEN_ENABLE 8:8 +#define NVE7D0_SEMAPHORE_D_AWAKEN_ENABLE_FALSE (0x00000000) +#define NVE7D0_SEMAPHORE_D_AWAKEN_ENABLE_TRUE (0x00000001) +#define NVE7D0_SEMAPHORE_D_OPERATION 17:16 +#define NVE7D0_SEMAPHORE_D_OPERATION_RELEASE (0x00000000) +#define NVE7D0_SEMAPHORE_D_OPERATION_RESERVED0 (0x00000001) +#define NVE7D0_SEMAPHORE_D_OPERATION_RESERVED1 (0x00000002) +#define NVE7D0_SEMAPHORE_D_OPERATION_TRAP (0x00000003) +#define NVE7D0_SEMAPHORE_D_FLUSH_DISABLE 21:21 +#define NVE7D0_SEMAPHORE_D_FLUSH_DISABLE_FALSE (0x00000000) +#define NVE7D0_SEMAPHORE_D_FLUSH_DISABLE_TRUE (0x00000001) +#define NVE7D0_SET_CONTROL_PARAMS (0x00000700) +#define NVE7D0_SET_CONTROL_PARAMS_GPTIMER_ON 0:0 +#define NVE7D0_SET_CONTROL_PARAMS_DUMP_CYCLE_COUNT 1:1 +#define NVE7D0_SET_CONTROL_PARAMS_DEBUG_MODE 2:2 +#define NVE7D0_SET_PICTURE_INDEX (0x00000704) +#define NVE7D0_SET_PICTURE_INDEX_INDEX 31:0 +#define NVE7D0_SET_IN_DRV_PIC_SETUP (0x00000708) +#define NVE7D0_SET_IN_DRV_PIC_SETUP_OFFSET 31:0 +#define NVE7D0_SET_OUT_STATUS (0x0000070C) +#define NVE7D0_SET_OUT_STATUS_OFFSET 31:0 +#define NVE7D0_SET_BITSTREAM (0x00000710) +#define NVE7D0_SET_BITSTREAM_OFFSET 31:0 +#define NVE7D0_SET_CUR_PIC (0x00000714) +#define NVE7D0_SET_CUR_PIC_OFFSET 31:0 +#define NVE7D0_SET_CUR_PIC_CHROMA_U (0x00000718) +#define NVE7D0_SET_CUR_PIC_CHROMA_U_OFFSET 31:0 +#define NVE7D0_SET_CUR_PIC_CHROMA_V (0x0000071C) +#define NVE7D0_SET_CUR_PIC_CHROMA_V_OFFSET 31:0 + +#define NVE7D0_ERROR_NONE (0x00000000) +#define NVE7D0_OS_ERROR_EXECUTE_INSUFFICIENT_DATA (0x00000001) +#define NVE7D0_OS_ERROR_SEMAPHORE_INSUFFICIENT_DATA (0x00000002) +#define NVE7D0_OS_ERROR_INVALID_METHOD (0x00000003) +#define NVE7D0_OS_ERROR_INVALID_DMA_PAGE (0x00000004) +#define NVE7D0_OS_ERROR_UNHANDLED_INTERRUPT (0x00000005) +#define NVE7D0_OS_ERROR_EXCEPTION (0x00000006) +#define NVE7D0_OS_ERROR_INVALID_CTXSW_REQUEST (0x00000007) +#define 
NVE7D0_OS_ERROR_APPLICATION (0x00000008) +#define NVE7D0_OS_INTERRUPT_EXECUTE_AWAKEN (0x00000100) +#define NVE7D0_OS_INTERRUPT_BACKEND_SEMAPHORE_AWAKEN (0x00000200) +#define NVE7D0_OS_INTERRUPT_CTX_ERROR_FBIF (0x00000300) +#define NVE7D0_OS_INTERRUPT_LIMIT_VIOLATION (0x00000400) +#define NVE7D0_OS_INTERRUPT_LIMIT_AND_FBIF_CTX_ERROR (0x00000500) +#define NVE7D0_OS_INTERRUPT_HALT_ENGINE (0x00000600) +#define NVE7D0_OS_INTERRUPT_TRAP_NONSTALL (0x00000700) +#define NVE7D0_OS_INTERRUPT_CTX_SAVE_DONE (0x00000800) +#define NVE7D0_OS_INTERRUPT_CTX_RESTORE_DONE (0x00000900) +#define NVE7D0_ERROR_JPGAPPTIMER_EXPIRED (0x30000001) +#define NVE7D0_ERROR_JPGINVALID_INPUT (0x30000002) +#define NVE7D0_ERROR_JPGHWERR_INTERRUPT (0x30000003) +#define NVE7D0_ERROR_JPGBAD_MAGIC (0x30000004) + +#ifdef __cplusplus +}; /* extern "C" */ +#endif +#endif /* AVUTIL_CLE7D0_H */ diff --git a/libavutil/nvdec_drv.h b/libavutil/nvdec_drv.h new file mode 100644 index 0000000000..7803cd16b3 --- /dev/null +++ b/libavutil/nvdec_drv.h @@ -0,0 +1,1858 @@ +/* + * SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-License-Identifier: MIT + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#ifndef AVUTIL_NVDEC_DRV_H +#define AVUTIL_NVDEC_DRV_H + +// TODO: Many fields can be converted to bitfields to save memory BW +// TODO: Revisit reserved fields for proper alignment and memory savings + +/////////////////////////////////////////////////////////////////////////////// +// NVDEC(MSDEC 5) is a single engine solution, and separates into VLD, MV, IQT, +// MCFETCH, MC, MCC, REC, DBF, DFBFDMA, HIST etc unit. +// The class(driver to HW) can mainly separate into VLD parser +// and Decoder part to be consistent with original design. And +// the sequence level info usually set in VLD part. Later codec like +// VP8 won't name in this way. +// MSVLD: Multi-Standard VLD parser. +// +#define ALIGN_UP(v, n) (((v) + ((n)-1)) &~ ((n)-1)) +#define NVDEC_ALIGN(value) ALIGN_UP(value,256) // Align to 256 bytes +#define NVDEC_MAX_MPEG2_SLICE 65536 // at 4096*4096, macroblock count = 65536, 1 macroblock per slice + +#define NVDEC_CODEC_MPEG1 0 +#define NVDEC_CODEC_MPEG2 1 +#define NVDEC_CODEC_VC1 2 +#define NVDEC_CODEC_H264 3 +#define NVDEC_CODEC_MPEG4 4 +#define NVDEC_CODEC_DIVX NVDEC_CODEC_MPEG4 +#define NVDEC_CODEC_VP8 5 +#define NVDEC_CODEC_HEVC 7 +#define NVDEC_CODEC_VP9 9 +#define NVDEC_CODEC_HEVC_PARSER 12 +#define NVDEC_CODEC_AV1 10 + +// AES encryption +enum +{ + AES128_NONE = 0x0, + AES128_CTR = 0x1, + AES128_CBC, + AES128_ECB, + AES128_OFB, + AES128_CTR_LSB16B, + AES128_CLR_AS_ENCRYPT, + AES128_RESERVED = 0x7 +}; + +enum +{ + AES128_CTS_DISABLE = 0x0, + AES128_CTS_ENABLE = 0x1 +}; + +enum +{ + AES128_PADDING_NONE = 0x0, + AES128_PADDING_CARRY_OVER, + AES128_PADDING_RFC2630, + AES128_PADDING_RESERVED = 0x7 +}; + +typedef enum +{ + ENCR_MODE_CTR64 = 0, + ENCR_MODE_CBC = 1, + 
ENCR_MODE_ECB = 2, + ENCR_MODE_ECB_PARTIAL = 3, + ENCR_MODE_CBC_PARTIAL = 4, + ENCR_MODE_CLEAR_INTO_VPR = 5, // used for clear stream decoding into VPR. + ENCR_MODE_FORCE_INTO_VPR = 6, // used to force decode output into VPR. +} ENCR_MODE; + +// drm_mode configuration +// +// Bit 0:2 AES encryption mode +// Bit 3 CTS (CipherTextStealing) enable/disable +// Bit 4:6 Padding type +// Bit 7:7 Unwrap key enable/disable + +#define AES_MODE_MASK 0x7 +#define AES_CTS_MASK 0x1 +#define AES_PADDING_TYPE_MASK 0x7 +#define AES_UNWRAP_KEY_MASK 0x1 + +#define AES_MODE_SHIFT 0 +#define AES_CTS_SHIFT 3 +#define AES_PADDING_TYPE_SHIFT 4 +#define AES_UNWRAP_KEY_SHIFT 7 + +#define AES_SET_FLAG(M, C, P) ((M & AES_MODE_MASK) << AES_MODE_SHIFT) | \ + ((C & AES_CTS_MASK) << AES_CTS_SHIFT) | \ + ((P & AES_PADDING_TYPE_MASK) << AES_PADDING_TYPE_SHIFT) + +#define AES_GET_FLAG(V, F) ((V & ((AES_##F##_MASK) <<(AES_##F##_SHIFT))) >> (AES_##F##_SHIFT)) + +#define DRM_MODE_MASK 0x7f // Bits 0:6 (0:2 -> AES_MODE, 3 -> AES_CTS, 4:6 -> AES_PADDING_TYPE) +#define AES_GET_DRM_MODE(V) (V & DRM_MODE_MASK) + +enum { DRM_MS_PIFF_CTR = AES_SET_FLAG(AES128_CTR, AES128_CTS_DISABLE, AES128_PADDING_CARRY_OVER) }; +enum { DRM_MS_PIFF_CBC = AES_SET_FLAG(AES128_CBC, AES128_CTS_DISABLE, AES128_PADDING_NONE) }; +enum { DRM_MARLIN_CTR = AES_SET_FLAG(AES128_CTR, AES128_CTS_DISABLE, AES128_PADDING_NONE) }; +enum { DRM_MARLIN_CBC = AES_SET_FLAG(AES128_CBC, AES128_CTS_DISABLE, AES128_PADDING_RFC2630) }; +enum { DRM_WIDEVINE = AES_SET_FLAG(AES128_CBC, AES128_CTS_ENABLE, AES128_PADDING_NONE) }; +enum { DRM_WIDEVINE_CTR = AES_SET_FLAG(AES128_CTR, AES128_CTS_DISABLE, AES128_PADDING_CARRY_OVER) }; +enum { DRM_ULTRA_VIOLET = AES_SET_FLAG(AES128_CTR_LSB16B, AES128_CTS_DISABLE, AES128_PADDING_NONE) }; +enum { DRM_NONE = AES_SET_FLAG(AES128_NONE, AES128_CTS_DISABLE, AES128_PADDING_NONE) }; +enum { DRM_CLR_AS_ENCRYPT = AES_SET_FLAG(AES128_CLR_AS_ENCRYPT, AES128_CTS_DISABLE, AES128_PADDING_NONE)}; + +// SSM entry structure 
+typedef struct _nvdec_ssm_s { + unsigned int bytes_of_protected_data;//bytes of protected data, follows bytes_of_clear_data. Note: When padding is enabled, it does not include the padding_bytes (1~15), which can be derived by "(16-(bytes_of_protected_data&0xF))&0xF" + unsigned int bytes_of_clear_data:16; //bytes of clear data, located before bytes_of_protected_data + unsigned int skip_byte_blk : 4; //valid when (entry_type==0 && mode = 1) + unsigned int crypt_byte_blk : 4; //valid when (entry_type==0 && mode = 1) + unsigned int skip : 1; //whether this SSM entry should be skipped or not + unsigned int last : 1; //whether this SSM entry is the last one for the whole decoding frame + unsigned int pad : 1; //valid when (entry_type==0 && mode==0 && AES_PADDING_TYPE==AES128_PADDING_RFC2630), 0 for pad_end, 1 for pad_begin + unsigned int mode : 1; //0 for normal mode, 1 for pattern mode + unsigned int entry_type : 1; //0 for DATA, 1 for IV + unsigned int reserved : 3; +} nvdec_ssm_s; /* SubSampleMap, 8bytes */ + +// PASS2 OTF extension structure for SSM support, not exist in nvdec_mpeg4_pic_s (as MPEG4 OTF SW-DRM is not supported yet) +typedef struct _nvdec_pass2_otf_ext_s { + unsigned int ssm_entry_num :16; //specifies how many SSM entries (each in unit of 8 bytes) existed in SET_SUB_SAMPLE_MAP_OFFSET surface + unsigned int ssm_iv_num :16; //specifies how many SSM IV (each in unit of 16 bytes) existed in SET_SUB_SAMPLE_MAP_IV_OFFSET surface + unsigned int real_stream_length; //the real stream length, which is the bitstream length EMD/VLD will get after whole frame SSM processing, sum up of "clear+protected" bytes in SSM entries and removing "non_slice_data/skip". 
+ unsigned int non_slice_data :16; //specifies the first many bytes needed to skip, includes only those of "clear+protected" bytes ("padding" bytes excluded) + unsigned int drm_mode : 7; + unsigned int reserved : 9; +} nvdec_pass2_otf_ext_s; /* 12bytes */ + + +//NVDEC5.0 low latency decoding (partial stream kickoff without context switch), method will reuse HevcSetSliceInfoBufferOffset. +typedef struct _nvdec_substream_entry_s { + unsigned int substream_start_offset; //substream byte start offset to bitstream base address + unsigned int substream_length; //subsream length in byte + unsigned int substream_first_tile_idx : 8; //the first tile index(raster scan in frame) of this substream,max is 255 + unsigned int substream_last_tile_idx : 8; //the last tile index(raster scan in frame) of this substream, max is 255 + unsigned int last_substream_entry_in_frame : 1; //this entry is the last substream entry of this frame + unsigned int reserved : 15; +} nvdec_substream_entry_s;/*low latency without context switch substream entry map,12bytes*/ + + +// GIP + +/* tile border coefficients of filter */ +#define GIP_ASIC_VERT_FILTER_RAM_SIZE 16 /* bytes per pixel */ + +/* BSD control data of current picture at tile border + * 11 * 128 bits per 4x4 tile = 128/(8*4) bytes per row */ +#define GIP_ASIC_BSD_CTRL_RAM_SIZE 4 /* bytes per row */ + +/* 8 dc + 8 to boundary + 6*16 + 2*6*64 + 2*64 -> 63 * 16 bytes */ +#define GIP_ASIC_SCALING_LIST_SIZE (16*64) + +/* tile border coefficients of filter */ +#define GIP_ASIC_VERT_SAO_RAM_SIZE 16 /* bytes per pixel */ + +/* max number of tiles times width and height (2 bytes each), + * rounding up to next 16 bytes boundary + one extra 16 byte + * chunk (HW guys wanted to have this) */ +#define GIP_ASIC_TILE_SIZE ((20*22*2*2+16+15) & ~0xF) + +/* Segment map uses 32 bytes / CTB */ +#define GIP_ASIC_VP9_CTB_SEG_SIZE 32 + +// HEVC Filter FG buffer +#define HEVC_DBLK_TOP_SIZE_IN_SB16 ALIGN_UP(632, 128) // ctb16 + 444 +#define 
HEVC_DBLK_TOP_BUF_SIZE(w) NVDEC_ALIGN( (ALIGN_UP(w,16)/16 + 2) * HEVC_DBLK_TOP_SIZE_IN_SB16) // 8K: 1285*256 + +#define HEVC_DBLK_LEFT_SIZE_IN_SB16 ALIGN_UP(506, 128) // ctb16 + 444 +#define HEVC_DBLK_LEFT_BUF_SIZE(h) NVDEC_ALIGN( (ALIGN_UP(h,16)/16 + 2) * HEVC_DBLK_LEFT_SIZE_IN_SB16) // 8K: 1028*256 + +#define HEVC_SAO_LEFT_SIZE_IN_SB16 ALIGN_UP(713, 128) // ctb16 + 444 +#define HEVC_SAO_LEFT_BUF_SIZE(h) NVDEC_ALIGN( (ALIGN_UP(h,16)/16 + 2) * HEVC_SAO_LEFT_SIZE_IN_SB16) // 8K: 1542*256 + +// VP9 Filter FG buffer +#define VP9_DBLK_TOP_SIZE_IN_SB64 ALIGN_UP(2000, 128) // 420 +#define VP9_DBLK_TOP_BUF_SIZE(w) NVDEC_ALIGN( (ALIGN_UP(w,64)/64 + 2) * VP9_DBLK_TOP_SIZE_IN_SB64) // 8K: 1040*256 + +#define VP9_DBLK_LEFT_SIZE_IN_SB64 ALIGN_UP(1600, 128) // 420 +#define VP9_DBLK_LEFT_BUF_SIZE(h) NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * VP9_DBLK_LEFT_SIZE_IN_SB64) // 8K: 845*256 + +// VP9 Hint Dump Buffer +#define VP9_HINT_DUMP_SIZE_IN_SB64 ((64*64)/(4*4)*8) // 8 bytes per CU, 256 CUs(2048 bytes) per SB64 +#define VP9_HINT_DUMP_SIZE(w, h) NVDEC_ALIGN(VP9_HINT_DUMP_SIZE_IN_SB64*((w+63)/64)*((h+63)/64)) + +// used for ecdma debug +typedef struct _nvdec_ecdma_config_s +{ + unsigned int ecdma_enable; // enable/disable ecdma + unsigned short ecdma_blk_x_src; // src start position x , it's 64x aligned + unsigned short ecdma_blk_y_src; // src start position y , it's 8x aligned + unsigned short ecdma_blk_x_dst; // dst start position x , it's 64x aligned + unsigned short ecdma_blk_y_dst; // dst start position y , it's 8x aligned + unsigned short ref_pic_idx; // ref(src) picture index , used to derived source picture base address + unsigned short boundary0_top; // src insided tile/partition region top boundary + unsigned short boundary0_bottom; // src insided tile/partition region bottom boundary + unsigned short boundary1_left; // src insided tile/partition region left boundary + unsigned short boundary1_right; // src insided tile/partition region right boundary + unsigned char 
blk_copy_flag; // blk_copy enable flag. + // if it's 1 ,ctb_size ==3,ecdma_blk_x_src == boundary1_left and ecdma_blk_y_src == boundary0_top ; + // if it's 0 ,ecdma_blk_x_src == ecdma_blk_x_dst and ecdma_blk_y_src == ecdma_blk_y_dst; + unsigned char ctb_size; // ctb_size .0:64x64,1:32x32,2:16x16,3:8x8 +} nvdec_ecdma_config_s; + +typedef struct _nvdec_status_hevc_s +{ + unsigned int frame_status_intra_cnt; //Intra block counter, in unit of 8x8 block, IPCM block included + unsigned int frame_status_inter_cnt; //Inter block counter, in unit of 8x8 block, SKIP block included + unsigned int frame_status_skip_cnt; //Skip block counter, in unit of 4x4 block, blocks having NO/ZERO texture/coeff data + unsigned int frame_status_fwd_mvx_cnt; //ABS sum of forward MVx, one 14bit MVx(integer) per 4x4 block + unsigned int frame_status_fwd_mvy_cnt; //ABS sum of forward MVy, one 14bit MVy(integer) per 4x4 block + unsigned int frame_status_bwd_mvx_cnt; //ABS sum of backward MVx, one 14bit MVx(integer) per 4x4 block + unsigned int frame_status_bwd_mvy_cnt; //ABS sum of backward MVy, one 14bit MVy(integer) per 4x4 block + unsigned int error_ctb_pos; //[15:0] error ctb position in Y direction, [31:16] error ctb position in X direction + unsigned int error_slice_pos; //[15:0] error slice position in Y direction, [31:16] error slice position in X direction +} nvdec_status_hevc_s; + +typedef struct _nvdec_status_vp9_s +{ + unsigned int frame_status_intra_cnt; //Intra block counter, in unit of 8x8 block, IPCM block included + unsigned int frame_status_inter_cnt; //Inter block counter, in unit of 8x8 block, SKIP block included + unsigned int frame_status_skip_cnt; //Skip block counter, in unit of 4x4 block, blocks having NO/ZERO texture/coeff data + unsigned int frame_status_fwd_mvx_cnt; //ABS sum of forward MVx, one 14bit MVx(integer) per 4x4 block + unsigned int frame_status_fwd_mvy_cnt; //ABS sum of forward MVy, one 14bit MVy(integer) per 4x4 block + unsigned int 
frame_status_bwd_mvx_cnt; //ABS sum of backward MVx, one 14bit MVx(integer) per 4x4 block + unsigned int frame_status_bwd_mvy_cnt; //ABS sum of backward MVy, one 14bit MVy(integer) per 4x4 block + unsigned int error_ctb_pos; //[15:0] error ctb position in Y direction, [31:16] error ctb position in X direction + unsigned int error_slice_pos; //[15:0] error slice position in Y direction, [31:16] error slice position in X direction +} nvdec_status_vp9_s; + +typedef struct _nvdec_status_s +{ + unsigned int mbs_correctly_decoded; // total numers of correctly decoded macroblocks + unsigned int mbs_in_error; // number of error macroblocks. + unsigned int cycle_count; // total cycles taken for execute. read from PERF_DECODE_FRAME_V register + unsigned int error_status; // report error if any + union + { + nvdec_status_hevc_s hevc; + nvdec_status_vp9_s vp9; + }; + unsigned int slice_header_error_code; // report error in slice header + +} nvdec_status_s; + +// per 16x16 block, used in hevc/vp9 surface of SetExternalMVBufferOffset when error_external_mv_en = 1 +typedef struct _external_mv_s +{ + int mvx : 14; //integrate pixel precision + int mvy : 14; //integrate pixel precision + unsigned int refidx : 4; +} external_mv_s; + +// HEVC +typedef struct _nvdec_hevc_main10_444_ext_s +{ + unsigned int transformSkipRotationEnableFlag : 1; //sps extension for transform_skip_rotation_enabled_flag + unsigned int transformSkipContextEnableFlag : 1; //sps extension for transform_skip_context_enabled_flag + unsigned int intraBlockCopyEnableFlag :1; //sps intraBlockCopyEnableFlag, always 0 before spec define it + unsigned int implicitRdpcmEnableFlag : 1; //sps implicit_rdpcm_enabled_flag + unsigned int explicitRdpcmEnableFlag : 1; //sps explicit_rdpcm_enabled_flag + unsigned int extendedPrecisionProcessingFlag : 1; //sps extended_precision_processing_flag,always 0 in current profile + unsigned int intraSmoothingDisabledFlag : 1; //sps intra_smoothing_disabled_flag + unsigned int 
highPrecisionOffsetsEnableFlag :1; //sps high_precision_offsets_enabled_flag + unsigned int fastRiceAdaptationEnableFlag: 1; //sps fast_rice_adaptation_enabled_flag + unsigned int cabacBypassAlignmentEnableFlag : 1; //sps cabac_bypass_alignment_enabled_flag, always 0 in current profile + unsigned int sps_444_extension_reserved : 22; //sps reserve for future extension + + unsigned int log2MaxTransformSkipSize : 4 ; //pps extension log2_max_transform_skip_block_size_minus2, 0...5 + unsigned int crossComponentPredictionEnableFlag: 1; //pps cross_component_prediction_enabled_flag + unsigned int chromaQpAdjustmentEnableFlag:1; //pps chroma_qp_adjustment_enabled_flag + unsigned int diffCuChromaQpAdjustmentDepth:2; //pps diff_cu_chroma_qp_adjustment_depth, 0...3 + unsigned int chromaQpAdjustmentTableSize:3; //pps chroma_qp_adjustment_table_size_minus1+1, 1...6 + unsigned int log2SaoOffsetScaleLuma:3; //pps log2_sao_offset_scale_luma, max(0,bitdepth-10),maxBitdepth 16 for future. + unsigned int log2SaoOffsetScaleChroma: 3; //pps log2_sao_offset_scale_chroma + unsigned int pps_444_extension_reserved : 15; //pps reserved + char cb_qp_adjustment[6]; //-[12,+12] + char cr_qp_adjustment[6]; //-[12,+12] + unsigned int HevcFltAboveOffset; // filter above offset respect to filter buffer, 256 bytes unit + unsigned int HevcSaoAboveOffset; // sao above offset respect to filter buffer, 256 bytes unit +} nvdec_hevc_main10_444_ext_s; + +typedef struct _nvdec_hevc_pic_v1_s +{ + // New fields + //hevc main10 444 extensions + nvdec_hevc_main10_444_ext_s hevc_main10_444_ext; + + //HEVC skip bytes from beginning setting for secure + //it is different to the sw_hdr_skip_length who skips the middle of stream of + //the slice header which is parsed by driver + unsigned int sw_skip_start_length : 14; + unsigned int external_ref_mem_dis : 1; + unsigned int error_recovery_start_pos : 2; //0: from start of frame, 1: from start of slice segment, 2: from error detected ctb, 3: reserved + unsigned int 
error_external_mv_en : 1; + unsigned int reserved0 : 14; + // Reserved bits padding +} nvdec_hevc_pic_v1_s; + +//No versioning in structure: NVDEC2 (T210 and GM206) +//version v1 : NVDEC3 (T186 and GP100) +//version v2 : NVDEC3.1 (GP10x) + +typedef struct _nvdec_hevc_pic_v2_s +{ + // mv-hevc field + unsigned int mv_hevc_enable :1; + unsigned int nuh_layer_id :6; + unsigned int default_ref_layers_active_flag :1; + unsigned int NumDirectRefLayers :6; + unsigned int max_one_active_ref_layer_flag :1; + unsigned int NumActiveRefLayerPics :6; + unsigned int poc_lsb_not_present_flag :1; + unsigned int reserved0 :10; +} nvdec_hevc_pic_v2_s; + +typedef struct _nvdec_hevc_pic_v3_s +{ + // slice level decoding + unsigned int slice_decoding_enable:1;//1: enable slice level decoding + unsigned int slice_ec_enable:1; //1: enable slice error concealment. When slice_ec_enable=1,slice_decoding_enable must be 1; + unsigned int slice_ec_mv_type:2; //0: zero mv; 1: co-located mv; 2: external mv; + unsigned int err_detected_sw:1; //1: indicate sw/driver has detected error already in frame kick mode + unsigned int slice_ec_slice_type:2; //0: B slice; 1: P slice ; others: reserved + unsigned int slice_strm_recfg_en:1; //enable slice bitstream re-configure or not ; + unsigned int reserved:24; + unsigned int HevcSliceEdgeOffset;// slice edge buffer offset which repsect to filter buffer ,256 bytes as one unit +}nvdec_hevc_pic_v3_s; + +typedef struct _nvdec_hevc_pic_s +{ + //The key/IV addr must be 128bit alignment + unsigned int wrapped_session_key[4]; //session keys + unsigned int wrapped_content_key[4]; //content keys + unsigned int initialization_vector[4]; //Ctrl64 initial vector + // hevc_bitstream_data_info + unsigned int stream_len; // stream length in one frame + unsigned int enable_encryption; // flag to enable/disable encryption + unsigned int key_increment : 6; // added to content key after unwrapping + unsigned int encryption_mode : 4; + unsigned int key_slot_index : 4; + 
unsigned int ssm_en : 1; + unsigned int enable_histogram : 1; // histogram stats output enable + unsigned int enable_substream_decoding: 1; //frame substream kickoff without context switch + unsigned int reserved0 :15; + + // Driver may or may not use based upon need. + // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode. + // Driver can send this value based upon resolution using the formula: + // gptimer_timeout_value = 3 * (cycles required for one frame) + unsigned int gptimer_timeout_value; + + // general + unsigned char tileformat : 2 ; // 0: TBL; 1: KBL; 2: Tile16x16 + unsigned char gob_height : 3 ; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards) + unsigned char reserverd_surface_format : 3 ; + unsigned char sw_start_code_e; // 0: stream doesn't contain start codes,1: stream contains start codes + unsigned char disp_output_mode; // 0: Rec.709 8 bit, 1: Rec.709 10 bit, 2: Rec.709 10 bits -> 8 bit, 3: Rec.2020 10 bit -> 8 bit + unsigned char reserved1; + unsigned int framestride[2]; // frame buffer stride for luma and chroma + unsigned int colMvBuffersize; // collocated MV buffer size of one picture ,256 bytes unit + unsigned int HevcSaoBufferOffset; // sao buffer offset respect to filter buffer ,256 bytes unit . + unsigned int HevcBsdCtrlOffset; // bsd buffer offset respect to filter buffer ,256 bytes unit . 
+ // sps + unsigned short pic_width_in_luma_samples; // :15, 48(?)..16384, multiple of 8 (48 is smallest width supported by NVDEC for CTU size 16x16) + unsigned short pic_height_in_luma_samples; // :15, 8..16384, multiple of 8 + unsigned int chroma_format_idc : 4; // always 1 (=4:2:0) + unsigned int bit_depth_luma : 4; // 8..12 + unsigned int bit_depth_chroma : 4; + unsigned int log2_min_luma_coding_block_size : 4; // 3..6 + unsigned int log2_max_luma_coding_block_size : 4; // 3..6 + unsigned int log2_min_transform_block_size : 4; // 2..5 + unsigned int log2_max_transform_block_size : 4; // 2..5 + unsigned int reserved2 : 4; + + unsigned int max_transform_hierarchy_depth_inter : 3; // 0..4 + unsigned int max_transform_hierarchy_depth_intra : 3; // 0..4 + unsigned int scalingListEnable : 1; // + unsigned int amp_enable_flag : 1; // + unsigned int sample_adaptive_offset_enabled_flag : 1; // + unsigned int pcm_enabled_flag : 1; // + unsigned int pcm_sample_bit_depth_luma : 4; // + unsigned int pcm_sample_bit_depth_chroma : 4; + unsigned int log2_min_pcm_luma_coding_block_size : 4; // + unsigned int log2_max_pcm_luma_coding_block_size : 4; // + unsigned int pcm_loop_filter_disabled_flag : 1; // + unsigned int sps_temporal_mvp_enabled_flag : 1; // + unsigned int strong_intra_smoothing_enabled_flag : 1; // + unsigned int reserved3 : 3; + // pps + unsigned int dependent_slice_segments_enabled_flag : 1; // + unsigned int output_flag_present_flag : 1; // + unsigned int num_extra_slice_header_bits : 3; // 0..7 (normally 0) + unsigned int sign_data_hiding_enabled_flag : 1; // + unsigned int cabac_init_present_flag : 1; // + unsigned int num_ref_idx_l0_default_active : 4; // 1..15 + unsigned int num_ref_idx_l1_default_active : 4; // 1..15 + unsigned int init_qp : 7; // 0..127, support higher bitdepth + unsigned int constrained_intra_pred_flag : 1; // + unsigned int transform_skip_enabled_flag : 1; // + unsigned int cu_qp_delta_enabled_flag : 1; // + unsigned int 
diff_cu_qp_delta_depth : 2; // 0..3 + unsigned int reserved4 : 5; // + + char pps_cb_qp_offset ; // -12..12 + char pps_cr_qp_offset ; // -12..12 + char pps_beta_offset ; // -12..12 + char pps_tc_offset ; // -12..12 + unsigned int pps_slice_chroma_qp_offsets_present_flag : 1; // + unsigned int weighted_pred_flag : 1; // + unsigned int weighted_bipred_flag : 1; // + unsigned int transquant_bypass_enabled_flag : 1; // + unsigned int tiles_enabled_flag : 1; // (redundant: = num_tile_columns_minus1!=0 || num_tile_rows_minus1!=0) + unsigned int entropy_coding_sync_enabled_flag : 1; // + unsigned int num_tile_columns : 5; // 0..20 + unsigned int num_tile_rows : 5; // 0..22 + unsigned int loop_filter_across_tiles_enabled_flag : 1; // + unsigned int loop_filter_across_slices_enabled_flag : 1; // + unsigned int deblocking_filter_control_present_flag : 1; // + unsigned int deblocking_filter_override_enabled_flag : 1; // + unsigned int pps_deblocking_filter_disabled_flag : 1; // + unsigned int lists_modification_present_flag : 1; // + unsigned int log2_parallel_merge_level : 3; // 2..4 + unsigned int slice_segment_header_extension_present_flag : 1; // (normally 0) + unsigned int reserved5 : 6; + + // reference picture related + unsigned char num_ref_frames; + unsigned char reserved6; + unsigned short longtermflag; // long term flag for refpiclist.bit 15 for picidx 0, bit 14 for picidx 1,... 
+ unsigned char initreflistidxl0[16]; // :5, [refPicidx] 0..15 + unsigned char initreflistidxl1[16]; // :5, [refPicidx] 0..15 + short RefDiffPicOrderCnts[16]; // poc diff between current and reference pictures .[-128,127] + // misc + unsigned char IDR_picture_flag; // idr flag for current picture + unsigned char RAP_picture_flag; // rap flag for current picture + unsigned char curr_pic_idx; // current picture store buffer index,used to derive the store address of frame buffer and MV + unsigned char pattern_id; // used for dithering to select between 2 tables + unsigned short sw_hdr_skip_length; // reference picture initial related syntax elements(SE) bits in slice header. + // those SE only decoding once in driver,related bits will flush in HW + unsigned short reserved7; + + // used for ecdma debug + nvdec_ecdma_config_s ecdma_cfg; + + //DXVA on windows + unsigned int separate_colour_plane_flag : 1; + unsigned int log2_max_pic_order_cnt_lsb_minus4 : 4; //0~12 + unsigned int num_short_term_ref_pic_sets : 7 ; //0~64 + unsigned int num_long_term_ref_pics_sps : 6; //0~32 + unsigned int bBitParsingDisable : 1 ; //disable parsing + unsigned int num_delta_pocs_of_rps_idx : 8; + unsigned int long_term_ref_pics_present_flag : 1; + unsigned int reserved_dxva : 4; + //the number of bits for short_term_ref_pic_set()in slice header,dxva API + unsigned int num_bits_short_term_ref_pics_in_slice; + + // New additions + nvdec_hevc_pic_v1_s v1; + nvdec_hevc_pic_v2_s v2; + nvdec_hevc_pic_v3_s v3; + nvdec_pass2_otf_ext_s ssm; + +} nvdec_hevc_pic_s; + +//hevc slice info class +typedef struct _hevc_slice_info_s { + unsigned int first_flag :1;//first slice(s) of frame,must valid for slice EC + unsigned int err_flag :1;//error slice(s) .optional info for EC + unsigned int last_flag :1;//last slice segment(s) of frame,this bit is must be valid when slice_strm_recfg_en==1 or slice_ec==1 + unsigned int conceal_partial_slice :1; // indicate do partial slice error concealment for packet loss 
case + unsigned int available :1; // indicate the slice bitstream is available. + unsigned int reserved0 :7; + unsigned int ctb_count :20;// ctbs counter inside slice(s) .must valid for slice EC + unsigned int bs_offset; //slice(s) bitstream offset in bitstream buffer (in byte unit) + unsigned int bs_length; //slice(s) bitstream length. It is sum of aligned size and skip size and valid slice bitstream size. + unsigned short start_ctbx; //slice start ctbx ,it's optional,HW can output it in previous slice decoding. + //but this is one check points for error + unsigned short start_ctby; //slice start ctby + } hevc_slice_info_s; + + +//hevc slice ctx class +//slice pos and next slice address +typedef struct _slice_edge_ctb_pos_ctx_s { + unsigned int next_slice_pos_ctbxy; //2d address in raster scan + unsigned int next_slice_segment_addr; //1d address in tile scan +}slice_edge_ctb_pos_ctx_s; + +// next slice's first ctb located tile related information +typedef struct _slice_edge_tile_ctx_s { + unsigned int tileInfo1;// Misc tile info includes tile width and tile height and tile col and tile row + unsigned int tileInfo2;// Misc tile info includes tile start ctbx and start ctby and tile index + unsigned int tileInfo3;// Misc tile info includes ctb pos inside tile +} slice_edge_tile_ctx_s; + +//frame level stats +typedef struct _slice_edge_stats_ctx_s { + unsigned int frame_status_intra_cnt;// frame stats for intra block count + unsigned int frame_status_inter_cnt;// frame stats for inter block count + unsigned int frame_status_skip_cnt;// frame stats for skip block count + unsigned int frame_status_fwd_mvx_cnt;// frame stats for sum of abs fwd mvx + unsigned int frame_status_fwd_mvy_cnt;// frame stats for sum of abs fwd mvy + unsigned int frame_status_bwd_mvx_cnt;// frame stats for sum of abs bwd mvx + unsigned int frame_status_bwd_mvy_cnt;// frame stats for sum of abs bwd mvy + unsigned int frame_status_mv_cnt_ext;// extension bits of sum of abs mv to keep full 
precision. +}slice_edge_stats_ctx_s; + +//ctx of vpc_edge unit for tile left +typedef struct _slice_vpc_edge_ctx_s { + unsigned int reserved; +}slice_vpc_edge_ctx_s; + +//ctx of vpc_main unit +typedef struct _slice_vpc_main_ctx_s { + unsigned int reserved; +} slice_vpc_main_ctx_s; + +//hevc slice edge ctx class +typedef struct _slice_edge_ctx_s { + //ctb pos + slice_edge_ctb_pos_ctx_s slice_ctb_pos_ctx; + // stats + slice_edge_stats_ctx_s slice_stats_ctx; + // tile info + slice_edge_tile_ctx_s slice_tile_ctx; + //vpc_edge + slice_vpc_edge_ctx_s slice_vpc_edge_ctx; + //vpc_main + slice_vpc_main_ctx_s slice_vpc_main_ctx; +} slice_edge_ctx_s; + +typedef struct _nvdec_hevc_scaling_list_s { + unsigned char ScalingListDCCoeff16x16[6]; + unsigned char ScalingListDCCoeff32x32[2]; + unsigned char reserved0[8]; + + unsigned char ScalingList4x4[6][16]; + unsigned char ScalingList8x8[6][64]; + unsigned char ScalingList16x16[6][64]; + unsigned char ScalingList32x32[2][64]; +} nvdec_hevc_scaling_list_s; + + +//vp9 + +typedef struct _nvdec_vp9_pic_v1_s +{ + // New fields + // new_var : xx; // for variables with expanded bitlength, comment on why the new bit length is required + // Reserved bits for padding and/or non-HW specific functionality + unsigned int Vp9FltAboveOffset; // filter above offset relative to the filter buffer, in units of 256 bytes + unsigned int external_ref_mem_dis : 1; + unsigned int bit_depth : 4; + unsigned int error_recovery_start_pos : 2; //0: from start of frame, 1: from start of slice segment, 2: from error detected ctb, 3: reserved + unsigned int error_external_mv_en : 1; + unsigned int Reserved0 : 24; +} nvdec_vp9_pic_v1_s; + +enum VP9_FRAME_SFC_ID +{ + VP9_LAST_FRAME_SFC = 0, + VP9_GOLDEN_FRAME_SFC, + VP9_ALTREF_FRAME_SFC, + VP9_CURR_FRAME_SFC +}; + +typedef struct _nvdec_vp9_pic_s +{ + // vp9_bitstream_data_info + //Key and IV addresses must be 128-bit aligned + unsigned int wrapped_session_key[4]; //session keys + unsigned int wrapped_content_key[4]; //content
keys + unsigned int initialization_vector[4]; //Ctrl64 initial vector + unsigned int stream_len; // stream length in one frame + unsigned int enable_encryption; // flag to enable/disable encryption + unsigned int key_increment : 6; // added to content key after unwrapping + unsigned int encryption_mode : 4; + unsigned int sw_hdr_skip_length :14; //vp9 skip bytes setting for secure + unsigned int key_slot_index : 4; + unsigned int ssm_en : 1; + unsigned int enable_histogram : 1; // histogram stats output enable + unsigned int reserved0 : 2; + + // Driver may or may not use based upon need. + // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode. + // Driver can send this value based upon resolution using the formula: + // gptimer_timeout_value = 3 * (cycles required for one frame) + unsigned int gptimer_timeout_value; + + //general + unsigned char tileformat : 2 ; // 0: TBL; 1: KBL; 2: Tile16x16 + unsigned char gob_height : 3 ; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards) + unsigned char reserverd_surface_format : 3 ; + unsigned char reserved1[3]; + unsigned int Vp9BsdCtrlOffset; // bsd buffer offset respect to filter buffer ,256 bytes unit . 
+ + + //ref_last dimensions + unsigned short ref0_width; //ref_last coded width + unsigned short ref0_height; //ref_last coded height + unsigned short ref0_stride[2]; //ref_last stride + + //ref_golden dimensions + unsigned short ref1_width; //ref_golden coded width + unsigned short ref1_height; //ref_golden coded height + unsigned short ref1_stride[2]; //ref_golden stride + + //ref_alt dimensions + unsigned short ref2_width; //ref_alt coded width + unsigned short ref2_height; //ref_alt coded height + unsigned short ref2_stride[2]; //ref_alt stride + + + /* Current frame dimensions */ + unsigned short width; //pic width + unsigned short height; //pic height + unsigned short framestride[2]; // frame buffer stride for luma and chroma + + unsigned char keyFrame :1; + unsigned char prevIsKeyFrame:1; + unsigned char resolutionChange:1; + unsigned char errorResilient:1; + unsigned char prevShowFrame:1; + unsigned char intraOnly:1; + unsigned char reserved2 : 2; + + /* DCT coefficient partitions */ + //unsigned int offsetToDctParts; + + unsigned char reserved3[3]; + //unsigned char activeRefIdx[3];//3 bits + //unsigned char refreshFrameFlags; + //unsigned char refreshEntropyProbs; + //unsigned char frameParallelDecoding; + //unsigned char resetFrameContext; + + unsigned char refFrameSignBias[4]; + char loopFilterLevel;//6 bits + char loopFilterSharpness;//3 bits + + /* Quantization parameters */ + unsigned char qpYAc; + char qpYDc; + char qpChAc; + char qpChDc; + + /* From here down, frame-to-frame persisting stuff */ + + char lossless; + char transform_mode; + char allow_high_precision_mv; + char mcomp_filter_type; + char comp_pred_mode; + char comp_fixed_ref; + char comp_var_ref[2]; + char log2_tile_columns; + char log2_tile_rows; + + /* Segment and macroblock specific values */ + unsigned char segmentEnabled; + unsigned char segmentMapUpdate; + unsigned char segmentMapTemporalUpdate; + unsigned char segmentFeatureMode; /* ABS data or delta data */ + unsigned char 
segmentFeatureEnable[8][4]; + short segmentFeatureData[8][4]; + char modeRefLfEnabled; + char mbRefLfDelta[4]; + char mbModeLfDelta[2]; + char reserved5; // for alignment + + // New additions + nvdec_vp9_pic_v1_s v1; + nvdec_pass2_otf_ext_s ssm; + +} nvdec_vp9_pic_s; + +#define NVDEC_VP9HWPAD(x, y) unsigned char x[y] + +typedef struct { + /* last bytes of address 41 */ + unsigned char joints[3]; + unsigned char sign[2]; + /* address 42 */ + unsigned char class0[2][1]; + unsigned char fp[2][3]; + unsigned char class0_hp[2]; + unsigned char hp[2]; + unsigned char classes[2][10]; + /* address 43 */ + unsigned char class0_fp[2][2][3]; + unsigned char bits[2][10]; + +} nvdec_nmv_context; + +typedef struct { + unsigned int joints[4]; + unsigned int sign[2][2]; + unsigned int classes[2][11]; + unsigned int class0[2][2]; + unsigned int bits[2][10][2]; + unsigned int class0_fp[2][2][4]; + unsigned int fp[2][4]; + unsigned int class0_hp[2][2]; + unsigned int hp[2][2]; + +} nvdec_nmv_context_counts; + +/* Adaptive entropy contexts, padding elements are added to have + * 256 bit aligned tables for HW access. + * Compile with TRACE_PROB_TABLES to print bases for each table. 
*/ +typedef struct nvdec_vp9AdaptiveEntropyProbs_s +{ + /* address 32 */ + unsigned char inter_mode_prob[7][4]; + unsigned char intra_inter_prob[4]; + + /* address 33 */ + unsigned char uv_mode_prob[10][8]; + unsigned char tx8x8_prob[2][1]; + unsigned char tx16x16_prob[2][2]; + unsigned char tx32x32_prob[2][3]; + unsigned char sb_ymode_probB[4][1]; + unsigned char sb_ymode_prob[4][8]; + + /* address 37 */ + unsigned char partition_prob[2][16][4]; + + /* address 41 */ + unsigned char uv_mode_probB[10][1]; + unsigned char switchable_interp_prob[4][2]; + unsigned char comp_inter_prob[5]; + unsigned char mbskip_probs[3]; + NVDEC_VP9HWPAD(pad1, 1); + + nvdec_nmv_context nmvc; + + /* address 44 */ + unsigned char single_ref_prob[5][2]; + unsigned char comp_ref_prob[5]; + NVDEC_VP9HWPAD(pad2, 17); + + /* address 45 */ + unsigned char probCoeffs[2][2][6][6][4]; + unsigned char probCoeffs8x8[2][2][6][6][4]; + unsigned char probCoeffs16x16[2][2][6][6][4]; + unsigned char probCoeffs32x32[2][2][6][6][4]; + +} nvdec_vp9AdaptiveEntropyProbs_t; + +/* Entropy contexts */ +typedef struct nvdec_vp9EntropyProbs_s +{ + /* Default keyframe probs */ + /* Table formatted for 256b memory, probs 0to7 for all tables followed by + * probs 8toN for all tables. + * Compile with TRACE_PROB_TABLES to print bases for each table. 
*/ + + unsigned char kf_bmode_prob[10][10][8]; + + /* Address 25 */ + unsigned char kf_bmode_probB[10][10][1]; + unsigned char ref_pred_probs[3]; + unsigned char mb_segment_tree_probs[7]; + unsigned char segment_pred_probs[3]; + unsigned char ref_scores[4]; + unsigned char prob_comppred[2]; + NVDEC_VP9HWPAD(pad1, 9); + + /* Address 29 */ + unsigned char kf_uv_mode_prob[10][8]; + unsigned char kf_uv_mode_probB[10][1]; + NVDEC_VP9HWPAD(pad2, 6); + + nvdec_vp9AdaptiveEntropyProbs_t a; /* Probs with backward adaptation */ + +} nvdec_vp9EntropyProbs_t; + +/* Counters for adaptive entropy contexts */ +typedef struct nvdec_vp9EntropyCounts_s +{ + unsigned int inter_mode_counts[7][3][2]; + unsigned int sb_ymode_counts[4][10]; + unsigned int uv_mode_counts[10][10]; + unsigned int partition_counts[16][4]; + unsigned int switchable_interp_counts[4][3]; + unsigned int intra_inter_count[4][2]; + unsigned int comp_inter_count[5][2]; + unsigned int single_ref_count[5][2][2]; + unsigned int comp_ref_count[5][2]; + unsigned int tx32x32_count[2][4]; + unsigned int tx16x16_count[2][3]; + unsigned int tx8x8_count[2][2]; + unsigned int mbskip_count[3][2]; + + nvdec_nmv_context_counts nmvcount; + + unsigned int countCoeffs[2][2][6][6][4]; + unsigned int countCoeffs8x8[2][2][6][6][4]; + unsigned int countCoeffs16x16[2][2][6][6][4]; + unsigned int countCoeffs32x32[2][2][6][6][4]; + + unsigned int countEobs[4][2][2][6][6]; + +} nvdec_vp9EntropyCounts_t; + +// Legacy codecs encryption parameters +typedef struct _nvdec_pass2_otf_s { + unsigned int wrapped_session_key[4]; // session keys + unsigned int wrapped_content_key[4]; // content keys + unsigned int initialization_vector[4];// Ctrl64 initial vector + unsigned int enable_encryption : 1; // flag to enable/disable encryption + unsigned int key_increment : 6; // added to content key after unwrapping + unsigned int encryption_mode : 4; + unsigned int key_slot_index : 4; + unsigned int ssm_en : 1; + unsigned int reserved1 :16; // reserved +} 
nvdec_pass2_otf_s; // 0x10 bytes + +typedef struct _nvdec_display_param_s +{ + unsigned int enableTFOutput : 1; //=1, enable dbfdma to output the display surface; if disabled, the tf configuration below is ignored. + //remap for VC1 + unsigned int VC1MapYFlag : 1; + unsigned int MapYValue : 3; + unsigned int VC1MapUVFlag : 1; + unsigned int MapUVValue : 3; + //tf + unsigned int OutStride : 8; + unsigned int TilingFormat : 3; + unsigned int OutputStructure : 1; //(0=frame, 1=field) + unsigned int reserved0 :11; + int OutputTop[2]; // in units of 256 + int OutputBottom[2]; // in units of 256 + //histogram + unsigned int enableHistogram : 1; // enable histogram info collection. + unsigned int HistogramStartX :12; // start X of Histogram window + unsigned int HistogramStartY :12; // start Y of Histogram window + unsigned int reserved1 : 7; + unsigned int HistogramEndX :12; // end X of Histogram window + unsigned int HistogramEndY :12; // end Y of Histogram window + unsigned int reserved2 : 8; +} nvdec_display_param_s; // size 0x1c bytes + +// H.264 +typedef struct _nvdec_dpb_entry_s // 16 bytes +{ + unsigned int index : 7; // uncompressed frame buffer index + unsigned int col_idx : 5; // index of associated co-located motion data buffer + unsigned int state : 2; // bit0(state)=1: top field used for reference, bit1(state)=1: bottom field used for reference + unsigned int is_long_term : 1; // 0=short-term, 1=long-term + unsigned int not_existing : 1; // 1=marked as non-existing + unsigned int is_field : 1; // set if unpaired field or complementary field pair + unsigned int top_field_marking : 4; + unsigned int bottom_field_marking : 4; + unsigned int output_memory_layout : 1; // Set according to picture level output NV12/NV24 setting.
+ unsigned int reserved : 6; + unsigned int FieldOrderCnt[2]; // : 2*32 [top/bottom] + int FrameIdx; // : 16 short-term: FrameNum (16 bits), long-term: LongTermFrameIdx (4 bits) +} nvdec_dpb_entry_s; + +typedef struct _nvdec_h264_pic_s +{ + nvdec_pass2_otf_s encryption_params; + unsigned char eos[16]; + unsigned char explicitEOSPresentFlag; + unsigned char hint_dump_en; //enable COLOMV surface dump for all frames, which includes hints of "MV/REFIDX/QP/CBP/MBPART/MBTYPE", nvbug: 200212874 + unsigned char reserved0[2]; + unsigned int stream_len; + unsigned int slice_count; + unsigned int mbhist_buffer_size; // to pass buffer size of MBHIST_BUFFER + + // Driver may or may not use based upon need. + // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode. + // Driver can send this value based upon resolution using the formula: + // gptimer_timeout_value = 3 * (cycles required for one frame) + unsigned int gptimer_timeout_value; + + // Fields from msvld_h264_seq_s + int log2_max_pic_order_cnt_lsb_minus4; + int delta_pic_order_always_zero_flag; + int frame_mbs_only_flag; + int PicWidthInMbs; + int FrameHeightInMbs; + + unsigned int tileFormat : 2 ; // 0: TBL; 1: KBL; 2: Tile16x16 + unsigned int gob_height : 3 ; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards) + unsigned int reserverd_surface_format : 27; + + // Fields from msvld_h264_pic_s + int entropy_coding_mode_flag; + int pic_order_present_flag; + int num_ref_idx_l0_active_minus1; + int num_ref_idx_l1_active_minus1; + int deblocking_filter_control_present_flag; + int redundant_pic_cnt_present_flag; + int transform_8x8_mode_flag; + + // Fields from mspdec_h264_picture_setup_s + unsigned int pitch_luma; // Luma pitch + unsigned int pitch_chroma; // chroma pitch + + unsigned int luma_top_offset; // offset of luma top field in units of 256 + unsigned int luma_bot_offset; // offset of luma bottom field in units of 256 + unsigned int luma_frame_offset; // offset of 
luma frame in units of 256 + unsigned int chroma_top_offset; // offset of chroma top field in units of 256 + unsigned int chroma_bot_offset; // offset of chroma bottom field in units of 256 + unsigned int chroma_frame_offset; // offset of chroma frame in units of 256 + unsigned int HistBufferSize; // in units of 256 + + unsigned int MbaffFrameFlag : 1; // + unsigned int direct_8x8_inference_flag: 1; // + unsigned int weighted_pred_flag : 1; // + unsigned int constrained_intra_pred_flag:1; // + unsigned int ref_pic_flag : 1; // reference picture (nal_ref_idc != 0) + unsigned int field_pic_flag : 1; // + unsigned int bottom_field_flag : 1; // + unsigned int second_field : 1; // second field of complementary reference field + unsigned int log2_max_frame_num_minus4: 4; // (0..12) + unsigned int chroma_format_idc : 2; // + unsigned int pic_order_cnt_type : 2; // (0..2) + int pic_init_qp_minus26 : 6; // : 6 (-26..+25) + int chroma_qp_index_offset : 5; // : 5 (-12..+12) + int second_chroma_qp_index_offset : 5; // : 5 (-12..+12) + + unsigned int weighted_bipred_idc : 2; // : 2 (0..2) + unsigned int CurrPicIdx : 7; // : 7 uncompressed frame buffer index + unsigned int CurrColIdx : 5; // : 5 index of associated co-located motion data buffer + unsigned int frame_num : 16; // + unsigned int frame_surfaces : 1; // frame surfaces flag + unsigned int output_memory_layout : 1; // 0: NV12; 1:NV24. Field pair must use the same setting. 
+ + int CurrFieldOrderCnt[2]; // : 32 [Top_Bottom], [0]=TopFieldOrderCnt, [1]=BottomFieldOrderCnt + nvdec_dpb_entry_s dpb[16]; + unsigned char WeightScale[6][4][4]; // : 6*4*4*8 in raster scan order (not zig-zag order) + unsigned char WeightScale8x8[2][8][8]; // : 2*8*8*8 in raster scan order (not zig-zag order) + + // mvc setup info, must be zero if not mvc + unsigned char num_inter_view_refs_lX[2]; // number of inter-view references + char reserved1[14]; // reserved for alignment + signed char inter_view_refidx_lX[2][16]; // DPB indices (must also be marked as long-term) + + // lossless decode (At the time of writing this manual, x264 and JM encoders, differ in Intra_8x8 reference sample filtering) + unsigned int lossless_ipred8x8_filter_enable : 1; // = 0, skips Intra_8x8 reference sample filtering, for vertical and horizontal predictions (x264 encoded streams); = 1, filter Intra_8x8 reference samples (JM encoded streams) + unsigned int qpprime_y_zero_transform_bypass_flag : 1; // determines the transform bypass mode + unsigned int reserved2 : 30; // kept for alignment; may be used for other parameters + + nvdec_display_param_s displayPara; + nvdec_pass2_otf_ext_s ssm; + +} nvdec_h264_pic_s; + +// VC-1 Scratch buffer +typedef enum _vc1_fcm_e +{ + FCM_PROGRESSIVE = 0, + FCM_FRAME_INTERLACE = 2, + FCM_FIELD_INTERLACE = 3 +} vc1_fcm_e; + +typedef enum _syntax_vc1_ptype_e +{ + PTYPE_I = 0, + PTYPE_P = 1, + PTYPE_B = 2, + PTYPE_BI = 3, //PTYPE_BI is not used to config register NV_CNVDEC_VLD_PIC_INFO_COMMON. field NV_CNVDEC_VLD_PIC_INFO_COMMON_PIC_CODING_VC1 is only 2 bits. I and BI pictures are configured with same value. Please refer to manual. + PTYPE_SKIPPED = 4 +} syntax_vc1_ptype_e; + +// 7.1.1.32, Table 46 etc. 
+enum vc1_mvmode_e +{ + MVMODE_MIXEDMV = 0, + MVMODE_1MV = 1, + MVMODE_1MV_HALFPEL = 2, + MVMODE_1MV_HALFPEL_BILINEAR = 3, + MVMODE_INTENSITY_COMPENSATION = 4 +}; + +// 9.1.1.42, Table 105 +typedef enum _vc1_fptype_e +{ + FPTYPE_I_I = 0, + FPTYPE_I_P, + FPTYPE_P_I, + FPTYPE_P_P, + FPTYPE_B_B, + FPTYPE_B_BI, + FPTYPE_BI_B, + FPTYPE_BI_BI +} vc1_fptype_e; + +// Table 43 (7.1.1.31.2) +typedef enum _vc1_dqprofile_e +{ + DQPROFILE_ALL_FOUR_EDGES_ = 0, + DQPROFILE_DOUBLE_EDGE_ = 1, + DQPROFILE_SINGLE_EDGE_ = 2, + DQPROFILE_ALL_MACROBLOCKS_ = 3 +} vc1_dqprofile_e; + +typedef struct _nvdec_vc1_pic_s +{ + nvdec_pass2_otf_s encryption_params; + unsigned char eos[16]; // to pass end of stream data separately if not present in bitstream surface + unsigned char prefixStartCode[4]; // used for dxva to pass prefix start code. + unsigned int bitstream_offset; // offset in words from start of bitstream surface if there is gap. + unsigned char explicitEOSPresentFlag; // to indicate that eos[] is used for passing end of stream data. + unsigned char reserved0[3]; + unsigned int stream_len; + unsigned int slice_count; + unsigned int scratch_pic_buffer_size; + + // Driver may or may not use based upon need. + // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode. 
+ // Driver can send this value based upon resolution using the formula: + // gptimer_timeout_value = 3 * (cycles required for one frame) + unsigned int gptimer_timeout_value; + + // Fields from vc1_seq_s + unsigned short FrameWidth; // actual frame width + unsigned short FrameHeight; // actual frame height + + unsigned char profile; // 1 = SIMPLE or MAIN, 2 = ADVANCED + unsigned char postprocflag; + unsigned char pulldown; + unsigned char interlace; + + unsigned char tfcntrflag; + unsigned char finterpflag; + unsigned char psf; + unsigned char tileFormat : 2 ; // 0: TBL; 1: KBL; 2: Tile16x16 + unsigned char gob_height : 3 ; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards) + unsigned char reserverd_surface_format : 3 ; + + // simple,main + unsigned char multires; + unsigned char syncmarker; + unsigned char rangered; + unsigned char maxbframes; + + // Fields from vc1_entrypoint_s + unsigned char dquant; + unsigned char panscan_flag; + unsigned char refdist_flag; + unsigned char quantizer; + + unsigned char extended_mv; + unsigned char extended_dmv; + unsigned char overlap; + unsigned char vstransform; + + // Fields from vc1_scratch_s + char refdist; + char reserved1[3]; // for alignment + + // Fields from vld_vc1_pic_s + vc1_fcm_e fcm; + syntax_vc1_ptype_e ptype; + int tfcntr; + int rptfrm; + int tff; + int rndctrl; + int pqindex; + int halfqp; + int pquantizer; + int postproc; + int condover; + int transacfrm; + int transacfrm2; + int transdctab; + int pqdiff; + int abspq; + int dquantfrm; + vc1_dqprofile_e dqprofile; + int dqsbedge; + int dqdbedge; + int dqbilevel; + int mvrange; + enum vc1_mvmode_e mvmode; + enum vc1_mvmode_e mvmode2; + int lumscale; + int lumshift; + int mvtab; + int cbptab; + int ttmbf; + int ttfrm; + int bfraction; + vc1_fptype_e fptype; + int numref; + int reffield; + int dmvrange; + int intcompfield; + int lumscale1; // type was char in ucode + int lumshift1; // type was char in ucode + int lumscale2; // 
type was char in ucode + int lumshift2; // type was char in ucode + int mbmodetab; + int imvtab; + int icbptab; + int fourmvbptab; + int fourmvswitch; + int intcomp; + int twomvbptab; + // simple,main + int rangeredfrm; + + // Fields from pdec_vc1_pic_s + unsigned int HistBufferSize; // in units of 256 + // frame buffers + unsigned int FrameStride[2]; // [y_c] + unsigned int luma_top_offset; // offset of luma top field in units of 256 + unsigned int luma_bot_offset; // offset of luma bottom field in units of 256 + unsigned int luma_frame_offset; // offset of luma frame in units of 256 + unsigned int chroma_top_offset; // offset of chroma top field in units of 256 + unsigned int chroma_bot_offset; // offset of chroma bottom field in units of 256 + unsigned int chroma_frame_offset; // offset of chroma frame in units of 256 + + unsigned short CodedWidth; // entrypoint specific + unsigned short CodedHeight; // entrypoint specific + + unsigned char loopfilter; // entrypoint specific + unsigned char fastuvmc; // entrypoint specific + unsigned char output_memory_layout; // picture specific + unsigned char ref_memory_layout[2]; // picture specific 0: fwd, 1: bwd + unsigned char reserved3[3]; // for alignment + + nvdec_display_param_s displayPara; + nvdec_pass2_otf_ext_s ssm; + +} nvdec_vc1_pic_s; + +// MPEG-2 +typedef struct _nvdec_mpeg2_pic_s +{ + nvdec_pass2_otf_s encryption_params; + unsigned char eos[16]; + unsigned char explicitEOSPresentFlag; + unsigned char reserved0[3]; + unsigned int stream_len; + unsigned int slice_count; + + // Driver may or may not use based upon need. + // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode. 
+ // Driver can send this value based upon resolution using the formula: + // gptimer_timeout_value = 3 * (cycles required for one frame) + unsigned int gptimer_timeout_value; + + // Fields from vld_mpeg2_seq_pic_info_s + short FrameWidth; // actual frame width + short FrameHeight; // actual frame height + unsigned char picture_structure; // 0 => Reserved, 1 => Top field, 2 => Bottom field, 3 => Frame picture. Table 6-14. + unsigned char picture_coding_type; // 0 => Forbidden, 1 => I, 2 => P, 3 => B, 4 => D (for MPEG-2). Table 6-12. + unsigned char intra_dc_precision; // 0 => 8 bits, 1=> 9 bits, 2 => 10 bits, 3 => 11 bits. Table 6-13. + char frame_pred_frame_dct; // as in section 6.3.10 + char concealment_motion_vectors; // as in section 6.3.10 + char intra_vlc_format; // as in section 6.3.10 + unsigned char tileFormat : 2 ; // 0: TBL; 1: KBL; 2: Tile16x16 + unsigned char gob_height : 3 ; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards) + unsigned char reserverd_surface_format : 3 ; + + char reserved1; // always 0 + char f_code[4]; // as in section 6.3.10 + + // Fields from pdec_mpeg2_picture_setup_s + unsigned short PicWidthInMbs; + unsigned short FrameHeightInMbs; + unsigned int pitch_luma; + unsigned int pitch_chroma; + unsigned int luma_top_offset; + unsigned int luma_bot_offset; + unsigned int luma_frame_offset; + unsigned int chroma_top_offset; + unsigned int chroma_bot_offset; + unsigned int chroma_frame_offset; + unsigned int HistBufferSize; + unsigned short output_memory_layout; + unsigned short alternate_scan; + unsigned short secondfield; + /******************************/ + // Got rid of the union kept for compatibility with NVDEC1. + // Removed field mpeg2, and kept rounding type. + // NVDEC1 ucode is not using the mpeg2 field, instead using codec type from the methods. + // Rounding type should only be set for Divx3.11. 
+ unsigned short rounding_type; + /******************************/ + unsigned int MbInfoSizeInBytes; + unsigned int q_scale_type; + unsigned int top_field_first; + unsigned int full_pel_fwd_vector; + unsigned int full_pel_bwd_vector; + unsigned char quant_mat_8x8intra[64]; + unsigned char quant_mat_8x8nonintra[64]; + unsigned int ref_memory_layout[2]; //0:for fwd; 1:for bwd + + nvdec_display_param_s displayPara; + nvdec_pass2_otf_ext_s ssm; + +} nvdec_mpeg2_pic_s; + +// MPEG-4 +typedef struct _nvdec_mpeg4_pic_s +{ + nvdec_pass2_otf_s encryption_params; + unsigned char eos[16]; + unsigned char explicitEOSPresentFlag; + unsigned char reserved2[3]; // for alignment + unsigned int stream_len; + unsigned int slice_count; + unsigned int scratch_pic_buffer_size; + + // Driver may or may not use based upon need. + // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode. + // Driver can send this value based upon resolution using the formula: + // gptimer_timeout_value = 3 * (cycles required for one frame) + unsigned int gptimer_timeout_value; + + // Fields from vld_mpeg4_seq_s + short FrameWidth; // :13 video_object_layer_width + short FrameHeight; // :13 video_object_layer_height + char vop_time_increment_bitcount; // : 5 1..16 + char resync_marker_disable; // : 1 + unsigned char tileFormat : 2 ; // 0: TBL; 1: KBL; 2: Tile16x16 + unsigned char gob_height : 3 ; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards) + unsigned char reserverd_surface_format : 3 ; + char reserved3; // for alignment + + // Fields from pdec_mpeg4_picture_setup_s + int width; // : 13 + int height; // : 13 + + unsigned int FrameStride[2]; // [y_c] + unsigned int luma_top_offset; // offset of luma top field in units of 256 + unsigned int luma_bot_offset; // offset of luma bottom field in units of 256 + unsigned int luma_frame_offset; // offset of luma frame in units of 256 + unsigned int chroma_top_offset; // offset of chroma top field in units of 
256 + unsigned int chroma_bot_offset; // offset of chroma bottom field in units of 256 + unsigned int chroma_frame_offset; // offset of chroma frame in units of 256 + + unsigned int HistBufferSize; // in units of 256, History buffer size + + int trd[2]; // : 16, temporal reference frame distance (only needed for B-VOPs) + int trb[2]; // : 16, temporal reference B-VOP distance from fwd reference frame (only needed for B-VOPs) + + int divx_flags; // : 16 (bit 0: DivX interlaced chroma rounding, bit 1: Divx 4 boundary padding, bit 2: Divx IDCT) + + short vop_fcode_forward; // : 1...7 + short vop_fcode_backward; // : 1...7 + + unsigned char interlaced; // : 1 + unsigned char quant_type; // : 1 + unsigned char quarter_sample; // : 1 + unsigned char short_video_header; // : 1 + + unsigned char curr_output_memory_layout; // : 1 0:NV12; 1:NV24 + unsigned char ptype; // picture type: 0 for PTYPE_I, 1 for PTYPE_P, 2 for PTYPE_B, 3 for PTYPE_BI, 4 for PTYPE_SKIPPED + unsigned char rnd; // : 1, rounding mode + unsigned char alternate_vertical_scan_flag; // : 1 + + unsigned char top_field_flag; // : 1 + unsigned char reserved0[3]; // alignment purpose + + unsigned char intra_quant_mat[64]; // : 64*8 + unsigned char nonintra_quant_mat[64]; // : 64*8 + unsigned char ref_memory_layout[2]; //0:for fwd; 1:for bwd + unsigned char reserved1[34]; // 256 byte alignment till now + + nvdec_display_param_s displayPara; + +} nvdec_mpeg4_pic_s; + +// VP8 +enum VP8_FRAME_TYPE +{ + VP8_KEYFRAME = 0, + VP8_INTERFRAME = 1 +}; + +enum VP8_FRAME_SFC_ID +{ + VP8_GOLDEN_FRAME_SFC = 0, + VP8_ALTREF_FRAME_SFC, + VP8_LAST_FRAME_SFC, + VP8_CURR_FRAME_SFC +}; + +typedef struct _nvdec_vp8_pic_s +{ + nvdec_pass2_otf_s encryption_params; + + // Driver may or may not use based upon need. + // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+ // Driver can send this value based upon resolution using the formula: + // gptimer_timeout_value = 3 * (cycles required for one frame) + unsigned int gptimer_timeout_value; + + unsigned short FrameWidth; // actual frame width + unsigned short FrameHeight; // actual frame height + + unsigned char keyFrame; // 1: key frame; 0: not + unsigned char version; + unsigned char tileFormat : 2 ; // 0: TBL; 1: KBL; 2: Tile16x16 + unsigned char gob_height : 3 ; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards) + unsigned char reserverd_surface_format : 3 ; + unsigned char errorConcealOn; // 1: error conceal on; 0: off + + unsigned int firstPartSize; // the size of first partition(frame header and mb header partition) + + // ctx + unsigned int HistBufferSize; // in units of 256 + unsigned int VLDBufferSize; // in units of 1 + // current frame buffers + unsigned int FrameStride[2]; // [y_c] + unsigned int luma_top_offset; // offset of luma top field in units of 256 + unsigned int luma_bot_offset; // offset of luma bottom field in units of 256 + unsigned int luma_frame_offset; // offset of luma frame in units of 256 + unsigned int chroma_top_offset; // offset of chroma top field in units of 256 + unsigned int chroma_bot_offset; // offset of chroma bottom field in units of 256 + unsigned int chroma_frame_offset; // offset of chroma frame in units of 256 + + nvdec_display_param_s displayPara; + + // decode picture buffer related + char current_output_memory_layout; + char output_memory_layout[3]; // output NV12/NV24 setting. item 0:golden; 1: altref; 2: last + + unsigned char segmentation_feature_data_update; + unsigned char reserved1[3]; + + // ucode return result + unsigned int resultValue; // ucode returns the picture header info; includes copy_buffer_to_golden etc.
+ unsigned int partition_offset[8]; // byte offset to each token partition (used for encrypted streams only) + + nvdec_pass2_otf_ext_s ssm; + +} nvdec_vp8_pic_s; // size is 0xc0 + +// PASS1 + +//Sample means the entire frame is encrypted with a single IV, and subsample means a given frame may be encrypted in multiple chunks with different IVs. +#define NUM_SUBSAMPLES 32 + +typedef struct _bytes_of_data_s +{ + unsigned int clear_bytes; // clear bytes per subsample + unsigned int encypted_bytes; // encrypted bytes per subsample + +} bytes_of_data_s; + +typedef struct _nvdec_pass1_input_data_s +{ + bytes_of_data_s sample_size[NUM_SUBSAMPLES]; // clear/encrypted bytes per subsample + unsigned int initialization_vector[NUM_SUBSAMPLES][4]; // Ctrl64 initial vector per subsample + unsigned char IvValid[NUM_SUBSAMPLES]; // each element will tell whether IV is valid for that subsample or not. + unsigned int stream_len; // encrypted bitstream size. + unsigned int clearBufferSize; // allocated size of clear buffer size + unsigned int reencryptBufferSize; // allocated size of reencrypted buffer size + unsigned int vp8coeffPartitonBufferSize; // allocated buffer for vp8 coeff partition buffer + unsigned int PrevWidth; // required for VP9 + unsigned int num_nals :16; // number of subsamples in a frame + unsigned int drm_mode : 8; // DRM mode + unsigned int key_sel : 4; // key select from keyslot + unsigned int codec : 4; // codecs selection + unsigned int TotalSizeOfClearData; // Used with Pattern based encryption + unsigned int SliceHdrOffset; // This is used with pattern mode encryption where data before slice hdr comes in clear. 
+ unsigned int EncryptBlkCnt :16; + unsigned int SkipBlkCnt :16; +} nvdec_pass1_input_data_s; + +#define VP8_MAX_TOKEN_PARTITIONS 8 +#define VP9_MAX_FRAMES_IN_SUPERFRAME 8 + +typedef struct _nvdec_pass1_output_data_s +{ + unsigned int clear_header_size; // h264/vc1/mpeg2/vp8, decrypted pps/sps/part of slice header info, 128 bits aligned + unsigned int reencrypt_data_size; // h264/vc1/mpeg2, slice level data, vp8 mb header info, 128 bits aligned + unsigned int clear_token_data_size; // vp8, clear token data saved in VPR, 128 bits aligned + unsigned int key_increment : 6; // added to content key after unwrapping + unsigned int encryption_mode : 4; // encryption mode + unsigned int bReEncrypted : 1; // set to 0 if no re-encryption is done. + unsigned int bvp9SuperFrame : 1; // set to 1 for vp9 superframe + unsigned int vp9NumFramesMinus1 : 3; // set equal to numFrames-1 for vp9superframe. Max 8 frames are possible in vp9 superframe. + unsigned int reserved1 :17; // reserved, 32 bit alignment + unsigned int wrapped_session_key[4]; // session keys + unsigned int wrapped_content_key[4]; // content keys + unsigned int initialization_vector[4]; // Ctrl64 initial vector + union { + unsigned int partition_size[VP8_MAX_TOKEN_PARTITIONS]; // size of each token partition (used for encrypted streams of VP8) + unsigned int vp9_frame_sizes[VP9_MAX_FRAMES_IN_SUPERFRAME]; // frame size information for all frames in vp9 superframe. + }; + unsigned int vp9_clear_hdr_size[VP9_MAX_FRAMES_IN_SUPERFRAME]; // clear header size for each frame in vp9 superframe. 
+} nvdec_pass1_output_data_s; + + +/***************************************************** + AV1 +*****************************************************/ +typedef struct _scale_factors_reference_s{ + short x_scale_fp; // horizontal fixed point scale factor + short y_scale_fp; // vertical fixed point scale factor +}scale_factors_reference_s; + +typedef struct _frame_info_t{ + unsigned short width; // in pixels, av1 supports arbitrary resolution + unsigned short height; + unsigned short stride[2]; // luma and chroma stride in 16Bytes + unsigned int frame_buffer_idx; // TBD :clean associate the reference frame and frame buffer id to lookup base_addr +} frame_info_t; + +typedef struct _ref_frame_struct_s{ + frame_info_t info; + scale_factors_reference_s sf; // scalefactor for reference frame and current frame size, driver can calculate it + unsigned char sign_bias : 1; // calculated based on frame_offset and current frame offset + unsigned char wmtype : 2; // global motion parameters : identity,translation,rotzoom,affine + unsigned char reserved_rf : 5; + short frame_off; // relative offset to current frame + short roffset; // relative offset from current frame +} ref_frame_struct_s; + +typedef struct _av1_fgs_cfg_t{ + //from AV1 spec 5.9.30 Film Grain Params syntax + unsigned short apply_grain : 1; + unsigned short overlap_flag : 1; + unsigned short clip_to_restricted_range : 1; + unsigned short chroma_scaling_from_luma : 1; + unsigned short num_y_points_b : 1; // flag indicates num_y_points>0 + unsigned short num_cb_points_b : 1; // flag indicates num_cb_points>0 + unsigned short num_cr_points_b : 1; // flag indicates num_cr_points>0 + unsigned short scaling_shift : 4; + unsigned short reserved_fgs : 5; + unsigned short sw_random_seed; + short cb_offset; + short cr_offset; + char cb_mult; + char cb_luma_mult; + char cr_mult; + char cr_luma_mult; +} av1_fgs_cfg_t; + + +typedef struct _nvdec_av1_pic_s +{ + nvdec_pass2_otf_s encryption_params; + + nvdec_pass2_otf_ext_s ssm; +
+ av1_fgs_cfg_t fgs_cfg; + + // Driver may or may not use based upon need. + // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode. + // Driver can send this value based upon resolution using the formula: + // gptimer_timeout_value = 3 * (cycles required for one frame) + unsigned int gptimer_timeout_value; + + unsigned int stream_len; // stream length. + unsigned int reserved12; // skip bytes length to real frame data . + + //sequence header + unsigned int use_128x128_superblock : 1; // superblock 128x128 or 64x64, 0:64x64, 1: 128x128 + unsigned int chroma_format : 2; // 1:420, others:reserved for future + unsigned int bit_depth : 4; // bitdepth + unsigned int enable_filter_intra : 1; // tool enable in seq level, 0 : disable 1: frame header control + unsigned int enable_intra_edge_filter : 1; + unsigned int enable_interintra_compound : 1; + unsigned int enable_masked_compound : 1; + unsigned int enable_dual_filter : 1; // enable or disable vertical and horiz filter selection + unsigned int reserved10 : 1; // 0 - disable order hint, and related tools + unsigned int reserved0 : 3; + unsigned int enable_jnt_comp : 1; // 0 - disable joint compound modes + unsigned int reserved1 : 1; + unsigned int enable_cdef : 1; + unsigned int reserved11 : 1; + unsigned int enable_fgs : 1; + unsigned int enable_substream_decoding : 1; //enable frame substream kickoff mode without context switch + unsigned int reserved2 : 10; // reserved bits + + //frame header + unsigned int frame_type : 2; // 0:Key frame, 1:Inter frame, 2:intra only, 3:s-frame + unsigned int show_frame : 1; // show frame flag + unsigned int reserved13 : 1; + unsigned int disable_cdf_update : 1; // disable CDF update during symbol decoding + unsigned int allow_screen_content_tools : 1; // screen content tool enable + unsigned int cur_frame_force_integer_mv : 1; // AMVR enable + unsigned int scale_denom_minus9 : 3; // The denominator minus9 of the superres scale + unsigned int allow_intrabc : 1; 
// IBC enable + unsigned int allow_high_precision_mv : 1; // 1/8 precision mv enable + unsigned int interp_filter : 3; // interpolation filter : EIGHTTAP_REGULAR,.... + unsigned int switchable_motion_mode : 1; // 0: simple motion mode, 1: SIMPLE, OBMC, LOCAL WARP + unsigned int use_ref_frame_mvs : 1; // 1: current frame can use the previous frame mv information, MFMV + unsigned int refresh_frame_context : 1; // backward update flag + unsigned int delta_q_present_flag : 1; // quantizer index delta values are present in the block level + unsigned int delta_q_res : 2; // left shift applied to decoded quantizer index delta values + unsigned int delta_lf_present_flag : 1; // specifies whether loop filter delta values are present in the block level + unsigned int delta_lf_res : 2; // specifies the left shift applied to decoded loop filter values + unsigned int delta_lf_multi : 1; // separate loop filter deltas for Hy,Vy,U,V edges + unsigned int reserved3 : 1; + unsigned int coded_lossless : 1; // 1 means all segments use lossless coding.
Frame is fully lossless, CDEF/DBF will be disabled + unsigned int tile_enabled : 1; // tile enable + unsigned int reserved4 : 2; + unsigned int superres_is_scaled : 1; // frame level frame for using_superres + unsigned int reserved_fh : 1; + + unsigned int tile_cols : 8; // horizontal tile numbers in frame, max is 64 + unsigned int tile_rows : 8; // vertical tile numbers in frame, max is 64 + unsigned int context_update_tile_id : 16; // which tile cdf will be selected as the backward update CDF, MAXTILEROW=64, MAXTILECOL=64, 12bits + + unsigned int cdef_damping_minus_3 : 2; // controls the amount of damping in the deringing filter + unsigned int cdef_bits : 2; // the number of bits needed to specify which CDEF filter to apply + unsigned int frame_tx_mode : 3; // 0:ONLY4x4,3:LARGEST,4:SELECT + unsigned int frame_reference_mode : 2; // single,compound,select + unsigned int skip_mode_flag : 1; // skip mode + unsigned int skip_ref0 : 4; + unsigned int skip_ref1 : 4; + unsigned int allow_warp : 1; // sequence level & frame level warp enable + unsigned int reduced_tx_set_used : 1; // whether the frame is restricted to a reduced subset of the full set of transform types + unsigned int ref_scaling_enable : 1; + unsigned int reserved5 : 1; + unsigned int reserved6 : 10; // reserved bits + unsigned short superres_upscaled_width; // upscale width, frame_size_with_refs() syntax,restoration will use it + unsigned short superres_luma_step; + unsigned short superres_chroma_step; + unsigned short superres_init_luma_subpel_x; + unsigned short superres_init_chroma_subpel_x; + + /*frame header qp information*/ + unsigned char base_qindex; // the maximum qp is 255 + char y_dc_delta_q; + char u_dc_delta_q; + char v_dc_delta_q; + char u_ac_delta_q; + char v_ac_delta_q; + unsigned char qm_y; // 4bit: 0-15 + unsigned char qm_u; + unsigned char qm_v; + + /*cdef, need to update in the new spec*/ + unsigned int cdef_y_pri_strength; // 4bit for one, max is 8 + unsigned int cdef_uv_pri_strength;
// 4bit for one, max is 8 + unsigned int cdef_y_sec_strength : 16; // 2bit for one, max is 8 + unsigned int cdef_uv_sec_strength : 16; // 2bit for one, max is 8 + + /*segmentation*/ + unsigned char segment_enabled; + unsigned char segment_update_map; + unsigned char reserved7; + unsigned char segment_temporal_update; + short segment_feature_data[8][8]; + unsigned char last_active_segid; // The highest numbered segment id that has some enabled feature. + unsigned char segid_preskip; // Whether the segment id will be read before the skip syntax element. + // 1: the segment id will be read first. + // 0: the skip syntax element will be read first. + unsigned char prevsegid_flag; // 1 : previous segment id is available + unsigned char segment_quant_sign : 8; // sign bit for segment alternative QP + + /*loopfilter*/ + unsigned char filter_level[2]; + unsigned char filter_level_u; + unsigned char filter_level_v; + unsigned char lf_sharpness_level; + char lf_ref_deltas[8]; // 0 = Intra, Last, Last2+Last3, GF, BRF, ARF2, ARF + char lf_mode_deltas[2]; // 0 = ZERO_MV, MV + + /*restoration*/ + unsigned char lr_type ; // restoration type. 
Y:bit[1:0];U:bit[3:2],V:bit[5:4] + unsigned char lr_unit_size; // restoration unit size 0:32x32, 1:64x64, 2:128x128,3:256x256; Y:bit[1:0];U:bit[3:2],V:bit[5:4] + + //general + frame_info_t current_frame; + ref_frame_struct_s ref_frame[7]; // Last, Last2, Last3, Golden, BWDREF, ALTREF2, ALTREF + + unsigned int use_temporal0_mvs : 1; + unsigned int use_temporal1_mvs : 1; + unsigned int use_temporal2_mvs : 1; + unsigned int mf1_type : 3; + unsigned int mf2_type : 3; + unsigned int mf3_type : 3; + unsigned int reserved_mfmv : 20; + + short mfmv_offset[3][7]; // 3: mf0~2, 7: Last, Last2, Last3, Golden, BWDREF, ALTREF2, ALTREF + char mfmv_side[3][7]; // flag for reverse offset greater than 0 + // MFMV relative offset from the ref frame(reference to reference relative offset) + + unsigned char tileformat : 2; // 0: TBL; 1: KBL; + unsigned char gob_height : 3; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards) + unsigned char errorConcealOn : 1; // this field is not used, use ctrl_param.error_conceal_on to enable error concealment in ucode, + // always set NV_CNVDEC_GIP_ERR_CONCEAL_CTRL_ON = 1 to enable error detect in hw + unsigned char reserver8 : 2; // reserve + + unsigned char stream_error_detection : 1; + unsigned char mv_error_detection : 1; + unsigned char coeff_error_detection : 1; + unsigned char reserved_eh : 5; + + // Filt neighbor buffer offset + unsigned int Av1FltTopOffset; // filter top buffer offset with respect to filter buffer, 256 bytes unit + unsigned int Av1FltVertOffset; // filter vertical buffer offset with respect to filter buffer, 256 bytes unit + unsigned int Av1CdefVertOffset; // cdef vertical buffer offset with respect to filter buffer, 256 bytes unit + unsigned int Av1LrVertOffset; // lr vertical buffer offset with respect to filter buffer, 256 bytes unit + unsigned int Av1HusVertOffset; // hus vertical buffer offset with respect to filter buffer, 256 bytes unit + unsigned int Av1FgsVertOffset; // fgs vertical buffer offset with respect to
filter buffer, 256 bytes unit + + unsigned int enable_histogram : 1; + unsigned int sw_skip_start_length : 14; //skip start length + unsigned int reserved_stat : 17; + +} nvdec_av1_pic_s; + +////////////////////////////////////////////////////////////////////// +// AV1 Buffer structure +////////////////////////////////////////////////////////////////////// +typedef struct _AV1FilmGrainMemory + { + unsigned char scaling_lut_y[256]; + unsigned char scaling_lut_cb[256]; + unsigned char scaling_lut_cr[256]; + short cropped_luma_grain_block[4096]; + short cropped_cb_grain_block[1024]; + short cropped_cr_grain_block[1024]; +} AV1FilmGrainMemory; + +typedef struct _AV1TileInfo_OLD +{ + unsigned char width_in_sb; + unsigned char height_in_sb; + unsigned char tile_start_b0; + unsigned char tile_start_b1; + unsigned char tile_start_b2; + unsigned char tile_start_b3; + unsigned char tile_end_b0; + unsigned char tile_end_b1; + unsigned char tile_end_b2; + unsigned char tile_end_b3; + unsigned char padding[6]; +} AV1TileInfo_OLD; + +typedef struct _AV1TileInfo +{ + unsigned char width_in_sb; + unsigned char padding_w; + unsigned char height_in_sb; + unsigned char padding_h; +} AV1TileInfo; + +typedef struct _AV1TileStreamInfo +{ + unsigned int tile_start; + unsigned int tile_end; + unsigned char padding[8]; +} AV1TileStreamInfo; + + +// AV1 TileSize buffer +#define AV1_MAX_TILES 256 +#define AV1_TILEINFO_BUF_SIZE_OLD NVDEC_ALIGN(AV1_MAX_TILES * sizeof(AV1TileInfo_OLD)) +#define AV1_TILEINFO_BUF_SIZE NVDEC_ALIGN(AV1_MAX_TILES * sizeof(AV1TileInfo)) + +// AV1 TileStreamInfo buffer +#define AV1_TILESTREAMINFO_BUF_SIZE NVDEC_ALIGN(AV1_MAX_TILES * sizeof(AV1TileStreamInfo)) + +// AV1 SubStreamEntry buffer +#define MAX_SUBSTREAM_ENTRY_SIZE 32 +#define AV1_SUBSTREAM_ENTRY_BUF_SIZE NVDEC_ALIGN(MAX_SUBSTREAM_ENTRY_SIZE * sizeof(nvdec_substream_entry_s)) + +// AV1 FilmGrain Parameter buffer +#define AV1_FGS_BUF_SIZE NVDEC_ALIGN(sizeof(AV1FilmGrainMemory)) + +// AV1 Temporal MV buffer 
+#define AV1_TEMPORAL_MV_SIZE_IN_64x64 256 // 4Bytes for 8x8 +#define AV1_TEMPORAL_MV_BUF_SIZE(w, h) ALIGN_UP( ALIGN_UP(w,128) * ALIGN_UP(h,128) / (64*64) * AV1_TEMPORAL_MV_SIZE_IN_64x64, 4096) + +// AV1 SegmentID buffer +#define AV1_SEGMENT_ID_SIZE_IN_64x64 128 // (3bits + 1 pad_bits) for 4x4 +#define AV1_SEGMENT_ID_BUF_SIZE(w, h) ALIGN_UP( ALIGN_UP(w,128) * ALIGN_UP(h,128) / (64*64) * AV1_SEGMENT_ID_SIZE_IN_64x64, 4096) + +// AV1 Global Motion buffer +#define AV1_GLOBAL_MOTION_BUF_SIZE NVDEC_ALIGN(7*32) + +// AV1 Intra Top buffer +#define AV1_INTRA_TOP_BUF_SIZE NVDEC_ALIGN(8*8192) + +// AV1 Histogram buffer +#define AV1_HISTOGRAM_BUF_SIZE NVDEC_ALIGN(1024) + +// AV1 Filter FG buffer +#define AV1_DBLK_TOP_SIZE_IN_SB64 ALIGN_UP(1920, 128) +#define AV1_DBLK_TOP_BUF_SIZE(w) NVDEC_ALIGN( (ALIGN_UP(w,64)/64 + 2) * AV1_DBLK_TOP_SIZE_IN_SB64) + +#define AV1_DBLK_LEFT_SIZE_IN_SB64 ALIGN_UP(1536, 128) +#define AV1_DBLK_LEFT_BUF_SIZE(h) NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_DBLK_LEFT_SIZE_IN_SB64) + +#define AV1_CDEF_LEFT_SIZE_IN_SB64 ALIGN_UP(1792, 128) +#define AV1_CDEF_LEFT_BUF_SIZE(h) NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_CDEF_LEFT_SIZE_IN_SB64) + +#define AV1_HUS_LEFT_SIZE_IN_SB64 ALIGN_UP(12544, 128) +#define AV1_ASIC_HUS_LEFT_BUFFER_SIZE(h) NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_HUS_LEFT_SIZE_IN_SB64) +#define AV1_HUS_LEFT_BUF_SIZE(h) 2*AV1_ASIC_HUS_LEFT_BUFFER_SIZE(h) // Ping-Pong buffers + +#define AV1_LR_LEFT_SIZE_IN_SB64 ALIGN_UP(1920, 128) +#define AV1_LR_LEFT_BUF_SIZE(h) NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_LR_LEFT_SIZE_IN_SB64) + +#define AV1_FGS_LEFT_SIZE_IN_SB64 ALIGN_UP(320, 128) +#define AV1_FGS_LEFT_BUF_SIZE(h) NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_FGS_LEFT_SIZE_IN_SB64) + +// AV1 Hint Dump Buffer +#define AV1_HINT_DUMP_SIZE_IN_SB64 ((64*64)/(4*4)*8) // 8 bytes per CU, 256 CUs(2048 bytes) per SB64 +#define AV1_HINT_DUMP_SIZE_IN_SB128 ((128*128)/(4*4)*8) // 8 bytes per CU,1024 CUs(8192 bytes) per SB128 +#define 
AV1_HINT_DUMP_SIZE(w, h) NVDEC_ALIGN(AV1_HINT_DUMP_SIZE_IN_SB128*((w+127)/128)*((h+127)/128)) // always use SB128 for allocation + + +/******************************************************************* + New H264 +********************************************************************/ +typedef struct _nvdec_new_h264_pic_s +{ + nvdec_pass2_otf_s encryption_params; + unsigned char eos[16]; + unsigned char explicitEOSPresentFlag; + unsigned char hint_dump_en; //enable COLOMV surface dump for all frames, which includes hints of "MV/REFIDX/QP/CBP/MBPART/MBTYPE", nvbug: 200212874 + unsigned char reserved0[2]; + unsigned int stream_len; + unsigned int slice_count; + unsigned int mbhist_buffer_size; // to pass buffer size of MBHIST_BUFFER + + // Driver may or may not use based upon need. + // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode. + // Driver can send this value based upon resolution using the formula: + // gptimer_timeout_value = 3 * (cycles required for one frame) + unsigned int gptimer_timeout_value; + + // Fields from msvld_h264_seq_s + int log2_max_pic_order_cnt_lsb_minus4; + int delta_pic_order_always_zero_flag; + int frame_mbs_only_flag; + int PicWidthInMbs; + int FrameHeightInMbs; + + unsigned int tileFormat : 2 ; // 0: TBL; 1: KBL; 2: Tile16x16 + unsigned int gob_height : 3 ; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards) + unsigned int reserverd_surface_format : 27; + + // Fields from msvld_h264_pic_s + int entropy_coding_mode_flag; + int pic_order_present_flag; + int num_ref_idx_l0_active_minus1; + int num_ref_idx_l1_active_minus1; + int deblocking_filter_control_present_flag; + int redundant_pic_cnt_present_flag; + int transform_8x8_mode_flag; + + // Fields from mspdec_h264_picture_setup_s + unsigned int pitch_luma; // Luma pitch + unsigned int pitch_chroma; // chroma pitch + + unsigned int luma_top_offset; // offset of luma top field in units of 256 + unsigned int luma_bot_offset; // 
offset of luma bottom field in units of 256 + unsigned int luma_frame_offset; // offset of luma frame in units of 256 + unsigned int chroma_top_offset; // offset of chroma top field in units of 256 + unsigned int chroma_bot_offset; // offset of chroma bottom field in units of 256 + unsigned int chroma_frame_offset; // offset of chroma frame in units of 256 + unsigned int HistBufferSize; // in units of 256 + + unsigned int MbaffFrameFlag : 1; // + unsigned int direct_8x8_inference_flag: 1; // + unsigned int weighted_pred_flag : 1; // + unsigned int constrained_intra_pred_flag:1; // + unsigned int ref_pic_flag : 1; // reference picture (nal_ref_idc != 0) + unsigned int field_pic_flag : 1; // + unsigned int bottom_field_flag : 1; // + unsigned int second_field : 1; // second field of complementary reference field + unsigned int log2_max_frame_num_minus4: 4; // (0..12) + unsigned int chroma_format_idc : 2; // + unsigned int pic_order_cnt_type : 2; // (0..2) + int pic_init_qp_minus26 : 6; // : 6 (-26..+25) + int chroma_qp_index_offset : 5; // : 5 (-12..+12) + int second_chroma_qp_index_offset : 5; // : 5 (-12..+12) + + unsigned int weighted_bipred_idc : 2; // : 2 (0..2) + unsigned int CurrPicIdx : 7; // : 7 uncompressed frame buffer index + unsigned int CurrColIdx : 5; // : 5 index of associated co-located motion data buffer + unsigned int frame_num : 16; // + unsigned int frame_surfaces : 1; // frame surfaces flag + unsigned int output_memory_layout : 1; // 0: NV12; 1:NV24. Field pair must use the same setting. 
+ + int CurrFieldOrderCnt[2]; // : 32 [Top_Bottom], [0]=TopFieldOrderCnt, [1]=BottomFieldOrderCnt + nvdec_dpb_entry_s dpb[16]; + unsigned char WeightScale[6][4][4]; // : 6*4*4*8 in raster scan order (not zig-zag order) + unsigned char WeightScale8x8[2][8][8]; // : 2*8*8*8 in raster scan order (not zig-zag order) + + // mvc setup info, must be zero if not mvc + unsigned char num_inter_view_refs_lX[2]; // number of inter-view references + char reserved1[14]; // reserved for alignment + signed char inter_view_refidx_lX[2][16]; // DPB indices (must also be marked as long-term) + + // lossless decode (At the time of writing this manual, x264 and JM encoders differ in Intra_8x8 reference sample filtering) + unsigned int lossless_ipred8x8_filter_enable : 1; // = 0, skips Intra_8x8 reference sample filtering, for vertical and horizontal predictions (x264 encoded streams); = 1, filter Intra_8x8 reference samples (JM encoded streams) + unsigned int qpprime_y_zero_transform_bypass_flag : 1; // determines the transform bypass mode + unsigned int reserved2 : 30; // kept for alignment; may be used for other parameters + + nvdec_display_param_s displayPara; + nvdec_pass2_otf_ext_s ssm; + +} nvdec_new_h264_pic_s; + +// golden crc struct dumped into surface +// for each part, if golden crc compare is enabled, one interface is selected to do crc calculation in vmod. +// vmod's crc is compared with cmod's golden crc (4*32 bits), and the compare result is written into surface.
+typedef struct +{ + // input + unsigned int dbg_crc_enable_partb : 1; // Flag to enable/disable interface crc calculation in NVDEC HW's part b + unsigned int dbg_crc_enable_partc : 1; // Flag to enable/disable interface crc calculation in NVDEC HW's part c + unsigned int dbg_crc_enable_partd : 1; // Flag to enable/disable interface crc calculation in NVDEC HW's part d + unsigned int dbg_crc_enable_parte : 1; // Flag to enable/disable interface crc calculation in NVDEC HW's part e + unsigned int dbg_crc_intf_partb : 6; // For partb to select which interface to compare crc. see DBG_CRC_PARTE_INTF_SEL for detailed control value for each interface + unsigned int dbg_crc_intf_partc : 6; // For partc to select which interface to compare crc. see DBG_CRC_PARTE_INTF_SEL for detailed control value for each interface + unsigned int dbg_crc_intf_partd : 6; // For partd to select which interface to compare crc. see DBG_CRC_PARTE_INTF_SEL for detailed control value for each interface + unsigned int dbg_crc_intf_parte : 6; // For parte to select which interface to compare crc.
see DBG_CRC_PARTE_INTF_SEL for detailed control value for each interface + unsigned int reserved0 : 4; + + unsigned int dbg_crc_partb_golden[4]; // Golden crc values for part b + unsigned int dbg_crc_partc_golden[4]; // Golden crc values for part c + unsigned int dbg_crc_partd_golden[4]; // Golden crc values for part d + unsigned int dbg_crc_parte_golden[4]; // Golden crc values for part e + + // output + unsigned int dbg_crc_comp_partb : 4; // Compare result for part b + unsigned int dbg_crc_comp_partc : 4; // Compare result for part c + unsigned int dbg_crc_comp_partd : 4; // Compare result for part d + unsigned int dbg_crc_comp_parte : 4; // Compare result for part e + unsigned int reserved1 : 16; + + unsigned char reserved2[56]; +}nvdec_crc_s; // 128 Bytes + +#endif /* AVUTIL_DRV_NVDEC_H */ diff --git a/libavutil/nvjpg_drv.h b/libavutil/nvjpg_drv.h new file mode 100644 index 0000000000..cd8e976952 --- /dev/null +++ b/libavutil/nvjpg_drv.h @@ -0,0 +1,189 @@ +/******************************************************************************* + Copyright (c) 2016-2020, NVIDIA CORPORATION. All rights reserved. + + Permission is hereby granted, free of charge, to any person obtaining a + copy of this software and associated documentation files (the "Software"), + to deal in the Software without restriction, including without limitation + the rights to use, copy, modify, merge, publish, distribute, sublicense, + and/or sell copies of the Software, and to permit persons to whom the + Software is furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + DEALINGS IN THE SOFTWARE. + +*******************************************************************************/ + +#ifndef AVUTIL_NVJPG_DRV_H +#define AVUTIL_NVJPG_DRV_H + +#include <stdint.h> + +typedef uint8_t NvU8; +typedef uint16_t NvU16; +typedef uint32_t NvU32; +typedef uint64_t NvU64; +typedef int8_t NvS8; +typedef int16_t NvS16; +typedef int32_t NvS32; +typedef int64_t NvS64; +typedef _Bool NvBool; + +// +// CLASS NV_E7D0_NVJPG +// +// NVJPG combines a JPEG decoder and encoder; it supports the baseline sequential profile. +// On the encoder side, it supports: a. 420 pitch linear format, b. programmable huffman/quant table, ... etc. +// On the decoder side, it supports: a. 400/420/422/444 decoding, b. YUV2RGB, c. Power2Scale: 1/2, 1/4, 1/8, d. ChromaSubSample ... etc. +// =================== + + +// huffman table: +// huffman table is organized in symbol value order, each table item includes 2 fields: codeWord length and codeWord value +#define DCVALUEITEM 12 +#define ACVALUEITEM 256 // in fact, only 162 items are used in baseline sequential profile.
+typedef struct +{ + unsigned short length; // 4 bit, code word length + unsigned short value; // 16 bit, code word value +}huffman_symbol_s; + + +typedef struct +{ + // surface related + unsigned int bitstream_start_off;// start offset position in bitstream buffer where data should be written (byte offset) + unsigned int bitstream_buf_size; // size in bytes of the buffer allocated for bitstream slice/mb data + unsigned int luma_stride; // 64 bytes align; + unsigned int chroma_stride; // 64 bytes align; + unsigned int inputType : 4; // 0: YUV; 1: RGB, 2: BGR, 3:RGBA, 4: BGRA, 5: ABGR, 6: ARGB + unsigned int chromaFormat : 2; // chroma format: 0: 444; 1: 422H; 2:422V; 3:420 + unsigned int tilingMode : 2; // 0: linear; 1: GPU_blkLinear; 2: Tegra_blkLinear + unsigned int gobHeight : 3; // used for blkLinear, 0: 2; 1: 4; ... 4: 32 + unsigned int yuvMemoryMode: 3; // 0-semi planar nv12; 1-semi planar nv21; 2-plane(yuy2); 3-planar + unsigned int reserved_0 : 18; + // control para + unsigned short imageWidth; // real image width, up to 16K + unsigned short imageHeight; // real image height, up to 16K + unsigned short jpegWidth; // image width align to 8 or 16 pixel + unsigned short jpegHeight; // image height align to 8 or 16 pixel + unsigned int totalMcu; + unsigned int widthMcu; + unsigned int heightMcu; + unsigned int restartInterval; // restart interval, 0 means disable the restart feature + + // rate control related + unsigned int rateControl : 2; // RC: 0:disable; 1:block-base; others: reserve + unsigned int rcTargetYBits : 11; // target luma bits per block, [0 ~ (1<<11)-1] + unsigned int rcTargetCBits : 11; // target chroma bits per block, [0 ~ (1<<11)-1] + unsigned int reserved_1 : 8; + unsigned int preQuant : 1; // pre quant truncation enabled flag + unsigned int rcThreshIdx : 8; // pre_quant threshold index [1 ~ 63] + unsigned int rcThreshMag : 21; // threshold magnitude + // mjpeg-typeB + unsigned int isMjpgTypeB : 1; // a flag indicate mjpg type B format, which
control HW no stuff byte. + unsigned int reserved_2 : 1; + // huffman tables + huffman_symbol_s hfDcLuma[DCVALUEITEM]; //dc luma huffman table, arranged in symbol increase order, encoder can directly index and use + huffman_symbol_s hfAcLuma[ACVALUEITEM]; //ac luma huffman table, arranged in symbol increase order, encoder can directly index and use + huffman_symbol_s hfDcChroma[DCVALUEITEM]; //dc chroma huffman table, arranged in symbol increase order, encoder can directly index and use + huffman_symbol_s hfAcChroma[ACVALUEITEM]; //ac chroma huffman table, arranged in symbol increase order, encoder can directly index and use + // quantization tables + unsigned short quantLumaFactor[64]; //luma quantize factor table, arranged in horizontal scan order, (1<<15)/quantLuma + unsigned short quantChromaFactor[64]; //chroma quantize factor table, arranged in horizontal scan order, (1<<15)/quantLuma + + unsigned char reserve[0x6c]; +}nvjpg_enc_drv_pic_setup_s; + +typedef struct +{ + unsigned int bitstream_size; //exact residual part bitstream size of current image + unsigned int mcu_x; //encoded mcu_x + unsigned int mcu_y; //encoded mcu_y + unsigned int cycle_count; + unsigned int error_status; //report error if any + unsigned char reserved1[12]; +}nvjpg_enc_status; + +struct ctrl_param_s +{ + union + { + struct + { + unsigned int gptimer_on :1; + unsigned int dump_cycle :1; + unsigned int debug_mode :1; + unsigned int reserved :29; + }bits; + unsigned int data; + }; +}; + + +//NVJPG Decoder class interface +typedef struct +{ + int codeNum[16]; //the number of huffman codes with length i + unsigned char minCodeIdx[16]; //the index of the min huffman code with length i + int minCode[16]; //the min huffman code with length i + unsigned char symbol[162]; // symbols that need to be coded.
+ unsigned char reserved[2]; // alignment +}huffman_tab_s; + +typedef struct +{ + unsigned char hblock; + unsigned char vblock; + unsigned char quant; + unsigned char ac; + unsigned char dc; + unsigned char reserved[3]; //alignment +} block_parameter_s; + +typedef struct +{ + huffman_tab_s huffTab[2][4]; + block_parameter_s blkPar[4]; + unsigned char quant[4][64]; //quant table + int restart_interval; + int frame_width; + int frame_height; + int mcu_width; + int mcu_height; + int comp; + int bitstream_offset; + int bitstream_size; + int stream_chroma_mode; //0-mono chrome; 1-yuv420; 2-yuv422H; 3-yuv422V; 4-yuv444; + int output_chroma_mode; //0-mono chrome; 1-yuv420; 2-yuv422H; 3-yuv422V; 4-yuv444; + int output_pixel_format; //0-yuv; 1-RGB; 2-BGR; 3-RGBA; 4-BGRA; 5-ABGR; 6-ARGB + int output_stride_luma; //64 bytes align + int output_stride_chroma; //64 bytes align + int alpha_value; + int yuv2rgb_param[6]; //K0, K1, K2, K3, K4, C + int tile_mode; //0-pitch linear; 1-gpu block linear; 2-tegra block linear + int block_linear_height; + int memory_mode; //0-semi planar nv12; 1-semi planar nv21; 2-plane(yuy2); 3-planar + int power2_downscale; //0-no scale; 1- 1/2; 2- 1/4; 3- 1/8 + int motion_jpeg_type; //0-type A; 1-type B + int start_mcu_x; //set start mcu x for error robust + int start_mcu_y; //set start mcu y for error robust +}nvjpg_dec_drv_pic_setup_s; + +typedef struct +{ + unsigned int bytes_offset; //bytes consumed by HW + unsigned int mcu_x; //decoded mcu_x + unsigned int mcu_y; //decoded mcu_y + unsigned int cycle_count; + unsigned int error_status; //report error if any + unsigned char reserved1[12]; +}nvjpg_dec_status; +#endif /* AVUTIL_NVJPG_DRV_H */ diff --git a/libavutil/vic_drv.h b/libavutil/vic_drv.h new file mode 100644 index 0000000000..32ebe1a17d --- /dev/null +++ b/libavutil/vic_drv.h @@ -0,0 +1,279 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. 
+ * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef AVUTIL_VIC_DRV_H +#define AVUTIL_VIC_DRV_H + +#include <stdint.h> + +typedef uint8_t NvU8; +typedef uint16_t NvU16; +typedef uint32_t NvU32; +typedef uint64_t NvU64; +typedef int8_t NvS8; +typedef int16_t NvS16; +typedef int32_t NvS32; +typedef int64_t NvS64; +typedef _Bool NvBool; + +typedef struct VicPipeConfig { + NvU32 DownsampleHoriz : 11; + NvU32 reserved0 : 5; + NvU32 DownsampleVert : 11; + NvU32 reserved1 : 5; + NvU32 reserved2 : 32; + NvU32 reserved3 : 32; + NvU32 reserved4 : 32; +} VicPipeConfig; + +typedef struct VicOutputConfig { + NvU64 AlphaFillMode : 3; + NvU64 AlphaFillSlot : 3; + NvU64 BackgroundAlpha : 10; + NvU64 BackgroundR : 10; + NvU64 BackgroundG : 10; + NvU64 BackgroundB : 10; + NvU64 RegammaMode : 2; + NvU64 OutputFlipX : 1; + NvU64 OutputFlipY : 1; + NvU64 OutputTranspose : 1; + NvU64 reserved1 : 1; + NvU64 reserved2 : 12; + NvU32 TargetRectLeft : 14; + NvU32 reserved3 : 2; + NvU32 TargetRectRight : 14; + NvU32 reserved4 : 2; + NvU32 TargetRectTop : 14; + NvU32 reserved5 : 2; + NvU32 TargetRectBottom : 14; + NvU32 reserved6 : 2; +} VicOutputConfig; + +typedef struct VicOutputSurfaceConfig { + NvU32 OutPixelFormat : 7; + NvU32 OutChromaLocHoriz : 2; + NvU32 OutChromaLocVert : 2; + NvU32 OutBlkKind : 4; + NvU32 OutBlkHeight : 4; +
NvU32 reserved0 : 3; + NvU32 reserved1 : 10; + NvU32 OutSurfaceWidth : 14; + NvU32 OutSurfaceHeight : 14; + NvU32 reserved2 : 4; + NvU32 OutLumaWidth : 14; + NvU32 OutLumaHeight : 14; + NvU32 reserved3 : 4; + NvU32 OutChromaWidth : 14; + NvU32 OutChromaHeight : 14; + NvU32 reserved4 : 4; +} VicOutputSurfaceConfig; + +typedef struct VicMatrixStruct { + NvU64 matrix_coeff00 : 20; + NvU64 matrix_coeff10 : 20; + NvU64 matrix_coeff20 : 20; + NvU64 matrix_r_shift : 4; + NvU64 matrix_coeff01 : 20; + NvU64 matrix_coeff11 : 20; + NvU64 matrix_coeff21 : 20; + NvU64 reserved0 : 3; + NvU64 matrix_enable : 1; + NvU64 matrix_coeff02 : 20; + NvU64 matrix_coeff12 : 20; + NvU64 matrix_coeff22 : 20; + NvU64 reserved1 : 4; + NvU64 matrix_coeff03 : 20; + NvU64 matrix_coeff13 : 20; + NvU64 matrix_coeff23 : 20; + NvU64 reserved2 : 4; +} VicMatrixStruct; + +typedef struct VicClearRectStruct { + NvU32 ClearRect0Left : 14; + NvU32 reserved0 : 2; + NvU32 ClearRect0Right : 14; + NvU32 reserved1 : 2; + NvU32 ClearRect0Top : 14; + NvU32 reserved2 : 2; + NvU32 ClearRect0Bottom : 14; + NvU32 reserved3 : 2; + NvU32 ClearRect1Left : 14; + NvU32 reserved4 : 2; + NvU32 ClearRect1Right : 14; + NvU32 reserved5 : 2; + NvU32 ClearRect1Top : 14; + NvU32 reserved6 : 2; + NvU32 ClearRect1Bottom : 14; + NvU32 reserved7 : 2; +} VicClearRectStruct; + +typedef struct VicSlotStructSlotConfig { + NvU64 SlotEnable : 1; + NvU64 DeNoise : 1; + NvU64 AdvancedDenoise : 1; + NvU64 CadenceDetect : 1; + NvU64 MotionMap : 1; + NvU64 MMapCombine : 1; + NvU64 IsEven : 1; + NvU64 ChromaEven : 1; + NvU64 CurrentFieldEnable : 1; + NvU64 PrevFieldEnable : 1; + NvU64 NextFieldEnable : 1; + NvU64 NextNrFieldEnable : 1; + NvU64 CurMotionFieldEnable : 1; + NvU64 PrevMotionFieldEnable : 1; + NvU64 PpMotionFieldEnable : 1; + NvU64 CombMotionFieldEnable : 1; + NvU64 FrameFormat : 4; + NvU64 FilterLengthY : 2; + NvU64 FilterLengthX : 2; + NvU64 Panoramic : 12; + NvU64 reserved1 : 22; + NvU64 DetailFltClamp : 6; + NvU64 FilterNoise : 
10; + NvU64 FilterDetail : 10; + NvU64 ChromaNoise : 10; + NvU64 ChromaDetail : 10; + NvU64 DeinterlaceMode : 4; + NvU64 MotionAccumWeight : 3; + NvU64 NoiseIir : 11; + NvU64 LightLevel : 4; + NvU64 reserved4 : 2; + NvU32 SoftClampLow : 10; + NvU32 SoftClampHigh : 10; + NvU32 reserved5 : 3; + NvU32 reserved6 : 9; + NvU32 PlanarAlpha : 10; + NvU32 ConstantAlpha : 1; + NvU32 StereoInterleave : 3; + NvU32 ClipEnabled : 1; + NvU32 ClearRectMask : 8; + NvU32 DegammaMode : 2; + NvU32 reserved7 : 1; + NvU32 DecompressEnable : 1; + NvU32 reserved9 : 5; + NvU64 DecompressCtbCount : 8; + NvU64 DecompressZbcColor : 32; + NvU64 reserved12 : 24; + NvU32 SourceRectLeft : 30; + NvU32 reserved14 : 2; + NvU32 SourceRectRight : 30; + NvU32 reserved15 : 2; + NvU32 SourceRectTop : 30; + NvU32 reserved16 : 2; + NvU32 SourceRectBottom : 30; + NvU32 reserved17 : 2; + NvU32 DestRectLeft : 14; + NvU32 reserved18 : 2; + NvU32 DestRectRight : 14; + NvU32 reserved19 : 2; + NvU32 DestRectTop : 14; + NvU32 reserved20 : 2; + NvU32 DestRectBottom : 14; + NvU32 reserved21 : 2; + NvU32 reserved22 : 32; + NvU32 reserved23 : 32; +} VicSlotStructSlotConfig; + +typedef struct VicSlotStructSlotSurfaceConfig { + NvU32 SlotPixelFormat : 7; + NvU32 SlotChromaLocHoriz : 2; + NvU32 SlotChromaLocVert : 2; + NvU32 SlotBlkKind : 4; + NvU32 SlotBlkHeight : 4; + NvU32 SlotCacheWidth : 3; + NvU32 reserved0 : 10; + NvU32 SlotSurfaceWidth : 14; + NvU32 SlotSurfaceHeight : 14; + NvU32 reserved1 : 4; + NvU32 SlotLumaWidth : 14; + NvU32 SlotLumaHeight : 14; + NvU32 reserved2 : 4; + NvU32 SlotChromaWidth : 14; + NvU32 SlotChromaHeight : 14; + NvU32 reserved3 : 4; +} VicSlotStructSlotSurfaceConfig; + +typedef struct VicSlotStructLumaKeyStruct { + NvU64 luma_coeff0 : 20; + NvU64 luma_coeff1 : 20; + NvU64 luma_coeff2 : 20; + NvU64 luma_r_shift : 4; + NvU64 luma_coeff3 : 20; + NvU64 LumaKeyLower : 10; + NvU64 LumaKeyUpper : 10; + NvU64 LumaKeyEnabled : 1; + NvU64 reserved0 : 2; + NvU64 reserved1 : 21; +} 
VicSlotStructLumaKeyStruct; + +typedef struct VicSlotStructBlendingSlotStruct { + NvU32 AlphaK1 : 10; + NvU32 reserved0 : 6; + NvU32 AlphaK2 : 10; + NvU32 reserved1 : 6; + NvU32 SrcFactCMatchSelect : 3; + NvU32 reserved2 : 1; + NvU32 DstFactCMatchSelect : 3; + NvU32 reserved3 : 1; + NvU32 SrcFactAMatchSelect : 3; + NvU32 reserved4 : 1; + NvU32 DstFactAMatchSelect : 3; + NvU32 reserved5 : 1; + NvU32 reserved6 : 4; + NvU32 reserved7 : 4; + NvU32 reserved8 : 4; + NvU32 reserved9 : 4; + NvU32 reserved10 : 2; + NvU32 OverrideR : 10; + NvU32 OverrideG : 10; + NvU32 OverrideB : 10; + NvU32 OverrideA : 10; + NvU32 reserved11 : 2; + NvU32 UseOverrideR : 1; + NvU32 UseOverrideG : 1; + NvU32 UseOverrideB : 1; + NvU32 UseOverrideA : 1; + NvU32 MaskR : 1; + NvU32 MaskG : 1; + NvU32 MaskB : 1; + NvU32 MaskA : 1; + NvU32 reserved12 : 12; +} VicSlotStructBlendingSlotStruct; + +typedef struct VicSlotStruct { + VicSlotStructSlotConfig slotConfig; + VicSlotStructSlotSurfaceConfig slotSurfaceConfig; + VicSlotStructLumaKeyStruct lumaKeyStruct; + VicMatrixStruct colorMatrixStruct; + VicMatrixStruct gamutMatrixStruct; + VicSlotStructBlendingSlotStruct blendingSlotStruct; +} VicSlotStruct; + +typedef struct VicConfigStruct { + VicPipeConfig pipeConfig; + VicOutputConfig outputConfig; + VicOutputSurfaceConfig outputSurfaceConfig; + VicMatrixStruct outColorMatrixStruct; + VicClearRectStruct clearRectStruct[4]; + VicSlotStruct slotStruct[8]; +} VicConfigStruct; + +#endif /* AVUTIL_VIC_DRV_H */ From patchwork Thu May 30 19:43:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49415 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9183:0:b0:460:55fa:d5ed with SMTP id s3csp67242vqg; Thu, 30 May 2024 12:44:56 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXVDkFMnn0NokHYGGmwxDzpoLhWa603xJjSMgb+ooWtKTlXec6gaA8DfSLnar6/z9NLapPaSiFE/yObCbkdPNLYdOsoLcfGw+emWA== X-Google-Smtp-Source: 
AGHT+IH/gUG1UkstyfKPfiapBtX3++Aht5rSuMM03+FRW73fBaoHHwduXNjFSaUZJPnmoSRIFuo/ X-Received: by 2002:a2e:9e44:0:b0:2df:907e:6de3 with SMTP id 38308e7fff4ca-2ea848844d3mr17098861fa.35.1717098295766; Thu, 30 May 2024 12:44:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1717098295; cv=none; d=google.com; s=arc-20160816; b=JvzjtcbBjbwVwdDUzcod7B2M6ZaN+DVsg+LA1ha8N2Vej9sq7u3oP/dAV008Pg169h SZ2DU78KgTcNJm6jcMF2i0x7oWrPQFhOMobPI0TTsj3wkRFpJUbMrlEmhWmMtInvBG/i YpLpGTXY7QJ+3R7niA/aj6xfPqk4ZZv/knajhfFA3ctq3O1noGX1vLhDvJL1/7bnNspD LV6YfIA4sGIF5MIcQmmRJdDqB+RPFKWPvXuYZ1wWhwwpMCFbccX7Zu1FBK+L5/6m9a/L aD66XBxE8yIwIp9frRASH2B3n6hFbaK2XRVDVeKp4lMhobbop199+DudxDpzm5yt5N6R LajQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=0gPBrOwMieOnP+VqYY356sXlLstlhowFISyklHp5vZY=; fh=o4ZBG0WnuIFUfokYFX1900fRPFIkFoDCXPv5+z2b8Jo=; b=0IL6J2MDP1unN7daI1LIyBJIShP7Ta/4wkVHOGDCUqlshkZbrCdbci5vArHn2Il8he bAutKqWfGfrrv+KpMs9VKeH9lR5ZHWcLRcCvrxalCDuukGGaaNIlArovx0xIWiB9OD5M eqD+BTlfJugVxkt11tN96bT45BKSSIMhesT2T2hNzWqfk/67icta9FVA4QtDd1lyh2Mn jcgANF7pxQQoo0+aSbJzN6FcYn+Hn3oaD36RNSxuOzKGLXrQHRHwBR9PocVjhU27kFFS TJyF2ceBeu1n2DbSBu3nX+H2twSsgZsT6tLYsvEVgWjGqD8UPLLcSHDM5J/9o8luf8J2 Crxg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=TrYyHaWs; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id 38308e7fff4ca-2ea91c0db35si998761fa.14.2024.05.30.12.44.55; Thu, 30 May 2024 12:44:55 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=TrYyHaWs; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id BFFCD68D5A6; Thu, 30 May 2024 22:44:36 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wr1-f54.google.com (mail-wr1-f54.google.com [209.85.221.54]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id C4BA968D582 for ; Thu, 30 May 2024 22:44:34 +0300 (EEST) Received: by mail-wr1-f54.google.com with SMTP id ffacd0b85a97d-354f8a0cd08so1317105f8f.2 for ; Thu, 30 May 2024 12:44:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717098274; x=1717703074; darn=ffmpeg.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=oMR1rw/KbC0KhwfcTrKzU27cjtGtV5MF8GcoU3h86os=; b=TrYyHaWs+wwX9rTR86qatfg2YCClyyQpAbw/AIJRe0Q+QSheqPgLr52f2mKAOg6f8e RrSo3L384OC9lmiF4ku99aQTw1/eJ+cwXqG71o4KLYNxTua0DTiOJLuUJ/z8Txrdgvua 8ab6vO1M8oFSTSi9x4jhi7iFLWcI4g13wTHQpoBVSmz4jX6Rmgkk6BmMBVYCiXxLH4HD gQPoTLWC73EtYNwkMG6jbVLsW1+SG43yTKCqlxC6kutGEWMnBeThyQ5A39sjtS1oN3sY Uy9nDRIzZiBBf/it8+3jjHRcRWaupg7KfX1FKHwPsoFPMzDMpTDZEd3YUFCcM4GtT3hI ljqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717098274; x=1717703074; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oMR1rw/KbC0KhwfcTrKzU27cjtGtV5MF8GcoU3h86os=; b=dQ0sEH5jZZeofYAOOfL3OQ5/VDQNZgqYdrmKA7RPjz9vruAFDCPlkgr0QO9cSld7yl k2tXG6nHBFE938f9Yf2q5W1xLrQtJtBUkz3/FA8YXYy2Bs9HmDMpSd2jpJw9dsx1dUpL F6j+hicMbQi49btdus98pCE6L4M5Z0V0ECGjxqS9Tk7IXCfhvYFgWlvdMYiJUSjgE7A4 FYmiQrEP4pDNdNZNoe6NXdaVIcwwhcUkv28SjVC3jbd+X9pg1rKqNUsQ1+2B7wwmqnpK 2Q9QiL89+PXr8P9XX9PuMGl1sApVGM4iyeGyLG/0NbenS5vzEBYN5QVUrSNv1IjxFiNS dZAQ== X-Gm-Message-State: AOJu0YzX+t8RkatRKZAYiEm20lOaECypxenZaagPOTiY7BgRwFyKcpyn D+OdzbEsUY8AKOZG7HybEtPedrObRI93aDaBc6YF6RJWzDAaoghBtTIUIw== X-Received: by 2002:a5d:60c3:0:b0:355:513:f08b with SMTP id ffacd0b85a97d-35dc0091800mr2028048f8f.27.1717098273551; Thu, 30 May 2024 12:44:33 -0700 (PDT) Received: from fractale.lan ([2001:861:5102:3290:f88d:fc8b:a14:3fcb]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-35dd04c0de3sm225126f8f.9.2024.05.30.12.44.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 May 2024 12:44:33 -0700 (PDT) From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:07 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: averne Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: zM+jLkUR+DQ7 This includes a new pixel format for nvtegra hardware frames, and several objects for interaction with hardware blocks. 
In particular, this contains code for channels (handles to hardware engines), maps (memory-mapped buffers shared with engines), and command buffers (abstraction for building command lists sent to the engines). Signed-off-by: averne --- configure | 2 + libavutil/Makefile | 4 + libavutil/nvtegra.c | 1035 ++++++++++++++++++++++++++++++++++++ libavutil/nvtegra.h | 258 +++++++++ libavutil/nvtegra_host1x.h | 94 ++++ libavutil/pixdesc.c | 4 + libavutil/pixfmt.h | 8 + 7 files changed, 1405 insertions(+) create mode 100644 libavutil/nvtegra.c create mode 100644 libavutil/nvtegra.h create mode 100644 libavutil/nvtegra_host1x.h diff --git a/configure b/configure index 09fb2aed1b..51f169bfbd 100755 --- a/configure +++ b/configure @@ -361,6 +361,7 @@ External library support: --disable-vdpau disable Nvidia Video Decode and Presentation API for Unix code [autodetect] --disable-videotoolbox disable VideoToolbox code [autodetect] --disable-vulkan disable Vulkan code [autodetect] + --enable-nvtegra enable nvtegra code [no] Toolchain options: --arch=ARCH select architecture [$arch] @@ -3151,6 +3152,7 @@ videotoolbox_hwaccel_deps="videotoolbox pthreads" videotoolbox_hwaccel_extralibs="-framework QuartzCore" vulkan_deps="threads" vulkan_deps_any="libdl LoadLibrary" +nvtegra_deps="gpl" av1_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_AV1" av1_d3d11va_hwaccel_select="av1_decoder" diff --git a/libavutil/Makefile b/libavutil/Makefile index 9c112bc58a..733a23a8a3 100644 --- a/libavutil/Makefile +++ b/libavutil/Makefile @@ -52,6 +52,7 @@ HEADERS = adler32.h \ hwcontext_videotoolbox.h \ hwcontext_vdpau.h \ hwcontext_vulkan.h \ + nvtegra.h \ nvhost_ioctl.h \ nvmap_ioctl.h \ iamf.h \ @@ -209,6 +210,7 @@ OBJS-$(CONFIG_VDPAU) += hwcontext_vdpau.o OBJS-$(CONFIG_VULKAN) += hwcontext_vulkan.o vulkan.o OBJS-$(!CONFIG_VULKAN) += hwcontext_stub.o +OBJS-$(CONFIG_NVTEGRA) += nvtegra.o OBJS += $(COMPAT_OBJS:%=../compat/%) @@ -230,6 +232,8 @@ SKIPHEADERS-$(CONFIG_VDPAU) += hwcontext_vdpau.h 
SKIPHEADERS-$(CONFIG_VULKAN) += hwcontext_vulkan.h vulkan.h \ vulkan_functions.h \ vulkan_loader.h +SKIPHEADERS-$(CONFIG_NVTEGRA) += nvtegra.h \ + nvtegra_host1x.h TESTPROGS = adler32 \ aes \ diff --git a/libavutil/nvtegra.c b/libavutil/nvtegra.c new file mode 100644 index 0000000000..ad0bbbdfaa --- /dev/null +++ b/libavutil/nvtegra.c @@ -0,0 +1,1035 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef __SWITCH__ +# include +# include +# include +# include +#else +# include +# include +#endif + +#include + +#include "buffer.h" +#include "log.h" +#include "error.h" +#include "mem.h" +#include "thread.h" + +#include "nvhost_ioctl.h" +#include "nvmap_ioctl.h" +#include "nvtegra_host1x.h" + +#include "nvtegra.h" + +/* + * Tag used by the kernel to identify allocations. + * Official software has been seen using 0x900, 0xf00, 0x1100, 0x1400, 0x4000. 
+ */ +#define MEM_TAG (0xfeed) + +struct DriverState { + int nvmap_fd, nvhost_fd; +}; + +static AVMutex g_driver_init_mtx = AV_MUTEX_INITIALIZER; +static struct DriverState *g_driver_state = NULL; +static AVBufferRef *g_driver_state_ref = NULL; + +static void free_driver_fds(void *opaque, uint8_t *data) { + if (!g_driver_state) + return; + +#ifndef __SWITCH__ + if (g_driver_state->nvmap_fd > 0) + close(g_driver_state->nvmap_fd); + + if (g_driver_state->nvhost_fd > 0) + close(g_driver_state->nvhost_fd); +#else + nvFenceExit(); + nvMapExit(); + nvExit(); + mmuExit(); +#endif + + g_driver_init_mtx = (AVMutex)AV_MUTEX_INITIALIZER; + g_driver_state_ref = NULL; + av_freep(&g_driver_state); +} + +static int init_driver_fds(void) { + AVBufferRef *ref; + struct DriverState *state; + int err; + + state = av_mallocz(sizeof(*state)); + if (!state) + return AVERROR(ENOMEM); + + ref = av_buffer_create((uint8_t *)state, sizeof(*state), free_driver_fds, NULL, 0); + if (!ref) + return AVERROR(ENOMEM); + + g_driver_state = state; + g_driver_state_ref = ref; + +#ifndef __SWITCH__ + err = open("/dev/nvmap", O_RDWR | O_SYNC); + if (err < 0) + return AVERROR(errno); + state->nvmap_fd = err; + + err = open("/dev/nvhost-ctrl", O_RDWR | O_SYNC); + if (err < 0) + return AVERROR(errno); + state->nvhost_fd = err; +#else + err = nvInitialize(); + if (R_FAILED(err)) + return AVERROR(err); + + err = nvMapInit(); + if (R_FAILED(err)) + return AVERROR(err); + state->nvmap_fd = nvMapGetFd(); + + err = nvFenceInit(); + if (R_FAILED(err)) + return AVERROR(err); + /* libnx doesn't export the nvhost-ctrl file descriptor */ + + err = mmuInitialize(); + if (R_FAILED(err)) + return AVERROR(err); +#endif + + return 0; +} + +static inline int get_nvmap_fd(void) { + if (!g_driver_state) + return AVERROR_UNKNOWN; + + if (!g_driver_state->nvmap_fd) + return AVERROR_UNKNOWN; + + return g_driver_state->nvmap_fd; +} + +static inline int get_nvhost_fd(void) { + if (!g_driver_state) + return AVERROR_UNKNOWN; + + 
if (!g_driver_state->nvhost_fd) + return AVERROR_UNKNOWN; + + return g_driver_state->nvhost_fd; +} + +AVBufferRef *av_nvtegra_driver_init(void) { + AVBufferRef *out = NULL; + int err; + + /* + * We have to do this overly complex dance of putting driver fds in a refcounted struct, + * otherwise initializing multiple hwcontexts would leak fds + */ + + err = ff_mutex_lock(&g_driver_init_mtx); + if (err != 0) + goto exit; + + if (g_driver_state_ref) { + out = av_buffer_ref(g_driver_state_ref); + goto exit; + } + + err = init_driver_fds(); + if (err < 0) { + /* In case memory allocations failed, call the destructor ourselves */ + av_buffer_unref(&g_driver_state_ref); + free_driver_fds(NULL, NULL); + goto exit; + } + + out = g_driver_state_ref; + +exit: + ff_mutex_unlock(&g_driver_init_mtx); + return out; +} + +int av_nvtegra_channel_open(AVNVTegraChannel *channel, const char *dev) { + int err; +#ifndef __SWITCH__ + struct nvhost_get_param_arg args; + + err = open(dev, O_RDWR); + if (err < 0) + return AVERROR(errno); + + channel->fd = err; + + args = (struct nvhost_get_param_arg){0}; + + err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_GET_SYNCPOINT, &args); + if (err < 0) + goto fail; + + channel->syncpt = args.value; + + return 0; + +fail: + close(channel->fd); + return AVERROR(errno); +#else + err = nvChannelCreate(&channel->channel, dev); + if (R_FAILED(err)) + return AVERROR(err); + + err = nvioctlChannel_GetSyncpt(channel->channel.fd, 0, &channel->syncpt); + if (R_FAILED(err)) + goto fail; + + return 0; + +fail: + nvChannelClose(&channel->channel); + return AVERROR(err); +#endif +} + +int av_nvtegra_channel_close(AVNVTegraChannel *channel) { +#ifndef __SWITCH__ + if (!channel->fd) + return 0; + + return close(channel->fd); +#else + nvChannelClose(&channel->channel); + return 0; +#endif +} + +int av_nvtegra_channel_get_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t *clock_rate) { + int err; +#ifndef __SWITCH__ + struct nvhost_clk_rate_args args; + + 
args = (struct nvhost_clk_rate_args){ + .moduleid = moduleid, + }; + + err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_GET_CLK_RATE, &args); + if (err < 0) + return AVERROR(errno); + + if (clock_rate) + *clock_rate = args.rate; + + return 0; +#else + uint32_t tmp; + + err = AVERROR(nvioctlChannel_GetModuleClockRate(channel->channel.fd, moduleid, &tmp)); + if (err < 0) + return err; + + if (clock_rate) + *clock_rate = tmp * 1000; + + return 0; +#endif +} + +int av_nvtegra_channel_set_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t clock_rate) { +#ifndef __SWITCH__ + struct nvhost_clk_rate_args args; + + args = (struct nvhost_clk_rate_args){ + .rate = clock_rate, + .moduleid = moduleid, + }; + + return (ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SET_CLK_RATE, &args) < 0) ? AVERROR(errno) : 0; +#else + return AVERROR(nvioctlChannel_SetModuleClockRate(channel->channel.fd, moduleid, clock_rate / 1000)); +#endif +} + +int av_nvtegra_channel_submit(AVNVTegraChannel *channel, AVNVTegraCmdbuf *cmdbuf, uint32_t *fence) { + int err; +#ifndef __SWITCH__ + struct nvhost_submit_args args; + + args = (struct nvhost_submit_args){ + .submit_version = NVHOST_SUBMIT_VERSION_V2, + .num_syncpt_incrs = cmdbuf->num_syncpt_incrs, + .num_cmdbufs = cmdbuf->num_cmdbufs, + .num_relocs = cmdbuf->num_relocs, + .num_waitchks = cmdbuf->num_waitchks, + .timeout = 0, + .flags = 0, + .fence = 0, + .syncpt_incrs = (uintptr_t)cmdbuf->syncpt_incrs, + .cmdbuf_exts = (uintptr_t)cmdbuf->cmdbuf_exts, + .checksum_methods = 0, + .checksum_falcon_methods = 0, + .pad = { 0 }, + .reloc_types = (uintptr_t)cmdbuf->reloc_types, + .cmdbufs = (uintptr_t)cmdbuf->cmdbufs, + .relocs = (uintptr_t)cmdbuf->relocs, + .reloc_shifts = (uintptr_t)cmdbuf->reloc_shifts, + .waitchks = (uintptr_t)cmdbuf->waitchks, + .waitbases = 0, + .class_ids = (uintptr_t)cmdbuf->class_ids, + .fences = (uintptr_t)cmdbuf->fences, + }; + + err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SUBMIT, &args); + if (err < 0) + return 
AVERROR(errno); + + if (fence) + *fence = args.fence; + + return 0; +#else + nvioctl_fence tmp; + + err = nvioctlChannel_Submit(channel->channel.fd, (nvioctl_cmdbuf *)cmdbuf->cmdbufs, cmdbuf->num_cmdbufs, + NULL, NULL, 0, (nvioctl_syncpt_incr *)cmdbuf->syncpt_incrs, cmdbuf->num_syncpt_incrs, + &tmp, 1); + if (R_FAILED(err)) + return AVERROR(err); + + if (fence) + *fence = tmp.value; + + return 0; +#endif +} + +int av_nvtegra_channel_set_submit_timeout(AVNVTegraChannel *channel, uint32_t timeout_ms) { +#ifndef __SWITCH__ + struct nvhost_set_timeout_args args; + + args = (struct nvhost_set_timeout_args){ + .timeout = timeout_ms, + }; + + return (ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SET_TIMEOUT, &args) < 0) ? AVERROR(errno) : 0; +#else + return AVERROR(nvioctlChannel_SetSubmitTimeout(channel->channel.fd, timeout_ms)); +#endif +} + +int av_nvtegra_syncpt_wait(AVNVTegraChannel *channel, uint32_t threshold, int32_t timeout) { +#ifndef __SWITCH__ + struct nvhost_ctrl_syncpt_waitex_args args = { + .id = channel->syncpt, + .thresh = threshold, + .timeout = timeout, + }; + + return (ioctl(get_nvhost_fd(), NVHOST_IOCTL_CTRL_SYNCPT_WAITEX, &args) < 0) ? 
AVERROR(errno) : 0; +#else + NvFence fence; + + fence = (NvFence){ + .id = channel->syncpt, + .value = threshold, + }; + + return AVERROR(nvFenceWait(&fence, timeout)); +#endif +} + +#ifdef __SWITCH__ +static inline bool convert_cache_flags(uint32_t flags) { + /* Return whether the map should be CPU-cacheable */ + switch (flags & NVMAP_HANDLE_CACHE_FLAG) { + case NVMAP_HANDLE_INNER_CACHEABLE: + case NVMAP_HANDLE_CACHEABLE: + return true; + default: + return false; + } +} +#endif + +int av_nvtegra_map_allocate(AVNVTegraMap *map, AVNVTegraChannel *channel, uint32_t size, + uint32_t align, int heap_mask, int flags) +{ +#ifndef __SWITCH__ + struct nvmap_create_handle create_args; + struct nvmap_alloc_handle alloc_args; + int err; + + create_args = (struct nvmap_create_handle){ + .size = size, + }; + + err = ioctl(get_nvmap_fd(), NVMAP_IOC_CREATE, &create_args); + if (err < 0) + return AVERROR(errno); + + map->size = size; + map->handle = create_args.handle; + + alloc_args = (struct nvmap_alloc_handle){ + .handle = create_args.handle, + .heap_mask = heap_mask, + .flags = flags | (MEM_TAG << 16), + .align = align, + }; + + err = ioctl(get_nvmap_fd(), NVMAP_IOC_ALLOC, &alloc_args); + if (err < 0) + goto fail; + + return 0; + +fail: + av_nvtegra_map_free(map); + return AVERROR(errno); +#else + void *mem; + + map->owner = channel->channel.fd; + + size = FFALIGN(size, 0x1000); + + mem = aligned_alloc(FFALIGN(align, 0x1000), size); + if (!mem) + return AVERROR(ENOMEM); + + return AVERROR(nvMapCreate(&map->map, mem, size, 0x10000, NvKind_Pitch, + convert_cache_flags(flags))); +#endif +} + +int av_nvtegra_map_free(AVNVTegraMap *map) { +#ifndef __SWITCH__ + int err; + + if (!map->handle) + return 0; + + err = ioctl(get_nvmap_fd(), NVMAP_IOC_FREE, map->handle); + if (err < 0) + return AVERROR(errno); + + map->handle = 0; + + return 0; +#else + void *addr = map->map.cpu_addr; + + if (!map->map.cpu_addr) + return 0; + + nvMapClose(&map->map); + free(addr); + return 0; +#endif +} + 
+int av_nvtegra_map_from_va(AVNVTegraMap *map, AVNVTegraChannel *owner, void *mem, + uint32_t size, uint32_t align, uint32_t flags) +{ +#ifndef __SWITCH__ + struct nvmap_create_handle_from_va args; + int err; + + args = (struct nvmap_create_handle_from_va){ + .va = (uintptr_t)mem, + .size = size, + .flags = flags | (MEM_TAG << 16), + }; + + err = ioctl(get_nvmap_fd(), NVMAP_IOC_FROM_VA, &args); + if (err < 0) + return AVERROR(errno); + + map->cpu_addr = mem; + map->size = size; + map->handle = args.handle; + + return 0; +#else + + map->owner = owner->channel.fd; + + return AVERROR(nvMapCreate(&map->map, mem, FFALIGN(size, 0x1000), 0x10000, NvKind_Pitch, + convert_cache_flags(flags))); +#endif +} + +int av_nvtegra_map_close(AVNVTegraMap *map) { +#ifndef __SWITCH__ + return av_nvtegra_map_free(map); +#else + nvMapClose(&map->map); + return 0; +#endif +} + +int av_nvtegra_map_map(AVNVTegraMap *map) { +#ifndef __SWITCH__ + void *addr; + + addr = mmap(NULL, map->size, PROT_READ | PROT_WRITE, MAP_SHARED, map->handle, 0); + if (addr == MAP_FAILED) + return AVERROR(errno); + + map->cpu_addr = addr; + + return 0; +#else + nvioctl_command_buffer_map params; + int err; + + params = (nvioctl_command_buffer_map){ + .handle = map->map.handle, + }; + + err = nvioctlChannel_MapCommandBuffer(map->owner, &params, 1, false); + if (R_FAILED(err)) + return AVERROR(err); + + map->iova = params.iova; + + return 0; +#endif +} + +int av_nvtegra_map_unmap(AVNVTegraMap *map) { + int err; +#ifndef __SWITCH__ + if (!map->cpu_addr) + return 0; + + err = munmap(map->cpu_addr, map->size); + if (err < 0) + return AVERROR(errno); + + map->cpu_addr = NULL; + + return 0; +#else + nvioctl_command_buffer_map params; + + if (!map->iova) + return 0; + + params = (nvioctl_command_buffer_map){ + .handle = map->map.handle, + .iova = map->iova, + }; + + err = nvioctlChannel_UnmapCommandBuffer(map->owner, &params, 1, false); + if (R_FAILED(err)) + return AVERROR(err); + + map->iova = 0; + + return 0; +#endif +} + 
+int av_nvtegra_map_cache_op(AVNVTegraMap *map, int op, void *addr, size_t len) { +#ifndef __SWITCH__ + struct nvmap_cache_op args; + + args = (struct nvmap_cache_op){ + .addr = (uintptr_t)addr, + .len = len, + .handle = av_nvtegra_map_get_handle(map), + .op = op, + }; + + return (ioctl(get_nvmap_fd(), NVMAP_IOC_CACHE, &args) < 0) ? AVERROR(errno) : 0; +#else + if (!map->map.is_cpu_cacheable) + return 0; + + switch (op) { + case NVMAP_CACHE_OP_WB: + armDCacheClean(addr, len); + break; + default: + case NVMAP_CACHE_OP_INV: + case NVMAP_CACHE_OP_WB_INV: + /* libnx internally performs a clean-invalidate, since invalidate is a privileged instruction */ + armDCacheFlush(addr, len); + break; + } + + return 0; +#endif +} + +int av_nvtegra_map_realloc(AVNVTegraMap *map, uint32_t size, uint32_t align, + int heap_mask, int flags) +{ + AVNVTegraChannel channel; + AVNVTegraMap tmp = {0}; + int err; + + if (av_nvtegra_map_get_size(map) >= size) + return 0; + + /* Dummy channel object to hold the owner fd */ + channel = (AVNVTegraChannel){ +#ifdef __SWITCH__ + .channel.fd = map->owner, +#endif + }; + + err = av_nvtegra_map_create(&tmp, &channel, size, align, heap_mask, flags); + if (err < 0) + goto fail; + + memcpy(av_nvtegra_map_get_addr(&tmp), av_nvtegra_map_get_addr(map), av_nvtegra_map_get_size(map)); + + err = av_nvtegra_map_destroy(map); + if (err < 0) + goto fail; + + *map = tmp; + + return 0; + +fail: + av_nvtegra_map_destroy(&tmp); + return err; +} + +int av_nvtegra_cmdbuf_init(AVNVTegraCmdbuf *cmdbuf) { + cmdbuf->num_cmdbufs = 0; +#ifndef __SWITCH__ + cmdbuf->num_relocs = 0; + cmdbuf->num_waitchks = 0; +#endif + cmdbuf->num_syncpt_incrs = 0; + +#define NUM_INITIAL_CMDBUFS 3 +#define NUM_INITIAL_RELOCS 15 +#define NUM_INITIAL_SYNCPT_INCRS 3 + + cmdbuf->cmdbufs = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->cmdbufs)); +#ifndef __SWITCH__ + cmdbuf->cmdbuf_exts = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->cmdbuf_exts)); + cmdbuf->class_ids = 
av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->class_ids)); +#endif + +#ifndef __SWITCH__ + if (!cmdbuf->cmdbufs || !cmdbuf->cmdbuf_exts || !cmdbuf->class_ids) +#else + if (!cmdbuf->cmdbufs) +#endif + return AVERROR(ENOMEM); + +#ifndef __SWITCH__ + cmdbuf->relocs = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->relocs)); + cmdbuf->reloc_types = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->reloc_types)); + cmdbuf->reloc_shifts = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->reloc_shifts)); + if (!cmdbuf->relocs || !cmdbuf->reloc_types || !cmdbuf->reloc_shifts) + return AVERROR(ENOMEM); +#endif + + cmdbuf->syncpt_incrs = av_malloc_array(NUM_INITIAL_SYNCPT_INCRS, sizeof(*cmdbuf->syncpt_incrs)); +#ifndef __SWITCH__ + cmdbuf->fences = av_malloc_array(NUM_INITIAL_SYNCPT_INCRS, sizeof(*cmdbuf->fences)); +#endif + +#ifndef __SWITCH__ + if (!cmdbuf->syncpt_incrs || !cmdbuf->fences) +#else + if (!cmdbuf->syncpt_incrs) +#endif + return AVERROR(ENOMEM); + + return 0; +} + +int av_nvtegra_cmdbuf_deinit(AVNVTegraCmdbuf *cmdbuf) { + av_freep(&cmdbuf->cmdbufs); + av_freep(&cmdbuf->syncpt_incrs); + +#ifndef __SWITCH__ + av_freep(&cmdbuf->cmdbuf_exts), av_freep(&cmdbuf->class_ids); + av_freep(&cmdbuf->relocs), av_freep(&cmdbuf->reloc_types), av_freep(&cmdbuf->reloc_shifts); + av_freep(&cmdbuf->fences); +#endif + + return 0; +} + +int av_nvtegra_cmdbuf_add_memory(AVNVTegraCmdbuf *cmdbuf, AVNVTegraMap *map, uint32_t offset, uint32_t size) { + uint8_t *mem; + + mem = av_nvtegra_map_get_addr(map); + + cmdbuf->map = map; + cmdbuf->mem_offset = offset; + cmdbuf->mem_size = size; + + cmdbuf->cur_word = (uint32_t *)(mem + cmdbuf->mem_offset); + + return 0; +} + +int av_nvtegra_cmdbuf_clear(AVNVTegraCmdbuf *cmdbuf) { + uint8_t *mem; + + mem = av_nvtegra_map_get_addr(cmdbuf->map); + + cmdbuf->num_cmdbufs = 0, cmdbuf->num_syncpt_incrs = 0; +#ifndef __SWITCH__ + cmdbuf->num_relocs = 0, cmdbuf->num_waitchks = 0; +#endif + + cmdbuf->cur_word = (uint32_t *)(mem + 
cmdbuf->mem_offset); + return 0; +} + +int av_nvtegra_cmdbuf_begin(AVNVTegraCmdbuf *cmdbuf, uint32_t class_id) { + uint8_t *mem; + void *tmp1; +#ifndef __SWITCH__ + void *tmp2, *tmp3; +#endif + + mem = av_nvtegra_map_get_addr(cmdbuf->map); + + tmp1 = av_realloc_array(cmdbuf->cmdbufs, cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->cmdbufs)); +#ifndef __SWITCH__ + tmp2 = av_realloc_array(cmdbuf->cmdbuf_exts, cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->cmdbuf_exts)); + tmp3 = av_realloc_array(cmdbuf->class_ids, cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->class_ids)); +#endif + +#ifndef __SWITCH__ + if (!tmp1 || !tmp2 || !tmp3) +#else + if (!tmp1) +#endif + return AVERROR(ENOMEM); + + cmdbuf->cmdbufs = tmp1; + +#ifndef __SWITCH__ + cmdbuf->cmdbuf_exts = tmp2, cmdbuf->class_ids = tmp3; +#endif + + cmdbuf->cmdbufs[cmdbuf->num_cmdbufs] = (struct nvhost_cmdbuf){ + .mem = av_nvtegra_map_get_handle(cmdbuf->map), + .offset = (uint8_t *)cmdbuf->cur_word - mem, + }; + +#ifndef __SWITCH__ + cmdbuf->cmdbuf_exts[cmdbuf->num_cmdbufs] = (struct nvhost_cmdbuf_ext){ + .pre_fence = -1, + }; + + cmdbuf->class_ids[cmdbuf->num_cmdbufs] = class_id; +#endif + +#ifdef __SWITCH__ + if (cmdbuf->num_cmdbufs == 0) + av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_setclass(class_id, 0, 0)); +#endif + + return 0; +} + +int av_nvtegra_cmdbuf_end(AVNVTegraCmdbuf *cmdbuf) { + cmdbuf->num_cmdbufs++; + return 0; +} + +int av_nvtegra_cmdbuf_push_word(AVNVTegraCmdbuf *cmdbuf, uint32_t word) { + uintptr_t mem_start = (uintptr_t)av_nvtegra_map_get_addr(cmdbuf->map) + cmdbuf->mem_offset; + + if ((uintptr_t)cmdbuf->cur_word - mem_start >= cmdbuf->mem_size) + return AVERROR(ENOMEM); + + *cmdbuf->cur_word++ = word; + cmdbuf->cmdbufs[cmdbuf->num_cmdbufs].words += 1; + return 0; +} + +int av_nvtegra_cmdbuf_push_value(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, uint32_t word) { + int err; + + err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_incr(NV_THI_METHOD0>>2, 2)); + if (err < 0) + return err; + + err = 
av_nvtegra_cmdbuf_push_word(cmdbuf, offset); + if (err < 0) + return err; + + err = av_nvtegra_cmdbuf_push_word(cmdbuf, word); + if (err < 0) + return err; + + return 0; +} + +int av_nvtegra_cmdbuf_push_reloc(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, AVNVTegraMap *target, uint32_t target_offset, + int reloc_type, int shift) +{ + int err; +#ifndef __SWITCH__ + uint8_t *mem; + void *tmp1, *tmp2, *tmp3; + + mem = av_nvtegra_map_get_addr(cmdbuf->map); + + tmp1 = av_realloc_array(cmdbuf->relocs, cmdbuf->num_relocs + 1, sizeof(*cmdbuf->relocs)); + tmp2 = av_realloc_array(cmdbuf->reloc_types, cmdbuf->num_relocs + 1, sizeof(*cmdbuf->reloc_types)); + tmp3 = av_realloc_array(cmdbuf->reloc_shifts, cmdbuf->num_relocs + 1, sizeof(*cmdbuf->reloc_shifts)); + if (!tmp1 || !tmp2 || !tmp3) + return AVERROR(ENOMEM); + + cmdbuf->relocs = tmp1, cmdbuf->reloc_types = tmp2, cmdbuf->reloc_shifts = tmp3; + + err = av_nvtegra_cmdbuf_push_value(cmdbuf, offset, 0xdeadbeef); + if (err < 0) + return err; + + cmdbuf->relocs[cmdbuf->num_relocs] = (struct nvhost_reloc){ + .cmdbuf_mem = av_nvtegra_map_get_handle(cmdbuf->map), + .cmdbuf_offset = (uint8_t *)cmdbuf->cur_word - mem - sizeof(uint32_t), + .target = av_nvtegra_map_get_handle(target), + .target_offset = target_offset, + }; + + cmdbuf->reloc_types[cmdbuf->num_relocs] = (struct nvhost_reloc_type){ + .reloc_type = reloc_type, + }; + + cmdbuf->reloc_shifts[cmdbuf->num_relocs] = (struct nvhost_reloc_shift){ + .shift = shift, + }; + + cmdbuf->num_relocs++; + + return 0; +#else + err = av_nvtegra_cmdbuf_push_value(cmdbuf, offset, (target->iova + target_offset) >> shift); + if (err < 0) + return err; + + return 0; +#endif +} + +int av_nvtegra_cmdbuf_push_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt) { + int err; + + err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_nonincr(NV_THI_INCR_SYNCPT>>2, 1)); + if (err < 0) + return err; + + err = av_nvtegra_cmdbuf_push_word(cmdbuf, + AV_NVTEGRA_VALUE(NV_THI_INCR_SYNCPT, INDX, syncpt) | + 
AV_NVTEGRA_ENUM (NV_THI_INCR_SYNCPT, COND, OP_DONE)); + if (err < 0) + return err; + + return 0; +} + +int av_nvtegra_cmdbuf_push_wait(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence) { + int err; + + err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_setclass(HOST1X_CLASS_HOST1X, 0, 0)); + if (err < 0) + return err; + + err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_mask(NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD>>2, + (1<<(NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD - NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD)) | + (1<<(NV_CLASS_HOST_WAIT_SYNCPT - NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD)))); + if (err < 0) + return err; + + err = av_nvtegra_cmdbuf_push_word(cmdbuf, fence); + if (err < 0) + return err; + + err = av_nvtegra_cmdbuf_push_word(cmdbuf, syncpt); + if (err < 0) + return err; + + return 0; +} + +int av_nvtegra_cmdbuf_add_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence) +{ + void *tmp1; +#ifndef __SWITCH__ + void *tmp2; +#endif + + tmp1 = av_realloc_array(cmdbuf->syncpt_incrs, cmdbuf->num_syncpt_incrs + 1, sizeof(*cmdbuf->syncpt_incrs)); +#ifndef __SWITCH__ + tmp2 = av_realloc_array(cmdbuf->fences, cmdbuf->num_syncpt_incrs + 1, sizeof(*cmdbuf->fences)); +#endif + +#ifndef __SWITCH__ + if (!tmp1 || !tmp2) +#else + if (!tmp1) +#endif + return AVERROR(ENOMEM); + + cmdbuf->syncpt_incrs = tmp1; +#ifndef __SWITCH__ + cmdbuf->fences = tmp2; +#endif + + cmdbuf->syncpt_incrs[cmdbuf->num_syncpt_incrs] = (struct nvhost_syncpt_incr){ + .syncpt_id = syncpt, + .syncpt_incrs = 1, + }; + +#ifndef __SWITCH__ + cmdbuf->fences[cmdbuf->num_syncpt_incrs] = fence; +#endif + + cmdbuf->num_syncpt_incrs++; + + return av_nvtegra_cmdbuf_push_syncpt_incr(cmdbuf, syncpt); +} + +int av_nvtegra_cmdbuf_add_waitchk(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence) { +#ifndef __SWITCH__ + uint8_t *mem; + void *tmp; + + mem = av_nvtegra_map_get_addr(cmdbuf->map); + + tmp = av_realloc_array(cmdbuf->waitchks, cmdbuf->num_waitchks + 1, sizeof(*cmdbuf->waitchks)); + if (!tmp) 
+ return AVERROR(ENOMEM); + + cmdbuf->waitchks = tmp; + + cmdbuf->waitchks[cmdbuf->num_waitchks] = (struct nvhost_waitchk){ + .mem = av_nvtegra_map_get_handle(cmdbuf->map), + .offset = (uint8_t *)cmdbuf->cur_word - mem - sizeof(uint32_t), + .syncpt_id = syncpt, + .thresh = fence, + }; + + cmdbuf->num_waitchks++; +#endif + + return av_nvtegra_cmdbuf_push_wait(cmdbuf, syncpt, fence); +} + +static void nvtegra_job_free(void *opaque, uint8_t *data) { + AVNVTegraJob *job = (AVNVTegraJob *)data; + + if (!job) + return; + + av_nvtegra_cmdbuf_deinit(&job->cmdbuf); + av_nvtegra_map_destroy(&job->input_map); + + av_freep(&job); +} + +static AVBufferRef *nvtegra_job_alloc(void *opaque, size_t size) { + AVNVTegraJobPool *pool = opaque; + + AVBufferRef *buffer; + AVNVTegraJob *job; + int err; + + job = av_mallocz(sizeof(*job)); + if (!job) + return NULL; + + err = av_nvtegra_map_create(&job->input_map, pool->channel, pool->input_map_size, 0x100, + NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE); + if (err < 0) + goto fail; + + err = av_nvtegra_cmdbuf_init(&job->cmdbuf); + if (err < 0) + goto fail; + + err = av_nvtegra_cmdbuf_add_memory(&job->cmdbuf, &job->input_map, pool->cmdbuf_off, pool->max_cmdbuf_size); + if (err < 0) + goto fail; + + buffer = av_buffer_create((uint8_t *)job, sizeof(*job), nvtegra_job_free, pool, 0); + if (!buffer) + goto fail; + + return buffer; + +fail: + av_nvtegra_cmdbuf_deinit(&job->cmdbuf); + av_nvtegra_map_destroy(&job->input_map); + av_freep(&job); + return NULL; +} + +int av_nvtegra_job_pool_init(AVNVTegraJobPool *pool, AVNVTegraChannel *channel, + size_t input_map_size, off_t cmdbuf_off, size_t max_cmdbuf_size) +{ + pool->channel = channel; + pool->input_map_size = input_map_size; + pool->cmdbuf_off = cmdbuf_off; + pool->max_cmdbuf_size = max_cmdbuf_size; + pool->pool = av_buffer_pool_init2(sizeof(AVNVTegraJob), pool, + nvtegra_job_alloc, NULL); + if (!pool->pool) + return AVERROR(ENOMEM); + + return 0; +} + +int
av_nvtegra_job_pool_uninit(AVNVTegraJobPool *pool) { + av_buffer_pool_uninit(&pool->pool); + return 0; +} + +AVBufferRef *av_nvtegra_job_pool_get(AVNVTegraJobPool *pool) { + return av_buffer_pool_get(pool->pool); +} + +int av_nvtegra_job_submit(AVNVTegraJobPool *pool, AVNVTegraJob *job) { + return av_nvtegra_channel_submit(pool->channel, &job->cmdbuf, &job->fence); +} + +int av_nvtegra_job_wait(AVNVTegraJobPool *pool, AVNVTegraJob *job, int timeout) { + return av_nvtegra_syncpt_wait(pool->channel, job->fence, timeout); +} diff --git a/libavutil/nvtegra.h b/libavutil/nvtegra.h new file mode 100644 index 0000000000..3b63335d6c --- /dev/null +++ b/libavutil/nvtegra.h @@ -0,0 +1,258 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#ifndef AVUTIL_NVTEGRA_H +#define AVUTIL_NVTEGRA_H + +#include +#include + +#include "buffer.h" + +#include "nvhost_ioctl.h" +#include "nvmap_ioctl.h" + +typedef struct AVNVTegraChannel { +#ifndef __SWITCH__ + int fd; + int module_id; +#else + NvChannel channel; +#endif + + uint32_t syncpt; + +#ifdef __SWITCH__ + MmuRequest mmu_request; +#endif + uint32_t clock; +} AVNVTegraChannel; + +typedef struct AVNVTegraMap { +#ifndef __SWITCH__ + uint32_t handle; + uint32_t size; + void *cpu_addr; +#else + NvMap map; + uint32_t iova; + uint32_t owner; +#endif + bool is_linear; +} AVNVTegraMap; + +typedef struct AVNVTegraCmdbuf { + AVNVTegraMap *map; + + uint32_t mem_offset, mem_size; + + uint32_t *cur_word; + + struct nvhost_cmdbuf *cmdbufs; +#ifndef __SWITCH__ + struct nvhost_cmdbuf_ext *cmdbuf_exts; + uint32_t *class_ids; +#endif + uint32_t num_cmdbufs; + +#ifndef __SWITCH__ + struct nvhost_reloc *relocs; + struct nvhost_reloc_type *reloc_types; + struct nvhost_reloc_shift *reloc_shifts; + uint32_t num_relocs; +#endif + + struct nvhost_syncpt_incr *syncpt_incrs; +#ifndef __SWITCH__ + uint32_t *fences; +#endif + uint32_t num_syncpt_incrs; + +#ifndef __SWITCH__ + struct nvhost_waitchk *waitchks; + uint32_t num_waitchks; +#endif +} AVNVTegraCmdbuf; + +typedef struct AVNVTegraJobPool { + /* + * Pool object for job allocation + */ + AVBufferPool *pool; + + /* + * Hardware channel the jobs will be submitted to + */ + AVNVTegraChannel *channel; + + /* + * Total size of the input memory-mapped buffer + */ + size_t input_map_size; + + /* + * Offset of the command data within the input map + */ + off_t cmdbuf_off; + + /* + * Maximum memory usable by the command buffer + */ + size_t max_cmdbuf_size; +} AVNVTegraJobPool; + +typedef struct AVNVTegraJob { + /* + * Memory-mapped buffer for command buffers, metadata structures, ... 
+ */ + AVNVTegraMap input_map; + + /* + * Object for command recording + */ + AVNVTegraCmdbuf cmdbuf; + + /* + * Fence indicating completion of the job + */ + uint32_t fence; +} AVNVTegraJob; + +AVBufferRef *av_nvtegra_driver_init(void); + +int av_nvtegra_channel_open(AVNVTegraChannel *channel, const char *dev); +int av_nvtegra_channel_close(AVNVTegraChannel *channel); +int av_nvtegra_channel_get_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t *clock_rate); +int av_nvtegra_channel_set_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t clock_rate); +int av_nvtegra_channel_submit(AVNVTegraChannel *channel, AVNVTegraCmdbuf *cmdbuf, uint32_t *fence); +int av_nvtegra_channel_set_submit_timeout(AVNVTegraChannel *channel, uint32_t timeout_ms); + +int av_nvtegra_syncpt_wait(AVNVTegraChannel *channel, uint32_t threshold, int32_t timeout); + +int av_nvtegra_map_allocate(AVNVTegraMap *map, AVNVTegraChannel *owner, uint32_t size, + uint32_t align, int heap_mask, int flags); +int av_nvtegra_map_free(AVNVTegraMap *map); +int av_nvtegra_map_from_va(AVNVTegraMap *map, AVNVTegraChannel *owner, void *mem, + uint32_t size, uint32_t align, uint32_t flags); +int av_nvtegra_map_close(AVNVTegraMap *map); +int av_nvtegra_map_map(AVNVTegraMap *map); +int av_nvtegra_map_unmap(AVNVTegraMap *map); +int av_nvtegra_map_cache_op(AVNVTegraMap *map, int op, void *addr, size_t len); +int av_nvtegra_map_realloc(AVNVTegraMap *map, uint32_t size, uint32_t align, int heap_mask, int flags); + +static inline int av_nvtegra_map_create(AVNVTegraMap *map, AVNVTegraChannel *owner, uint32_t size, uint32_t align, + int heap_mask, int flags) +{ + int err; + + err = av_nvtegra_map_allocate(map, owner, size, align, heap_mask, flags); + if (err < 0) + return err; + + return av_nvtegra_map_map(map); +} + +static inline int av_nvtegra_map_destroy(AVNVTegraMap *map) { + int err; + + err = av_nvtegra_map_unmap(map); + if (err < 0) + return err; + + return av_nvtegra_map_free(map); +} + 
+int av_nvtegra_cmdbuf_init(AVNVTegraCmdbuf *cmdbuf); +int av_nvtegra_cmdbuf_deinit(AVNVTegraCmdbuf *cmdbuf); +int av_nvtegra_cmdbuf_add_memory(AVNVTegraCmdbuf *cmdbuf, AVNVTegraMap *map, uint32_t offset, uint32_t size); +int av_nvtegra_cmdbuf_clear(AVNVTegraCmdbuf *cmdbuf); +int av_nvtegra_cmdbuf_begin(AVNVTegraCmdbuf *cmdbuf, uint32_t class_id); +int av_nvtegra_cmdbuf_end(AVNVTegraCmdbuf *cmdbuf); +int av_nvtegra_cmdbuf_push_word(AVNVTegraCmdbuf *cmdbuf, uint32_t word); +int av_nvtegra_cmdbuf_push_value(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, uint32_t word); +int av_nvtegra_cmdbuf_push_reloc(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, AVNVTegraMap *target, uint32_t target_offset, + int reloc_type, int shift); +int av_nvtegra_cmdbuf_push_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt); +int av_nvtegra_cmdbuf_push_wait(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence); +int av_nvtegra_cmdbuf_add_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence); +int av_nvtegra_cmdbuf_add_waitchk(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence); + +/* + * Job allocation and submission routines + */ +int av_nvtegra_job_pool_init(AVNVTegraJobPool *pool, AVNVTegraChannel *channel, + size_t input_map_size, off_t cmdbuf_off, size_t max_cmdbuf_size); +int av_nvtegra_job_pool_uninit(AVNVTegraJobPool *pool); +AVBufferRef *av_nvtegra_job_pool_get(AVNVTegraJobPool *pool); + +int av_nvtegra_job_submit(AVNVTegraJobPool *pool, AVNVTegraJob *job); +int av_nvtegra_job_wait(AVNVTegraJobPool *pool, AVNVTegraJob *job, int timeout); + +static inline uint32_t av_nvtegra_map_get_handle(AVNVTegraMap *map) { +#ifndef __SWITCH__ + return map->handle; +#else + return map->map.handle; +#endif +} + +static inline void *av_nvtegra_map_get_addr(AVNVTegraMap *map) { +#ifndef __SWITCH__ + return map->cpu_addr; +#else + return map->map.cpu_addr; +#endif +} + +static inline uint32_t av_nvtegra_map_get_size(AVNVTegraMap *map) { +#ifndef __SWITCH__ + return map->size; 
+#else + return map->map.size; +#endif +} + +/* Addresses are shifted by 8 bits in the command buffer, requiring an alignment to 256 */ +#define AV_NVTEGRA_MAP_ALIGN (1 << 8) + +#define AV_NVTEGRA_VALUE(offset, field, value) \ + (((value) & \ + ((uint32_t)((UINT64_C(1) << ((1?offset ## _ ## field) - (0?offset ## _ ## field) + 1)) - 1))) \ + << (0?offset ## _ ## field)) + +#define AV_NVTEGRA_ENUM(offset, field, value) \ + ((offset ## _ ## field ## _ ## value & \ + ((uint32_t)((UINT64_C(1) << ((1?offset ## _ ## field) - (0?offset ## _ ## field) + 1)) - 1))) \ + << (0?offset ## _ ## field)) + +#define AV_NVTEGRA_PUSH_VALUE(cmdbuf, offset, value) ({ \ + int _err = av_nvtegra_cmdbuf_push_value(cmdbuf, (offset) / sizeof(uint32_t), value); \ + if (_err < 0) \ + return _err; \ +}) + +#define AV_NVTEGRA_PUSH_RELOC(cmdbuf, offset, target, target_offset, type) ({ \ + int _err = av_nvtegra_cmdbuf_push_reloc(cmdbuf, (offset) / sizeof(uint32_t), \ + target, target_offset, type, 8); \ + if (_err < 0) \ + return _err; \ +}) + +#endif /* AVUTIL_NVTEGRA_H */ diff --git a/libavutil/nvtegra_host1x.h b/libavutil/nvtegra_host1x.h new file mode 100644 index 0000000000..25e37eae61 --- /dev/null +++ b/libavutil/nvtegra_host1x.h @@ -0,0 +1,94 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details.
+ * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef AVUTIL_NVTEGRA_HOST1X_H +#define AVUTIL_NVTEGRA_HOST1X_H + +#include + +#include "macros.h" + +/* From L4T include/linux/host1x.h */ +enum host1x_class { + HOST1X_CLASS_HOST1X = 0x01, + HOST1X_CLASS_NVENC = 0x21, + HOST1X_CLASS_VI = 0x30, + HOST1X_CLASS_ISPA = 0x32, + HOST1X_CLASS_ISPB = 0x34, + HOST1X_CLASS_GR2D = 0x51, + HOST1X_CLASS_GR2D_SB = 0x52, + HOST1X_CLASS_VIC = 0x5d, + HOST1X_CLASS_GR3D = 0x60, + HOST1X_CLASS_NVJPG = 0xc0, + HOST1X_CLASS_NVDEC = 0xf0, +}; + +static inline uint32_t host1x_opcode_setclass(unsigned class_id, unsigned offset, unsigned mask) { + return (0 << 28) | (offset << 16) | (class_id << 6) | mask; +} + +static inline uint32_t host1x_opcode_incr(unsigned offset, unsigned count) { + return (1 << 28) | (offset << 16) | count; +} + +static inline uint32_t host1x_opcode_nonincr(unsigned offset, unsigned count) { + return (2 << 28) | (offset << 16) | count; +} + +static inline uint32_t host1x_opcode_mask(unsigned offset, unsigned mask) { + return (3 << 28) | (offset << 16) | mask; +} + +static inline uint32_t host1x_opcode_imm(unsigned offset, unsigned value) { + return (4 << 28) | (offset << 16) | value; +} + +#define NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD (0x00000138) +#define NV_CLASS_HOST_WAIT_SYNCPT (0x00000140) + +#define NV_THI_INCR_SYNCPT (0x00000000) +#define NV_THI_INCR_SYNCPT_INDX 7:0 +#define NV_THI_INCR_SYNCPT_COND 15:8 +#define NV_THI_INCR_SYNCPT_COND_IMMEDIATE (0x00000000) +#define NV_THI_INCR_SYNCPT_COND_OP_DONE (0x00000001) +#define NV_THI_INCR_SYNCPT_ERR (0x00000008) +#define NV_THI_INCR_SYNCPT_ERR_COND_STS_IMM 0:0 +#define NV_THI_INCR_SYNCPT_ERR_COND_STS_OPDONE 1:1 +#define NV_THI_CTXSW_INCR_SYNCPT (0x0000000c) +#define NV_THI_CTXSW_INCR_SYNCPT_INDX 7:0 +#define NV_THI_CTXSW (0x00000020) +#define 
NV_THI_CTXSW_CURR_CLASS 9:0 +#define NV_THI_CTXSW_AUTO_ACK 11:11 +#define NV_THI_CTXSW_CURR_CHANNEL 15:12 +#define NV_THI_CTXSW_NEXT_CLASS 25:16 +#define NV_THI_CTXSW_NEXT_CHANNEL 31:28 +#define NV_THI_CONT_SYNCPT_EOF (0x00000028) +#define NV_THI_CONT_SYNCPT_EOF_INDEX 7:0 +#define NV_THI_CONT_SYNCPT_EOF_COND 8:8 +#define NV_THI_METHOD0 (0x00000040) +#define NV_THI_METHOD0_OFFSET 11:0 +#define NV_THI_METHOD1 (0x00000044) +#define NV_THI_METHOD1_DATA 31:0 +#define NV_THI_INT_STATUS (0x00000078) +#define NV_THI_INT_STATUS_FALCON_INT 0:0 +#define NV_THI_INT_MASK (0x0000007c) +#define NV_THI_INT_MASK_FALCON_INT 0:0 + +#endif /* AVUTIL_NVTEGRA_HOST1X_H */ diff --git a/libavutil/pixdesc.c b/libavutil/pixdesc.c index 1c0bcf2232..bb14b1b306 100644 --- a/libavutil/pixdesc.c +++ b/libavutil/pixdesc.c @@ -2791,6 +2791,10 @@ static const AVPixFmtDescriptor av_pix_fmt_descriptors[AV_PIX_FMT_NB] = { }, .flags = AV_PIX_FMT_FLAG_PLANAR, }, + [AV_PIX_FMT_NVTEGRA] = { + .name = "nvtegra", + .flags = AV_PIX_FMT_FLAG_HWACCEL, + }, }; static const char * const color_range_names[] = { diff --git a/libavutil/pixfmt.h b/libavutil/pixfmt.h index a7f50e1690..a3213c792a 100644 --- a/libavutil/pixfmt.h +++ b/libavutil/pixfmt.h @@ -439,6 +439,14 @@ enum AVPixelFormat { */ AV_PIX_FMT_D3D12, + /** + * Hardware surfaces for Tegra devices. 
+ * + * data[0..2] points to memory-mapped buffer containing frame data + * buf[0] contains an AVBufferRef to an AVNVTegraMap + */ + AV_PIX_FMT_NVTEGRA, + AV_PIX_FMT_NB ///< number of pixel formats, DO NOT USE THIS if you want to link with shared libav* because the number of formats might differ between versions }; From patchwork Thu May 30 19:43:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49416 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9183:0:b0:460:55fa:d5ed with SMTP id s3csp67314vqg; Thu, 30 May 2024 12:45:08 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWBgi04vtfP5ceAR9brE8D/2IHL5rLfivAUhkkwMJcTBKu5s4J4CE+eG2eD218dMOkA+EqcJrYXvvoFm/4IgaF7zQgbfOCz6/3P8A== X-Google-Smtp-Source: AGHT+IE4ODbU89UxYLYJQrzcfLcm1E59o2AU0mjpWRNQ6l5zXA/cuTLgbGJ5p6hWOoACXJOBSmcx X-Received: by 2002:a17:906:8453:b0:a63:d9cb:9f48 with SMTP id a640c23a62f3a-a65e9109566mr190513566b.66.1717098307657; Thu, 30 May 2024 12:45:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1717098307; cv=none; d=google.com; s=arc-20160816; b=LnMd8U2aaBfPdb05aLVah+O4jCjbEKkyFLNIjSB1Plf9n+DHS62G+NcNVW5SjGxDVE chZa7zFlzN8qoMGF9NSLKUCNbkyZ1fvUu2M6AtTs+BPPDKQuIo2t1E9B25osPKXloppA +//aKgCE+iMWHyN4bzajJC8o4JXOMpDsRfOVyzhb1s2T1c2NPIO3lVN5LM+FkN3b7MZJ pxIAXJECekwWgcgKxvwgPCPgKlyZbo4mzboHozqXb0QCfG8zlsHf/QtnW/uvV66r/HId 6YVy6X5pxj+dBsj+GLA1KQ6OIUxFTOXK1Zuq2okejdVB4A6L4vfJIgtZXgpdCgcAhdMq KwEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=RzRSgS5CqYC8LKRUuVyMWPpTyKLajHrVYk83qx/UpcU=; fh=o4ZBG0WnuIFUfokYFX1900fRPFIkFoDCXPv5+z2b8Jo=; b=MVNpDMLAtCwsvOoPss/qleiVxgEvGi8f/DASYVoLmfk9pdvnj8wauSF8scDNwejH4u
DTb5wBxQ4s0zP/NQPmebqyd56pJWyWaAHl7rBzib5fW6cQtxKKT25mmrEjAOD3urUg5A l+QgThDkvkTUH1WcmU6Lf7xssqBWcKP/pAdo3qlJyzKQ3E2IcmksttiJOmml144d3oFG 0Znk5IoZmLLlZNvQWjQ6Jn7EzxZt5lGwFGfZW/fGydWeWUWBxzvG0K3Z9xnGtd1lwtTC LaNFwZ3Dt3GFvKoc5RxIFOdhfSd7gVu40/7J1dM+oGquxays6LfPhpB4gIBWzlEy9Qmr UPKg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=Qt94wSuT; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a67eae70292si8715866b.767.2024.05.30.12.45.06; Thu, 30 May 2024 12:45:07 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=Qt94wSuT; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id D187768D5B2; Thu, 30 May 2024 22:44:39 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wr1-f44.google.com (mail-wr1-f44.google.com [209.85.221.44]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id B229068D59F for ; Thu, 30 May 2024 22:44:36 +0300 (EEST) Received: by mail-wr1-f44.google.com with SMTP id ffacd0b85a97d-35dd0c06577so136891f8f.2 for ; Thu, 30 May 2024 12:44:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20230601; t=1717098276; x=1717703076; darn=ffmpeg.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=M8glET+YTeJE34CHE4Dck1KfiCfaGQn+YuXrilkbPaA=; b=Qt94wSuTYkZl0VPffXSradBNiLOvWnbpLVzzIiFdi1EJnf/uV1B2eorYQ54wQRiQje F+E4EkYO/zYLEPeHEIMbWgKnLym7X+tI9O5zoc3BiL9Tr+a4slj473P4fJvyT0gZNRnx 03rbP/k9qrSZwcz0UmUKIPHuqwkjBBXiipgz1WWvZTeJiG0GA9NWQGcDA5U06+xWL3fq BUp0ew5CcIObXGQQcDeX34BfAy9UrIIeRo6FfYuJYdSwk1uta84JMtWMYhnRtxqDZLFW tJhIZRqLg95yx55H3AXMCliPCKeR+FV2p9R9YStUWNB645vXfdhyiTqMLOPAOhOz8qPN BGMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717098276; x=1717703076; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=M8glET+YTeJE34CHE4Dck1KfiCfaGQn+YuXrilkbPaA=; b=SsKJlZaU7BHLmzTo476u+Us+tPTZA2AtTRZkqZS3xyMNf1Eso8s8x25PO4byujXpjA ZJjCSWHtQQx2RTquQlbNozzlklkOzV1eOF/kcrK9ilI+SBPqb7XflyU6qm23JriCTRk3 Dj6WrPsRGyqZuxOqqilrHUgImZisYmP+GZojMwooSuX+WF+3fypucbsl8YgesG1Z4NdC VXmHXQucabLyqo3gSC6++hq9m3kHQp2J3qqQPUDhAd2ISfB19VjNk1nyNmIldUFTB+bO t0bxeIU/Vuj4sAU9gg+se2ofdkAyEyTICD30lj7oOzp7HmUmRo75MHDJ0/27PW2E1DzZ vCqQ== X-Gm-Message-State: AOJu0YybmoxrtEOD6lGHtQ6/4s/JrF45bxCjolyvF4+1FJS2e1i7VE+Q Q2CCJJpLHEMkyTeFWBFmmPViOAEfCXqDguQSo4poERuVUWo/05lpG5sbig== X-Received: by 2002:a5d:4c50:0:b0:357:8bbf:3f87 with SMTP id ffacd0b85a97d-35dc02c13f7mr2210811f8f.60.1717098275613; Thu, 30 May 2024 12:44:35 -0700 (PDT) Received: from fractale.lan ([2001:861:5102:3290:f88d:fc8b:a14:3fcb]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-35dd04c0de3sm225126f8f.9.2024.05.30.12.44.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 May 2024 12:44:35 -0700 (PDT) From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:08 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: 
References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: averne Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 6B2XoIl+WwkE This includes hwdevice and hwframes objects. As the multimedia engines work with tiled surfaces (block linear in nvidia jargon), two frame transfer methods are implemented. The first makes use of the VIC to perform the copy. Since some revisions of the VIC (such as the one found in the tegra X1) did not support 10+ bit formats, these go through two separate copy steps for the luma and chroma planes. The second method copies on the CPU, and is used as a fallback if the VIC constraints are not satisfied. Signed-off-by: averne --- libavutil/Makefile | 7 +- libavutil/hwcontext.c | 4 + libavutil/hwcontext.h | 1 + libavutil/hwcontext_internal.h | 1 + libavutil/hwcontext_nvtegra.c | 880 +++++++++++++++++++++++++++++++++ libavutil/hwcontext_nvtegra.h | 85 ++++ 6 files changed, 976 insertions(+), 2 deletions(-) create mode 100644 libavutil/hwcontext_nvtegra.c create mode 100644 libavutil/hwcontext_nvtegra.h diff --git a/libavutil/Makefile b/libavutil/Makefile index 733a23a8a3..44cd3f0dda 100644 --- a/libavutil/Makefile +++ b/libavutil/Makefile @@ -52,6 +52,7 @@ HEADERS = adler32.h \ hwcontext_videotoolbox.h \ hwcontext_vdpau.h \ hwcontext_vulkan.h \ + hwcontext_nvtegra.h \ nvtegra.h \ nvhost_ioctl.h \ nvmap_ioctl.h \ @@ -210,7 +211,7 @@ OBJS-$(CONFIG_VDPAU) += hwcontext_vdpau.o OBJS-$(CONFIG_VULKAN) += hwcontext_vulkan.o vulkan.o OBJS-$(!CONFIG_VULKAN) += hwcontext_stub.o -OBJS-$(CONFIG_NVTEGRA) += nvtegra.o +OBJS-$(CONFIG_NVTEGRA) += nvtegra.o hwcontext_nvtegra.o OBJS += $(COMPAT_OBJS:%=../compat/%) @@ -233,7 +234,9 @@ 
SKIPHEADERS-$(CONFIG_VULKAN) += hwcontext_vulkan.h vulkan.h \ vulkan_functions.h \ vulkan_loader.h SKIPHEADERS-$(CONFIG_NVTEGRA) += nvtegra.h \ - nvtegra_host1x.h + nvtegra_host1x.h \ + hwcontext_nvtegra.h + TESTPROGS = adler32 \ aes \ diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c index fa99a0d8a4..8dd05147a4 100644 --- a/libavutil/hwcontext.c +++ b/libavutil/hwcontext.c @@ -65,6 +65,9 @@ static const HWContextType * const hw_table[] = { #endif #if CONFIG_VULKAN &ff_hwcontext_type_vulkan, +#endif +#if CONFIG_NVTEGRA + &ff_hwcontext_type_nvtegra, #endif NULL, }; @@ -82,6 +85,7 @@ static const char *const hw_type_names[] = { [AV_HWDEVICE_TYPE_VIDEOTOOLBOX] = "videotoolbox", [AV_HWDEVICE_TYPE_MEDIACODEC] = "mediacodec", [AV_HWDEVICE_TYPE_VULKAN] = "vulkan", + [AV_HWDEVICE_TYPE_NVTEGRA] = "nvtegra", }; typedef struct FFHWDeviceContext { diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h index bac30debae..d506281784 100644 --- a/libavutil/hwcontext.h +++ b/libavutil/hwcontext.h @@ -38,6 +38,7 @@ enum AVHWDeviceType { AV_HWDEVICE_TYPE_MEDIACODEC, AV_HWDEVICE_TYPE_VULKAN, AV_HWDEVICE_TYPE_D3D12VA, + AV_HWDEVICE_TYPE_NVTEGRA, }; /** diff --git a/libavutil/hwcontext_internal.h b/libavutil/hwcontext_internal.h index e32b786238..478583abdd 100644 --- a/libavutil/hwcontext_internal.h +++ b/libavutil/hwcontext_internal.h @@ -163,5 +163,6 @@ extern const HWContextType ff_hwcontext_type_vdpau; extern const HWContextType ff_hwcontext_type_videotoolbox; extern const HWContextType ff_hwcontext_type_mediacodec; extern const HWContextType ff_hwcontext_type_vulkan; +extern const HWContextType ff_hwcontext_type_nvtegra; #endif /* AVUTIL_HWCONTEXT_INTERNAL_H */ diff --git a/libavutil/hwcontext_nvtegra.c b/libavutil/hwcontext_nvtegra.c new file mode 100644 index 0000000000..0f4d5a323b --- /dev/null +++ b/libavutil/hwcontext_nvtegra.c @@ -0,0 +1,880 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. 
+ * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include + +#include "config.h" +#include "pixdesc.h" +#include "imgutils.h" +#include "internal.h" +#include "mem.h" +#include "time.h" + +#include "hwcontext.h" +#include "hwcontext_internal.h" + +#include "nvhost_ioctl.h" +#include "nvmap_ioctl.h" +#include "nvtegra_host1x.h" +#include "clb0b6.h" +#include "vic_drv.h" + +#include "hwcontext_nvtegra.h" + +typedef struct NVTegraDevicePriv { + /* The public AVNVTegraDeviceContext */ + AVNVTegraDeviceContext p; + + AVBufferRef *driver_state_ref; + + AVNVTegraJobPool job_pool; + uint32_t vic_setup_off, vic_cmdbuf_off; +} NVTegraDevicePriv; + +static const enum AVPixelFormat supported_sw_formats[] = { + AV_PIX_FMT_GRAY8, + AV_PIX_FMT_NV12, + AV_PIX_FMT_P010, + AV_PIX_FMT_YUV420P, +}; + +int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt) { + switch (fmt) { + case AV_PIX_FMT_GRAY8: + return NVB0B6_T_L8; + case AV_PIX_FMT_NV12: + return NVB0B6_T_Y8___U8V8_N420; + case AV_PIX_FMT_YUV420P: + return NVB0B6_T_Y8___U8___V8_N420; + case AV_PIX_FMT_RGB565: + return NVB0B6_T_R5G6B5; + case AV_PIX_FMT_RGB32: + return NVB0B6_T_A8R8G8B8; + case AV_PIX_FMT_BGR32: + return NVB0B6_T_A8B8G8R8; + case AV_PIX_FMT_RGB32_1: + return NVB0B6_T_R8G8B8A8; + case AV_PIX_FMT_BGR32_1: + return NVB0B6_T_B8G8R8A8; + case 
AV_PIX_FMT_0RGB32: + return NVB0B6_T_X8R8G8B8; + case AV_PIX_FMT_0BGR32: + return NVB0B6_T_X8B8G8R8; + default: + return -1; + } +} + +static inline uint32_t nvtegra_surface_get_width_align(enum AVPixelFormat fmt, const AVComponentDescriptor *comp) { + int step = comp->step; + + if (fmt != AV_PIX_FMT_NVTEGRA) + return 256 / step; /* Pitch linear surfaces must be aligned to 256B for VIC */ + + /* + * GOBs are 64B wide. + * In addition, we use a 32Bx8 cache width in VIC for block linear surfaces. + */ + return 64 / step; +} + +static inline uint32_t nvtegra_surface_get_height_align(enum AVPixelFormat fmt, const AVComponentDescriptor *comp) { + /* Height alignment is in terms of lines, not bytes, therefore we don't divide by the sample step */ + if (fmt != AV_PIX_FMT_NVTEGRA) + return 4; /* We use 64Bx4 cache width in VIC for pitch linear surfaces */ + + /* + * GOBs are 8B high, and we use a GOB height of 2. + * In addition, we use a 32Bx8 cache width in VIC for block linear surfaces. + * We double this requirement to make sure it is respected for the subsampled chroma plane. 
+ */ + return 32; +} + +static void nvtegra_device_uninit(AVHWDeviceContext *ctx) { + NVTegraDevicePriv *priv = ctx->hwctx; + AVNVTegraDeviceContext *hwctx = &priv->p; + + av_log(ctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA device\n"); + + av_nvtegra_job_pool_uninit(&priv->job_pool); + + if (hwctx->nvdec_version) { + av_nvtegra_channel_close(&hwctx->nvdec_channel); +#ifdef __SWITCH__ + mmuRequestFinalize(&hwctx->nvdec_channel.mmu_request); +#endif + } + + if (hwctx->nvjpg_version) { + av_nvtegra_channel_close(&hwctx->nvjpg_channel); +#ifdef __SWITCH__ + mmuRequestFinalize(&hwctx->nvjpg_channel.mmu_request); +#endif + } + + av_nvtegra_channel_close(&hwctx->vic_channel); + + av_buffer_unref(&priv->driver_state_ref); +} + +/* + * Hardware modules on the Tegra X1 (see t210.c in l4t kernel sources) + * - nvdec v2.0 + * - nvenc v5.0 + * - nvjpg v1.0 + * - vic v4.0 + */ + +static int nvtegra_device_init(AVHWDeviceContext *ctx) { + NVTegraDevicePriv *priv = ctx->hwctx; + AVNVTegraDeviceContext *hwctx = &priv->p; + + uint32_t vic_map_size; + int err; + + av_log(ctx, AV_LOG_DEBUG, "Initializing NVTEGRA device\n"); + + err = av_nvtegra_channel_open(&hwctx->nvdec_channel, "/dev/nvhost-nvdec"); + if (!err) + hwctx->nvdec_version = AV_NVTEGRA_ENCODE_REV(2,0); + + err = av_nvtegra_channel_open(&hwctx->nvjpg_channel, "/dev/nvhost-nvjpg"); + if (!err) + hwctx->nvjpg_version = AV_NVTEGRA_ENCODE_REV(1,0); + + err = av_nvtegra_channel_open(&hwctx->vic_channel, "/dev/nvhost-vic"); + if (err < 0) + goto fail; + + hwctx->vic_version = AV_NVTEGRA_ENCODE_REV(4,0); + + /* Note: Official code only sets this for the nvdec channel */ + if (hwctx->nvdec_version) { + err = av_nvtegra_channel_set_submit_timeout(&hwctx->nvdec_channel, 1000); + if (err < 0) + goto fail; + } + + if (hwctx->nvjpg_version) { + err = av_nvtegra_channel_set_submit_timeout(&hwctx->nvjpg_channel, 1000); + if (err < 0) + goto fail; + } + + priv->vic_setup_off = 0; + priv->vic_cmdbuf_off = FFALIGN(priv->vic_setup_off + 
sizeof(VicConfigStruct), + AV_NVTEGRA_MAP_ALIGN); + vic_map_size = FFALIGN(priv->vic_cmdbuf_off + AV_NVTEGRA_MAP_ALIGN, + 0x1000); + + err = av_nvtegra_job_pool_init(&priv->job_pool, &hwctx->vic_channel, vic_map_size, + priv->vic_cmdbuf_off, vic_map_size - priv->vic_cmdbuf_off); + if (err < 0) + goto fail; + +#ifndef __SWITCH__ + hwctx->nvdec_channel.module_id = 0x75; + hwctx->nvjpg_channel.module_id = 0x76; +#else + /* + * The NVHOST_IOCTL_CHANNEL_SET_CLK_RATE ioctl also exists on HOS but the clock rate + * will be reset when the console goes to sleep. + */ + if (hwctx->nvdec_version) { + err = AVERROR(mmuRequestInitialize(&hwctx->nvdec_channel.mmu_request, (MmuModuleId)5, 8, false)); + if (err < 0) + goto fail; + } + + if (hwctx->nvjpg_version) { + err = AVERROR(mmuRequestInitialize(&hwctx->nvjpg_channel.mmu_request, MmuModuleId_Nvjpg, 8, false)); + if (err < 0) + goto fail; + } +#endif + + return 0; + +fail: + nvtegra_device_uninit(ctx); + return err; +} + +static int nvtegra_device_create(AVHWDeviceContext *ctx, const char *device, + AVDictionary *opts, int flags) +{ + NVTegraDevicePriv *priv = ctx->hwctx; + + av_log(ctx, AV_LOG_DEBUG, "Creating NVTEGRA device\n"); + + priv->driver_state_ref = av_nvtegra_driver_init(); + if (!priv->driver_state_ref) { + av_log(ctx, AV_LOG_ERROR, "Failed to create driver context, " + "make sure you are using a Tegra device\n"); + return AVERROR(ENOSYS); + } + + return 0; +} + +static int nvtegra_frames_get_constraints(AVHWDeviceContext *ctx, const void *hwconfig, + AVHWFramesConstraints *constraints) +{ + av_log(ctx, AV_LOG_DEBUG, "Getting frame constraints for NVTEGRA device\n"); + + constraints->valid_sw_formats = av_malloc_array(FF_ARRAY_ELEMS(supported_sw_formats) + 1, + sizeof(*constraints->valid_sw_formats)); + if (!constraints->valid_sw_formats) + return AVERROR(ENOMEM); + + for (int i = 0; i < FF_ARRAY_ELEMS(supported_sw_formats); ++i) + constraints->valid_sw_formats[i] = supported_sw_formats[i]; + 
constraints->valid_sw_formats[FF_ARRAY_ELEMS(supported_sw_formats)] = AV_PIX_FMT_NONE; + + constraints->valid_hw_formats = av_malloc_array(2, sizeof(*constraints->valid_hw_formats)); + if (!constraints->valid_hw_formats) + return AVERROR(ENOMEM); + + constraints->valid_hw_formats[0] = AV_PIX_FMT_NVTEGRA; + constraints->valid_hw_formats[1] = AV_PIX_FMT_NONE; + + return 0; +} + +static void nvtegra_map_free(void *opaque, uint8_t *data) { + AVNVTegraMap *map = (AVNVTegraMap *)data; + + if (!map) + return; + + av_nvtegra_map_destroy(map); + + av_freep(&map); +} + +static void nvtegra_frame_free(void *opaque, uint8_t *data) { + AVNVTegraFrame *frame = (AVNVTegraFrame *)data; + + if (!frame) + return; + + av_buffer_unref(&frame->map_ref); + + av_freep(&frame); +} + +static AVBufferRef *nvtegra_pool_alloc(void *opaque, size_t size) { + AVHWFramesContext *ctx = opaque; + AVNVTegraDeviceContext *hwctx = &((NVTegraDevicePriv *)ctx->device_ctx->hwctx)->p; + + AVBufferRef *buffer = NULL; + AVNVTegraFrame *frame = NULL; + AVNVTegraMap *map = NULL; + int err; + + av_log(ctx, AV_LOG_DEBUG, "Creating surface from NVTEGRA device\n"); + + map = av_mallocz(sizeof(*map)); + if (!map) + goto fail; + + frame = av_mallocz(sizeof(*frame)); + if (!frame) + goto fail; + + /* + * Framebuffers are allocated as CPU-cacheable, since they might get copied from + * during transfer operations. Cache management is done manually. 
+ */ + err = av_nvtegra_map_create(map, &hwctx->nvdec_channel, size, 0x100, + NVMAP_HEAP_CARVEOUT_GENERIC, NVMAP_HANDLE_CACHEABLE); + if (err < 0) + goto fail; + + /* Flush the CPU cache */ + av_nvtegra_map_cache_op(map, NVMAP_CACHE_OP_WB, av_nvtegra_map_get_addr(map), + av_nvtegra_map_get_size(map)); + + frame->map_ref = av_buffer_create((uint8_t *)map, sizeof(*map), nvtegra_map_free, ctx, 0); + if (!frame->map_ref) + goto fail; + + buffer = av_buffer_create((uint8_t *)frame, sizeof(*frame), nvtegra_frame_free, ctx, 0); + if (!buffer) + goto fail; + + return buffer; + +fail: + av_log(ctx, AV_LOG_ERROR, "Failed to create buffer\n"); + nvtegra_frame_free(opaque, (uint8_t *)frame); + return NULL; +} + +static int nvtegra_frames_init(AVHWFramesContext *ctx) { + const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(ctx->sw_format); + + uint32_t width_aligned, height_aligned, size; + + av_log(ctx, AV_LOG_DEBUG, "Initializing frame pool for the NVTEGRA device\n"); + + if (!ctx->pool) { + width_aligned = FFALIGN(ctx->width, nvtegra_surface_get_width_align (ctx->format, &desc->comp[0])); + height_aligned = FFALIGN(ctx->height, nvtegra_surface_get_height_align(ctx->format, &desc->comp[0])); + + size = av_image_get_buffer_size(ctx->sw_format, width_aligned, height_aligned, + nvtegra_surface_get_width_align(ctx->format, &desc->comp[0])); + + ffhwframesctx(ctx)->pool_internal = av_buffer_pool_init2(size, ctx, nvtegra_pool_alloc, NULL); + if (!ffhwframesctx(ctx)->pool_internal) + return AVERROR(ENOMEM); + } + + return 0; +} + +static void nvtegra_frames_uninit(AVHWFramesContext *ctx) { + av_log(ctx, AV_LOG_DEBUG, "Deinitializing frame pool for the NVTEGRA device\n"); +} + +static int nvtegra_get_buffer(AVHWFramesContext *ctx, AVFrame *frame) { + const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(ctx->sw_format); + + AVNVTegraMap *map; + uint32_t width_aligned, height_aligned; + int err; + + av_log(ctx, AV_LOG_DEBUG, "Getting frame buffer for NVTEGRA device\n"); + + 
frame->buf[0] = av_buffer_pool_get(ctx->pool); + if (!frame->buf[0]) + return AVERROR(ENOMEM); + + map = av_nvtegra_frame_get_fbuf_map(frame); + + width_aligned = FFALIGN(ctx->width, nvtegra_surface_get_width_align (ctx->format, &desc->comp[0])); + height_aligned = FFALIGN(ctx->height, nvtegra_surface_get_height_align(ctx->format, &desc->comp[0])); + + err = av_image_fill_arrays(frame->data, frame->linesize, av_nvtegra_map_get_addr(map), + ctx->sw_format, width_aligned, height_aligned, + nvtegra_surface_get_width_align(ctx->format, &desc->comp[0])); + if (err < 0) + return err; + + frame->format = AV_PIX_FMT_NVTEGRA; + frame->width = ctx->width; + frame->height = ctx->height; + + return 0; +} + +static int nvtegra_transfer_get_formats(AVHWFramesContext *ctx, + enum AVHWFrameTransferDirection dir, + enum AVPixelFormat **formats) +{ + enum AVPixelFormat *fmts; + + av_log(ctx, AV_LOG_DEBUG, "Getting transfer formats for NVTEGRA device\n"); + + fmts = av_malloc_array(2, sizeof(**formats)); + if (!fmts) + return AVERROR(ENOMEM); + + fmts[0] = ctx->sw_format; + fmts[1] = AV_PIX_FMT_NONE; + + *formats = fmts; + return 0; +} + +static inline void nvtegra_cpu_copy_plane(void *dst, int dst_stride, + void *src, int src_stride, int h, bool from) +{ + /* + * Adapted from https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/. + * We process 16x2 bytes at a time. Horizontally, this is the size of a linear atom + * in a 16Bx2 sector, conveniently also the size of a cache line and of a macroblock. + * + * NVDEC always uses a GOB height of 2 (block height of 16, in line with macroblock dimensions). + * The corresponding swizzling pattern is the following: + * y3 y2 y1 y0 x5 x4 x3 x2 x1 x0 + * x: ___x5_______x4____x3 x2 x1 x0 + * y: y3____y2 y1____y0____________ + * + * Addresses for the 4 lower bits can then be copied as-is (16 bytes). 
+ * As a further optimization, the y0 bit is also handled within the same inner loop, + * which halves the total number of iterations. + * + * This function is declared inline with the expectation that the compiler will optimize + * the branches depending on the copy direction. + */ + + __uint128_t *src_ = src, *dst_ = dst, *src_line, *dst_line; + uint32_t ws = src_stride / sizeof(__uint128_t), wd = dst_stride / sizeof(__uint128_t), + w = FFMIN(ws, wd), offs_x = 0, offs_y = 0, offs_line; + uint32_t x_mask = -0x2e, y_mask = 0x2c; + int x, y; + + for (y = 0; y < h; y += 2) { + dst_line = dst_ + (from ? y * wd : offs_y); + src_line = src_ + (from ? offs_y : y * ws); + + offs_line = offs_x; + for (x = 0; x < w; ++x) { + dst_line[from ? x+0 : offs_line+0] = src_line[from ? offs_line+0 : x+0 ]; + dst_line[from ? x+wd : offs_line+1] = src_line[from ? offs_line+1 : x+ws]; + offs_line = (offs_line - x_mask) & x_mask; + } + + offs_y = (offs_y - y_mask) & y_mask; + + /* Wrap into next tile row */ + if (!offs_y) + offs_x += from ? src_stride : dst_stride; + } +} + +static int nvtegra_cpu_transfer_data(AVHWFramesContext *ctx, const AVFrame *dst, const AVFrame *src, + int num_planes, bool from) +{ + const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(ctx->sw_format); + const AVFrame *hwframe, *swframe; + AVNVTegraMap *map; + int h, i; + + hwframe = from ? src : dst, swframe = from ? 
dst : src; + map = av_nvtegra_frame_get_fbuf_map(hwframe); + + if (swframe->format != ctx->sw_format) { + av_log(ctx, AV_LOG_ERROR, "Source and destination must have the same format for cpu transfers\n"); + return AVERROR(EINVAL); + } + + /* If we are transferring from a hardware frame, invalidate the CPU cache which might be stale */ + if (from) { + av_nvtegra_map_cache_op(map, NVMAP_CACHE_OP_INV, + av_nvtegra_map_get_addr(map), av_nvtegra_map_get_size(map)); + } + + /* Align the height to an even size */ + h = FFALIGN(dst->height, 2); + + for (i = 0; i < num_planes; ++i) { + if (map->is_linear) { + av_image_copy_plane(dst->data[i], dst->linesize[i], src->data[i], src->linesize[i], + FFMIN(dst->linesize[i], src->linesize[i]), + h >> (i ? desc->log2_chroma_h : 0)); + } else { + /* + * Instantiate the same inlined function for both directions, + * giving the compiler the opportunity to remove branching within the copy loops. + * (verified by decompilation at -O1 and higher for both gcc and clang) + */ + if (from) + nvtegra_cpu_copy_plane(dst->data[i], dst->linesize[i], src->data[i], src->linesize[i], + h >> (i ? desc->log2_chroma_h : 0), true); + else + nvtegra_cpu_copy_plane(dst->data[i], dst->linesize[i], src->data[i], src->linesize[i], + h >> (i ? 
desc->log2_chroma_h : 0), false); + } + } + + /* If we transferred to a hardware frame, flush the CPU cache to make the data visible to hardware engines */ + if (!from) { + av_nvtegra_map_cache_op(map, NVMAP_CACHE_OP_WB, + av_nvtegra_map_get_addr(map), av_nvtegra_map_get_size(map)); + } + + return 0; +} + +static void nvtegra_vic_preprare_config(VicConfigStruct *config, const AVFrame *src, const AVFrame *dst, + enum AVPixelFormat fmt, bool is_16b_chroma) +{ + const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt); + bool input_linear = (src->format != AV_PIX_FMT_NVTEGRA) || av_nvtegra_frame_get_fbuf_map(src)->is_linear, + output_linear = (dst->format != AV_PIX_FMT_NVTEGRA) || av_nvtegra_frame_get_fbuf_map(dst)->is_linear; + + /* + * The VIC engine has an undocumented limitation regarding height alignment, + * which should be padded to an even size. + */ + + /* Subsampled dimensions when emulating 16-bit chroma transfers, as input is always NV12 */ + int divider = !is_16b_chroma ? 1 : 2; + int src_width = src->width / divider, src_height = FFALIGN(src->height, 2) / divider, + dst_width = dst->width / divider, dst_height = FFALIGN(dst->height, 2) / divider; + + *config = (VicConfigStruct){ + .pipeConfig = { + .DownsampleHoriz = 1 << 2, /* U9.2 */ + .DownsampleVert = 1 << 2, /* U9.2 */ + }, + .outputConfig = { + .AlphaFillMode = !is_16b_chroma ? NVB0B6_DXVAHD_ALPHA_FILL_MODE_OPAQUE : + NVB0B6_DXVAHD_ALPHA_FILL_MODE_SOURCE_STREAM, + .BackgroundAlpha = 0, + .BackgroundR = 0, + .BackgroundG = 0, + .BackgroundB = 0, + .TargetRectLeft = 0, + .TargetRectRight = dst_width - 1, + .TargetRectTop = 0, + .TargetRectBottom = dst_height - 1, + }, + .outputSurfaceConfig = { + .OutPixelFormat = av_nvtegra_pixfmt_to_vic(fmt), + .OutSurfaceWidth = dst_width - 1, + .OutSurfaceHeight = dst_height - 1, + .OutBlkKind = !output_linear ? NVB0B6_BLK_KIND_GENERIC_16Bx2 : NVB0B6_BLK_KIND_PITCH, + .OutBlkHeight = !output_linear ? 
1 : 0, /* GOB height 2 */ + .OutLumaWidth = (dst->linesize[0] / desc->comp[0].step) - 1, + .OutLumaHeight = FFALIGN(dst_height, !output_linear ? 32 : 2) - 1, + .OutChromaWidth = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? + -1 : (dst->linesize[1] / desc->comp[1].step) - 1, + .OutChromaHeight = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? -1 : + (FFALIGN(dst_height, !output_linear ? 32 : 2) >> desc->log2_chroma_h) - 1, + }, + .slotStruct = { + { + .slotConfig = { + .SlotEnable = 1, + .CurrentFieldEnable = 1, + .SoftClampLow = 0, + .SoftClampHigh = 1023, + .PlanarAlpha = 1023, + .ConstantAlpha = 1, + .SourceRectLeft = 0, + .SourceRectRight = (src_width - 1) << 16, /* U14.16 (for subpixel positioning) */ + .SourceRectTop = 0, + .SourceRectBottom = (src_height - 1) << 16, + .DestRectLeft = 0, + .DestRectRight = src_width - 1, + .DestRectTop = 0, + .DestRectBottom = src_height - 1, + }, + .slotSurfaceConfig = { + .SlotPixelFormat = av_nvtegra_pixfmt_to_vic(fmt), + .SlotChromaLocHoriz = ((desc->flags & AV_PIX_FMT_FLAG_RGB) || + src->chroma_location == AVCHROMA_LOC_TOPLEFT || + src->chroma_location == AVCHROMA_LOC_LEFT || + src->chroma_location == AVCHROMA_LOC_BOTTOMLEFT) ? 0 : 1, + .SlotChromaLocVert = ((desc->flags & AV_PIX_FMT_FLAG_RGB) || + src->chroma_location == AVCHROMA_LOC_TOPLEFT || + src->chroma_location == AVCHROMA_LOC_TOP) ? 0 : + (src->chroma_location == AVCHROMA_LOC_LEFT || + src->chroma_location == AVCHROMA_LOC_CENTER) ? 1 : 2, + .SlotBlkKind = !input_linear ? NVB0B6_BLK_KIND_GENERIC_16Bx2 : NVB0B6_BLK_KIND_PITCH, + .SlotBlkHeight = !input_linear ? 1 : 0, /* GOB height 2 */ + .SlotCacheWidth = !input_linear ? 1 : 3, /* 32Bx8 for block, 128Bx2 for pitch */ + .SlotSurfaceWidth = src_width - 1, + .SlotSurfaceHeight = src_height - 1, + .SlotLumaWidth = (src->linesize[0] / desc->comp[0].step) - 1, + .SlotLumaHeight = FFALIGN(src_height, !input_linear ? 32 : 2) - 1, + .SlotChromaWidth = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? 
+ -1 : (src->linesize[1] / desc->comp[1].step) - 1, + .SlotChromaHeight = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? -1 : + (FFALIGN(src_height, !input_linear ? 32 : 2) >> desc->log2_chroma_h) - 1, + }, + }, + }, + }; +} + +static int nvtegra_vic_prepare_cmdbuf(AVHWFramesContext *ctx, AVNVTegraJobPool *pool, AVNVTegraJob *job, + const AVFrame *src, const AVFrame *dst, enum AVPixelFormat fmt, + AVNVTegraMap **plane_maps, uint32_t *plane_offsets, int num_planes) +{ + NVTegraDevicePriv *priv = ctx->device_ctx->hwctx; + AVNVTegraCmdbuf *cmdbuf = &job->cmdbuf; + + AVNVTegraMap *src_maps[4], *dst_maps[4]; + uint32_t src_map_offsets[4], dst_map_offsets[4]; + int src_reloc_type, dst_reloc_type, i, err; + +#define RELOC_VARS(frame) ({ \ + if (frame->format == AV_PIX_FMT_NVTEGRA) { \ + for (i = 0; i < FF_ARRAY_ELEMS(AV_JOIN(frame, _map_offsets)); ++i) { \ + AV_JOIN(frame, _maps )[i] = av_nvtegra_frame_get_fbuf_map(frame); \ + AV_JOIN(frame, _map_offsets)[i] = frame->data[i] - frame->data[0]; \ + } \ + AV_JOIN(frame, _reloc_type) = !av_nvtegra_frame_get_fbuf_map(frame)->is_linear ? 
\ + NVHOST_RELOC_TYPE_BLOCK_LINEAR : NVHOST_RELOC_TYPE_PITCH_LINEAR; \ + } else { \ + for (i = 0; i < FF_ARRAY_ELEMS(AV_JOIN(frame, _map_offsets)); ++i) { \ + AV_JOIN(frame, _maps )[i] = plane_maps [i]; \ + AV_JOIN(frame, _map_offsets)[i] = plane_offsets[i]; \ + } \ + AV_JOIN(frame, _reloc_type) = NVHOST_RELOC_TYPE_PITCH_LINEAR; \ + } \ +}) + + RELOC_VARS(src); + RELOC_VARS(dst); + + err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_VIC); + if (err < 0) + return err; + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS, + AV_NVTEGRA_VALUE(NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS, CONFIG_STRUCT_SIZE, sizeof(VicConfigStruct) >> 4) | + AV_NVTEGRA_VALUE(NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS, GPTIMER_ON, 1) | + AV_NVTEGRA_VALUE(NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS, FALCON_CONTROL, 1)); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_CONFIG_STRUCT_OFFSET, + &job->input_map, priv->vic_setup_off, NVHOST_RELOC_TYPE_DEFAULT); + + switch (fmt) { + /* 16-bit transfer emulation */ + case AV_PIX_FMT_RGB565: + /* Luma transfer */ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(0), + src_maps[0], src_map_offsets[0], src_reloc_type); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET, + dst_maps[0], dst_map_offsets[0], dst_reloc_type); + break; + case AV_PIX_FMT_RGB32: + /* Chroma transfer */ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(0), + src_maps[1], src_map_offsets[1], src_reloc_type); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET, + dst_maps[1], dst_map_offsets[1], dst_reloc_type); + break; + + /* Normal transfers */ + case AV_PIX_FMT_GRAY8: + case AV_PIX_FMT_NV12: + case AV_PIX_FMT_YUV420P: + for (i = 0; i < num_planes; ++i) { + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(0) + i * sizeof(uint32_t), + src_maps[i], src_map_offsets[i], 
src_reloc_type); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET + i * sizeof(uint32_t), + dst_maps[i], dst_map_offsets[i], dst_reloc_type); + } + break; + default: + return AVERROR(EINVAL); + } + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_EXECUTE, + AV_NVTEGRA_ENUM(NVB0B6_VIDEO_COMPOSITOR_EXECUTE, AWAKEN, ENABLE)); + + err = av_nvtegra_cmdbuf_add_syncpt_incr(cmdbuf, pool->channel->syncpt, 0); + if (err < 0) + return err; + + err = av_nvtegra_cmdbuf_end(cmdbuf); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_vic_copy_plane(AVHWFramesContext *ctx, AVNVTegraJob *job, + const AVFrame *src, const AVFrame *dst, + enum AVPixelFormat fmt, AVNVTegraMap **plane_maps, uint32_t *plane_offsets, + int num_planes, bool is_chroma) +{ + NVTegraDevicePriv *priv = ctx->device_ctx->hwctx; + + uint8_t *mem; + int err; + + mem = av_nvtegra_map_get_addr(&job->input_map); + + nvtegra_vic_preprare_config((VicConfigStruct *)(mem + priv->vic_setup_off), + src, dst, fmt, is_chroma); + + err = av_nvtegra_cmdbuf_clear(&job->cmdbuf); + if (err < 0) + return err; + + err = nvtegra_vic_prepare_cmdbuf(ctx, &priv->job_pool, job, src, dst, fmt, + plane_maps, plane_offsets, num_planes); + if (err < 0) + goto fail; + + err = av_nvtegra_job_submit(&priv->job_pool, job); + if (err < 0) + goto fail; + + err = av_nvtegra_job_wait(&priv->job_pool, job, -1); + if (err < 0) + goto fail; + +fail: + return err; +} + +static int nvtegra_vic_transfer_data(AVHWFramesContext *ctx, const AVFrame *dst, const AVFrame *src, + int num_planes, bool from) +{ + NVTegraDevicePriv *priv = ctx->device_ctx->hwctx; + AVNVTegraDeviceContext *hwctx = &priv->p; + + AVBufferRef *job_ref; + AVNVTegraJob *job; + const AVFrame *swframe; + uint8_t *map_bases[4]; + AVNVTegraMap maps[4] = {0}; + AVNVTegraMap *plane_maps[4]; + uint32_t plane_offsets[4]; + int num_maps = 0, i, j, err; /* num_maps is read in the fail path, so it must start at 0 */ + + swframe = from ? 
dst : src; + + job_ref = av_nvtegra_job_pool_get(&priv->job_pool); + if (!job_ref) { + err = AVERROR(ENOMEM); + goto fail; + } + + job = (AVNVTegraJob *)job_ref->data; + + /* Create a map for each frame backing buffer */ + for (i = 0; i < FF_ARRAY_ELEMS(maps); num_maps = ++i) { + if (!swframe->buf[i]) + break; + + /* + * In order to avoid a full-frame copy on the CPU, the provided memory + * is mapped into VIC and used directly during the transfer. + * The address and size are aligned to page boundaries. + * Cache management is performed manually to not affect data outside the buffer. + */ + map_bases[i] = (uint8_t *)((uintptr_t)swframe->buf[i]->data & ~0xfff); + err = av_nvtegra_map_from_va(&maps[i], &hwctx->vic_channel, map_bases[i], + swframe->buf[i]->size + ((uintptr_t)swframe->buf[i]->data & 0xfff), + 0x100, NVMAP_HANDLE_CACHEABLE); + if (err < 0) + goto fail; + + err = av_nvtegra_map_map(&maps[i]); + if (err < 0) + goto fail; + + /* Flush-invalidate the CPU cache prior to the transfer */ + av_nvtegra_map_cache_op(&maps[i], NVMAP_CACHE_OP_WB_INV, + ((uint8_t *)av_nvtegra_map_get_addr(&maps[i])) + + ((uintptr_t)swframe->buf[i]->data & 0xfff), + swframe->buf[i]->size); + } + + /* Find the corresponding map object and its offset for each plane */ + for (i = 0; i < num_planes; ++i) { + for (j = 0; j < FF_ARRAY_ELEMS(swframe->buf); ++j) { + if ((swframe->buf[j]->data <= swframe->data[i]) && + (swframe->data[i] < swframe->buf[j]->data + swframe->buf[j]->size)) + break; + } + + plane_maps [i] = &maps[j]; + plane_offsets[i] = swframe->data[i] - map_bases[j]; + } + + /* VIC expects planes in the reversed order */ + if (swframe->format == AV_PIX_FMT_YUV420P) { + FFSWAP(AVNVTegraMap *, plane_maps [1], plane_maps [2]); + FFSWAP(uint32_t, plane_offsets[1], plane_offsets[2]); + } + + /* + * VIC2 does not support 16-bit YUV surfaces (eg. P010, P012, ...). 
+ * Here we emulate them using two separate transfers for the luma and chroma planes + * (16-bit and 32-bit widths respectively). + */ + if (swframe->format == AV_PIX_FMT_P010) { + err = nvtegra_vic_copy_plane(ctx, job, src, dst, AV_PIX_FMT_RGB565, + plane_maps, plane_offsets, 1, false); + if (err < 0) + goto fail; + + err = nvtegra_vic_copy_plane(ctx, job, src, dst, AV_PIX_FMT_RGB32, + plane_maps, plane_offsets, 1, true); + if (err < 0) + goto fail; + } else { + err = nvtegra_vic_copy_plane(ctx, job, src, dst, swframe->format, + plane_maps, plane_offsets, num_planes, false); + if (err < 0) + goto fail; + } + +fail: + for (i = 0; i < num_maps; ++i) { + av_nvtegra_map_unmap(&maps[i]); + av_nvtegra_map_close(&maps[i]); + } + + av_buffer_unref(&job_ref); + + return err; +} + +static int nvtegra_transfer_data(AVHWFramesContext *ctx, AVFrame *dst, const AVFrame *src) { + const AVFrame *swframe; + bool from; + int num_planes, i; + + from = !dst->hw_frames_ctx; + swframe = from ? dst : src; + + if (swframe->hw_frames_ctx) + return AVERROR(ENOSYS); + + num_planes = av_pix_fmt_count_planes(swframe->format); + + for (i = 0; i < num_planes; ++i) { + if (((uintptr_t)swframe->data[i] & 0xff) || (swframe->linesize[i] & 0xff)) { + av_log(ctx, AV_LOG_WARNING, "Frame address/pitch not aligned to 256, " + "falling back to cpu transfer\n"); + return nvtegra_cpu_transfer_data(ctx, dst, src, num_planes, from); + } + } + + return nvtegra_vic_transfer_data(ctx, dst, src, num_planes, from); +} + +const HWContextType ff_hwcontext_type_nvtegra = { + .type = AV_HWDEVICE_TYPE_NVTEGRA, + .name = "nvtegra", + + .device_hwctx_size = sizeof(NVTegraDevicePriv), + .device_hwconfig_size = 0, + .frames_hwctx_size = 0, + + .device_create = &nvtegra_device_create, + .device_init = &nvtegra_device_init, + .device_uninit = &nvtegra_device_uninit, + + .frames_get_constraints = &nvtegra_frames_get_constraints, + .frames_init = &nvtegra_frames_init, + .frames_uninit = &nvtegra_frames_uninit, + 
.frames_get_buffer = &nvtegra_get_buffer, + + .transfer_get_formats = &nvtegra_transfer_get_formats, + .transfer_data_to = &nvtegra_transfer_data, + .transfer_data_from = &nvtegra_transfer_data, + + .pix_fmts = (const enum AVPixelFormat[]) { + AV_PIX_FMT_NVTEGRA, + AV_PIX_FMT_NONE, + }, +}; diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h new file mode 100644 index 0000000000..8a2383d304 --- /dev/null +++ b/libavutil/hwcontext_nvtegra.h @@ -0,0 +1,85 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef AVUTIL_HWCONTEXT_NVTEGRA_H +#define AVUTIL_HWCONTEXT_NVTEGRA_H + +#include + +#include "hwcontext.h" +#include "buffer.h" +#include "frame.h" +#include "pixfmt.h" + +#include "nvtegra.h" + +/* + * Encode a hardware revision into a version number + */ +#define AV_NVTEGRA_ENCODE_REV(maj, min) (((maj & 0xff) << 8) | (min & 0xff)) + +/* + * Decode a version number + */ +static inline void av_nvtegra_decode_rev(int rev, int *maj, int *min) { + *maj = (rev >> 8) & 0xff; + *min = (rev >> 0) & 0xff; +} + +/** + * @file + * API-specific header for AV_HWDEVICE_TYPE_NVTEGRA. + * + * For user-allocated pools, AVHWFramesContext.pool must return AVBufferRefs + * with the data pointer set to an AVNVTegraMap. 
+ */ + +typedef struct AVNVTegraDeviceContext { + /* + * Hardware multimedia engines + */ + AVNVTegraChannel nvdec_channel, nvenc_channel, nvjpg_channel, vic_channel; + + /* + * Hardware revisions for associated engines, or 0 if invalid + */ + int nvdec_version, nvenc_version, nvjpg_version, vic_version; +} AVNVTegraDeviceContext; + +typedef struct AVNVTegraFrame { + /* + * Reference to an AVNVTegraMap object + */ + AVBufferRef *map_ref; +} AVNVTegraFrame; + +/* + * Helper to retrieve a map object from the corresponding frame + */ +static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame) { + return (AVNVTegraMap *)((AVNVTegraFrame *)frame->buf[0]->data)->map_ref->data; +} + +/* + * Converts a pixel format to the equivalent code for the VIC engine + */ +int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt); + +#endif /* AVUTIL_HWCONTEXT_NVTEGRA_H */ From patchwork Thu May 30 19:43:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49417 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9183:0:b0:460:55fa:d5ed with SMTP id s3csp67415vqg; Thu, 30 May 2024 12:45:19 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCVGoKeSdnVM4udCL8zejyNzSOkhdcQhcp/sSpR5f7E/cnGokjBPtmm3tGtGxrryZKvzGLo1H/YfiTYOFEh2VW1/L/e3kzDbI2ytSQ== X-Google-Smtp-Source: AGHT+IGAzmloo6whP80lALPFVzAFGWSbkbDS4AqN4rEtRDB5DAdgN0MLKj9jyRjQolE+NnLqCeem X-Received: by 2002:a17:906:3558:b0:a64:71a6:5e7d with SMTP id a640c23a62f3a-a65e90f7821mr162374366b.4.1717098319694; Thu, 30 May 2024 12:45:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1717098319; cv=none; d=google.com; s=arc-20160816; b=cV3OWpTp3fCgdzBSBXfxHWkRahm/RATgZ6ySm1IBXOeB+ue5WjoJ2ov3Rd9bDMQI02 H5SQkdGbLW+jH+/9yZq0nVQrF1LIFo6NsW0MP06fdBc1Q4MLKWiIoF+2N98x5cbKE9Gk IE4QNrlbZQxbsfcZjm7q5+I1fi/t5MxNk4lHsUKxWbgGWFCtWzrQ1QFTcZKrPH+q9J+W Im/lpbs/oqls6EFozwIYnCwZH4TyXP0l9QkOToyR5GbLQQLRUCTPSBxMvp/OSJfZQUv5 
2qNtWXdIuUWftRLkanfKZ0kr+oVU+av2NeQ91v3fMkhPf9Ek22RDCSirvCCZnoMvJKly qhRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=3f4xc6r0n3EQ2NhMkhZzywmbTt7D38mGGs0NAEfbVsM=; fh=o4ZBG0WnuIFUfokYFX1900fRPFIkFoDCXPv5+z2b8Jo=; b=EzZFfDpoCFqKB6vlZGvUtlRMgy9uX/WlBvTbd3sVgjqFlvc62W0MZuuxCXziu9GuQP XOgFGh+DEBsXjyMeEl2J2XbIM2Fvre6PKbWVXFDlqJsb5DvO3WUTbEoE9AiGGKDIhdfr wQAra6yCjw0cKZvr8Wmx9q1/LL22ZhTMO0g68zkY695pulesUBa079OK8r3MiEpAuQdw pLQ7fRlLLcR0BReUbRTVxp478HpmF9uLdXJL7L6mJdmnkGkWAhgRF3kMDN8jFQveIYll ghYPIrG/M2zYcT99Ev8QxfrOuQczhYyr6pi4mxjLmngf3YPVKtCR7CaQcC/F1nHAcs3R V40A==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=PJSajhdN; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a67eaf65914si8422366b.851.2024.05.30.12.45.18; Thu, 30 May 2024 12:45:19 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=PJSajhdN; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A768268CFAD; Thu, 30 May 2024 22:44:40 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wr1-f49.google.com (mail-wr1-f49.google.com [209.85.221.49]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 56C8168D5AB for ; Thu, 30 May 2024 22:44:37 +0300 (EEST) Received: by mail-wr1-f49.google.com with SMTP id ffacd0b85a97d-3550134ef25so1260660f8f.1 for ; Thu, 30 May 2024 12:44:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717098276; x=1717703076; darn=ffmpeg.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=nzrTcJUfW7BDUvX4kvvrP/o8SFgfplqdfK+/7zSJytY=; b=PJSajhdNPSjixF9kuoUUUOlAr0vivLoHeopiQ+qLppuJwTgqZETzF0MpMmlwbH4hkN eHzpM+yZASRxo2PMNZ33Lhtga7QJhEsKxqNOHqCkc7lEvsPYQid22+xUkqIpbtWOjn9T wcDpaGKZWB7o3PDankAWekGRftbzPIp4pwrV3+fJCq2ajAAhAgdchOaNE0Az2etm+qWS k5aK9MGa5kLoiThqKfI7fa5jhM7lFq1HAsXLIC3xPTJ1yTgsFAtMkSPrYamp2yq4cMew GlW5jRFYZrYZ/uqsZdxmst3g2fADOOc8+ZdFZduKkPW5xKG4NP8KLfyfNYw5KmgksiQc uYjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717098276; x=1717703076; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nzrTcJUfW7BDUvX4kvvrP/o8SFgfplqdfK+/7zSJytY=; b=XLBtPwyxDdbW0Zku4YeF0e9zPHyOWkNq9ndZ/cvftmicpr3VV8CBPVSXQbColZTBrn lZ+9j+pgbg9ycfnXf7pY4vLHuz6Hzl+GyH3f3vTwn4arXvKhslbpebZtiPs34jZHOIIm V/g9DVVucJz9xFDu/cNt8odN4B6WZGsWkZdIupiFVwRbZGwrFC4rSYSKdeeMwQId5wOa 3ftiRdft955cjU2T8uHOMYWGxnSbT2BE8O1JmMUKMZRuapgowqYMJ/P/MxX9DHodaScP jU/cJfNp3tjrVHNPWczrC4AdlRgAOxbWcRCPh6EPdkUqK8WHurzl2f8qJfipNu8PbguR JDLg== X-Gm-Message-State: AOJu0YxOao6tbbizWe45SkBtg52vxipmXK2BoulVsVYMnJLJ+YLpw31o GQ2FdCRqZpaCsRAbSEMl6nvufCd7tFIyqvN2B+lBkpps/XaFO3+EcA8fvg== X-Received: by 2002:a5d:58fa:0:b0:354:fb1a:25f5 with SMTP id ffacd0b85a97d-35dc00c9a4cmr2359212f8f.52.1717098276391; Thu, 30 May 2024 12:44:36 -0700 (PDT) Received: from fractale.lan ([2001:861:5102:3290:f88d:fc8b:a14:3fcb]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-35dd04c0de3sm225126f8f.9.2024.05.30.12.44.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 May 2024 12:44:36 -0700 (PDT) From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:09 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 07/16] hwcontext_nvtegra: add dynamic frequency scaling routines X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: averne Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: gSlCoKC5OHVD To save on energy, the clock speed of multimedia engines should be adapted to their workload. 
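As a rough illustration of the approach (a sketch under stated assumptions, not the code added by this patch): the workload estimate can be thought of as an exponential moving average of decode cycles per bitstream bit, scaled by the observed input bitrate with 20% headroom. The helper names ema_step and dfs_target_hz are hypothetical and exist only for this example.

```c
/* Hypothetical helpers for illustration only; they are not part of libavutil. */

/* One exponential-moving-average step; damping in (0, 1] weights the new sample. */
static double ema_step(double ema, double sample, double damping)
{
    return damping * sample + (1.0 - damping) * ema;
}

/*
 * Target engine clock in Hz: cycles needed per bit, times bits fed to the
 * hardware per second of wall-clock time, plus 20% headroom.
 */
static double dfs_target_hz(double cycles_per_bit, double bits_per_second)
{
    return cycles_per_bit * bits_per_second * 1.2;
}
```

For instance, 2 cycles per bit at 100 Mbit/s would suggest a clock near 240 MHz, which the kernel driver is then expected to clamp to a rate it supports.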
Signed-off-by: averne --- libavutil/hwcontext_nvtegra.c | 165 ++++++++++++++++++++++++++++++++++ libavutil/hwcontext_nvtegra.h | 7 ++ 2 files changed, 172 insertions(+) diff --git a/libavutil/hwcontext_nvtegra.c b/libavutil/hwcontext_nvtegra.c index 0f4d5a323b..6b72348082 100644 --- a/libavutil/hwcontext_nvtegra.c +++ b/libavutil/hwcontext_nvtegra.c @@ -46,6 +46,14 @@ typedef struct NVTegraDevicePriv { AVNVTegraJobPool job_pool; uint32_t vic_setup_off, vic_cmdbuf_off; + + double framerate; + uint32_t dfs_lowcorner; + double dfs_decode_cycles_ema; + double dfs_ema_damping; + int dfs_bitrate_sum; + int dfs_cur_sample, dfs_num_samples; + int64_t dfs_sampling_start_ts, dfs_last_ts_delta; } NVTegraDevicePriv; static const enum AVPixelFormat supported_sw_formats[] = { @@ -108,6 +116,28 @@ static inline uint32_t nvtegra_surface_get_height_align(enum AVPixelFormat fmt, return 32; } +static int nvtegra_channel_set_freq(AVNVTegraChannel *channel, uint32_t freq) { + int err; +#ifndef __SWITCH__ + err = av_nvtegra_channel_set_clock_rate(channel, channel->module_id, freq); + if (err < 0) + return err; + + err = av_nvtegra_channel_get_clock_rate(channel, channel->module_id, &channel->clock); + if (err < 0) + return err; +#else + err = AVERROR(mmuRequestSetAndWait(&channel->mmu_request, freq, -1)); + if (err < 0) + return err; + + err = AVERROR(mmuRequestGet(&channel->mmu_request, &channel->clock)); + if (err < 0) + return err; +#endif + return 0; +} + static void nvtegra_device_uninit(AVHWDeviceContext *ctx) { NVTegraDevicePriv *priv = ctx->hwctx; AVNVTegraDeviceContext *hwctx = &priv->p; @@ -386,6 +416,141 @@ static int nvtegra_get_buffer(AVHWFramesContext *ctx, AVFrame *frame) { return 0; } +/* + * Possible frequencies on Icosa and Mariko+, in MHz + * (see tegra210-core-dvfs.c and tegra210b01-core-dvfs.c in l4t kernel sources, respectively): + * for NVDEC: + * 268.8, 384.0, 448.0, 486.4, 550.4, 576.0, 614.4, 652.8, 678.4, 691.2, 716.8 + * 460.8, 499.2, 556.8, 633.6, 652.8, 
710.4, 748.8, 787.2, 825.6, 844.8, 883.2, 902.4, 921.6, 940.8, 960.0, 979.2 + * for NVJPG: + * 192.0, 307.2, 345.6, 409.6, 486.4, 524.8, 550.4, 576.0, 588.8, 614.4, 627.2 + * 422.4, 441.6, 499.2, 518.4, 537.6, 556.8, 576.0, 595.2, 614.4, 633.6, 652.8 + */ + +int av_nvtegra_dfs_init(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int width, int height, + double framerate_hz) +{ + NVTegraDevicePriv *priv = ctx->hwctx; + + uint32_t max_freq, lowcorner; + int num_mbs, err; + + priv->dfs_num_samples = 20; + priv->dfs_ema_damping = 0.1; + + /* + * Initialize low-corner frequency (reproduces official code) + * Framerate might be unavailable (or variable), but this is official logic + */ + num_mbs = width / 16 * height / 16; + if (num_mbs <= 3600) + lowcorner = 100000000; /* up to 720p */ + else if (num_mbs <= 8160) + lowcorner = 180000000; /* up to 1080p */ + else if (num_mbs <= 32400) + lowcorner = 345000000; /* up to 4k */ + else + lowcorner = 576000000; /* above 4k */ + + if (framerate_hz >= 0.1 && isfinite(framerate_hz)) + lowcorner = FFMIN(lowcorner, lowcorner * framerate_hz / 30.0); + + priv->framerate = framerate_hz; + priv->dfs_lowcorner = lowcorner; + + av_log(ctx, AV_LOG_DEBUG, "DFS: Initializing lowcorner to %u Hz, using %d samples\n", + priv->dfs_lowcorner, priv->dfs_num_samples); + + /* + * Initialize channel to the max possible frequency (the kernel driver will clamp to an allowed value) + * Note: Official code passes INT_MAX kHz then multiplies by 1000 (to Hz) and converts to u32, + * resulting in this value.
+ */ + max_freq = ((UINT64_C(1) << 32) - 1000) & UINT32_MAX; + + err = nvtegra_channel_set_freq(channel, max_freq); + if (err < 0) + return err; + + priv->dfs_decode_cycles_ema = 0.0; + priv->dfs_bitrate_sum = 0; + priv->dfs_cur_sample = 0; + priv->dfs_sampling_start_ts = av_gettime_relative(); + priv->dfs_last_ts_delta = 0; + + return 0; +} + +int av_nvtegra_dfs_update(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int bitstream_len, int decode_cycles) { + NVTegraDevicePriv *priv = ctx->hwctx; + + double frame_time, avg; + int64_t now, wl_dt; + uint32_t clock; + int err; + + /* + * Official software implements DFS using a flat average of the decoder pool occupancy. + * We instead use the decode cycles as reported by NVDEC microcode, and the "bitrate" + * (bitstream bits fed to the hardware in a given wall-clock time interval, NOT video time), + * to calculate a suitable frequency, and multiply it by 1.2 for good measure: + * Freq = decode_cycles_per_bit * bits_per_second * 1.2 + */ + + /* Convert to bits */ + bitstream_len *= 8; + + /* Exponential moving average of decode cycles per bit */ + priv->dfs_decode_cycles_ema = priv->dfs_ema_damping * (double)decode_cycles/bitstream_len + + (1.0 - priv->dfs_ema_damping) * priv->dfs_decode_cycles_ema; + + priv->dfs_bitrate_sum += bitstream_len; + priv->dfs_cur_sample = (priv->dfs_cur_sample + 1) % priv->dfs_num_samples; + + err = 0; + + /* Reclock if we collected enough samples */ + if (priv->dfs_cur_sample == 0) { + now = av_gettime_relative(); + wl_dt = now - priv->dfs_sampling_start_ts; + + /* + * Try to filter bad sample sets caused by e.g. pausing the video playback.
+ * A sample set is rejected only if both of these conditions are met: + * - the wall time is over 1.5x the expected duration of the sample set at the nominal framerate (10Hz is used as fallback if no framerate information is available) + * - the wall time is over 1.5x that of the last accepted sample set (or no sample set has been accepted yet) + */ + + if (priv->framerate >= 0.1 && isfinite(priv->framerate)) + frame_time = 1.0e6 / priv->framerate; + else + frame_time = 0.1e6; + + if ((wl_dt < 1.5 * priv->dfs_num_samples * frame_time) || + ((priv->dfs_last_ts_delta) && (wl_dt < 1.5 * priv->dfs_last_ts_delta))) { + avg = priv->dfs_bitrate_sum * 1e6 / wl_dt; + clock = priv->dfs_decode_cycles_ema * avg * 1.2; + clock = FFMAX(clock, priv->dfs_lowcorner); + + av_log(ctx, AV_LOG_DEBUG, "DFS: %.0f cycles/b (ema), %.0f b/s -> clock %u Hz (lowcorner %u Hz)\n", + priv->dfs_decode_cycles_ema, avg, clock, priv->dfs_lowcorner); + + err = nvtegra_channel_set_freq(channel, clock); + + priv->dfs_last_ts_delta = wl_dt; + } + + priv->dfs_bitrate_sum = 0; + priv->dfs_sampling_start_ts = now; + } + + return err; +} + +int av_nvtegra_dfs_uninit(AVHWDeviceContext *ctx, AVNVTegraChannel *channel) { + return nvtegra_channel_set_freq(channel, 0); +} + static int nvtegra_transfer_get_formats(AVHWFramesContext *ctx, enum AVHWFrameTransferDirection dir, enum AVPixelFormat **formats) diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h index 8a2383d304..7c845951d9 100644 --- a/libavutil/hwcontext_nvtegra.h +++ b/libavutil/hwcontext_nvtegra.h @@ -82,4 +82,11 @@ static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame) */ int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt); +/* + * Dynamic frequency scaling routines + */ +int av_nvtegra_dfs_init(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int width, int height, double framerate_hz); +int av_nvtegra_dfs_update(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int bitstream_len, int decode_cycles); +int av_nvtegra_dfs_uninit(AVHWDeviceContext *ctx, AVNVTegraChannel *channel); + #endif /* 
AVUTIL_HWCONTEXT_NVTEGRA_H */ From patchwork Thu May 30 19:43:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49418 From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:10 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 08/16] nvtegra: add common hardware decoding code X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: averne Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: msIsmTUJSC5F This includes common decoder initialization and teardown code, decode-job management, and constraint checks. Signed-off-by: averne --- configure | 1 + libavcodec/Makefile | 2 + libavcodec/hwconfig.h | 2 + libavcodec/nvtegra_decode.c | 517 ++++++++++++++++++++++++++++++++++++ libavcodec/nvtegra_decode.h | 94 +++++++ 5 files changed, 616 insertions(+) create mode 100644 libavcodec/nvtegra_decode.c create mode 100644 libavcodec/nvtegra_decode.h diff --git a/configure b/configure index 51f169bfbd..566bb37b8c 100755 --- a/configure +++ b/configure @@ -2022,6 +2022,7 @@ HWACCEL_LIBRARY_LIST=" mmal omx opencl + nvtegra " DOCUMENT_LIST=" diff --git a/libavcodec/Makefile b/libavcodec/Makefile index 2443d2c6fd..f1e2dc6625 100644 --- a/libavcodec/Makefile +++ b/libavcodec/Makefile @@ -993,6 +993,7 @@ OBJS-$(CONFIG_VAAPI) += vaapi_decode.o OBJS-$(CONFIG_VIDEOTOOLBOX) += videotoolbox.o OBJS-$(CONFIG_VDPAU) += vdpau.o OBJS-$(CONFIG_VULKAN) += vulkan.o vulkan_video.o +OBJS-$(CONFIG_NVTEGRA) += nvtegra_decode.o OBJS-$(CONFIG_AV1_D3D11VA_HWACCEL) += dxva2_av1.o OBJS-$(CONFIG_AV1_DXVA2_HWACCEL) += dxva2_av1.o @@ -1285,6 +1286,7 @@ SKIPHEADERS-$(CONFIG_VIDEOTOOLBOX) += videotoolbox.h vt_internal.h SKIPHEADERS-$(CONFIG_VULKAN) += vulkan.h vulkan_video.h vulkan_decode.h SKIPHEADERS-$(CONFIG_V4L2_M2M) += v4l2_buffers.h v4l2_context.h v4l2_m2m.h SKIPHEADERS-$(CONFIG_ZLIB) += zlib_wrapper.h +SKIPHEADERS-$(CONFIG_NVTEGRA) += nvtegra_decode.h TESTPROGS = avcodec \ avpacket \ diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h index ee29ca631d..a3c3402c77 100644 --- a/libavcodec/hwconfig.h +++ b/libavcodec/hwconfig.h @@ -79,6 +79,8 @@ void ff_hwaccel_uninit(AVCodecContext *avctx); HW_CONFIG_HWACCEL(0, 0, 1, D3D11VA_VLD, NONE, ff_ ## codec ## _d3d11va_hwaccel) #define HWACCEL_D3D12VA(codec) \ 
HW_CONFIG_HWACCEL(1, 1, 0, D3D12, D3D12VA, ff_ ## codec ## _d3d12va_hwaccel) +#define HWACCEL_NVTEGRA(codec) \ + HW_CONFIG_HWACCEL(1, 1, 0, NVTEGRA, NVTEGRA, ff_ ## codec ## _nvtegra_hwaccel) #define HW_CONFIG_ENCODER(device, frames, ad_hoc, format, device_type_) \ &(const AVCodecHWConfigInternal) { \ diff --git a/libavcodec/nvtegra_decode.c b/libavcodec/nvtegra_decode.c new file mode 100644 index 0000000000..1978fcf644 --- /dev/null +++ b/libavcodec/nvtegra_decode.c @@ -0,0 +1,517 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#include "libavutil/hwcontext.h" +#include "libavutil/hwcontext_nvtegra.h" +#include "libavutil/nvtegra_host1x.h" +#include "libavutil/pixdesc.h" +#include "libavutil/pixfmt.h" +#include "libavutil/intreadwrite.h" + +#include "avcodec.h" +#include "codec_desc.h" +#include "internal.h" +#include "decode.h" +#include "nvtegra_decode.h" + +static void nvtegra_input_map_free(void *opaque, uint8_t *data) { + AVNVTegraMap *map = (AVNVTegraMap *)data; + + if (!data) + return; + + av_nvtegra_map_destroy(map); + + av_freep(&map); +} + +static AVBufferRef *nvtegra_input_map_alloc(void *opaque, size_t size) { + FFNVTegraDecodeContext *ctx = opaque; + + AVBufferRef *buffer; + AVNVTegraMap *map; + int err; + + map = av_mallocz(sizeof(*map)); + if (!map) + return NULL; + + err = av_nvtegra_map_create(map, ctx->channel, ctx->input_map_size, 0x100, + NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE); + if (err < 0) + return NULL; + + buffer = av_buffer_create((uint8_t *)map, sizeof(*map), nvtegra_input_map_free, ctx, 0); + if (!buffer) + goto fail; + + ctx->new_input_buffer = true; + + return buffer; + +fail: + av_log(NULL, AV_LOG_ERROR, "Failed to create buffer\n"); + av_nvtegra_map_destroy(map); + av_freep(&map); + return NULL; +} + +int ff_nvtegra_decode_init(AVCodecContext *avctx, FFNVTegraDecodeContext *ctx) { + AVHWFramesContext *frames_ctx; + AVHWDeviceContext *hw_device_ctx; + AVNVTegraDeviceContext *device_hwctx; + + int err; + + err = ff_decode_get_hw_frames_ctx(avctx, AV_HWDEVICE_TYPE_NVTEGRA); + if (err < 0) + goto fail; + + frames_ctx = (AVHWFramesContext *)avctx->hw_frames_ctx->data; + hw_device_ctx = (AVHWDeviceContext *)frames_ctx->device_ref->data; + device_hwctx = hw_device_ctx->hwctx; + + if ((!ctx->is_nvjpg && !device_hwctx->nvdec_version) || (ctx->is_nvjpg && !device_hwctx->nvjpg_version)) + return AVERROR(EACCES); + + ctx->hw_device_ref = av_buffer_ref(frames_ctx->device_ref); + if (!ctx->hw_device_ref) { + err = AVERROR(ENOMEM); + goto fail; + } + + 
ctx->decoder_pool = av_buffer_pool_init2(sizeof(AVNVTegraMap), ctx, + nvtegra_input_map_alloc, NULL); + if (!ctx->decoder_pool) { + err = AVERROR(ENOMEM); + goto fail; + } + + ctx->channel = !ctx->is_nvjpg ? &device_hwctx->nvdec_channel : &device_hwctx->nvjpg_channel; + + err = av_nvtegra_cmdbuf_init(&ctx->cmdbuf); + if (err < 0) + goto fail; + + err = av_nvtegra_dfs_init(hw_device_ctx, ctx->channel, avctx->coded_width, avctx->coded_height, + av_q2d(avctx->framerate)); + if (err < 0) + goto fail; + + return 0; + +fail: + ff_nvtegra_decode_uninit(avctx, ctx); + return err; +} + +int ff_nvtegra_decode_uninit(AVCodecContext *avctx, FFNVTegraDecodeContext *ctx) { + AVHWFramesContext *frames_ctx; + AVHWDeviceContext *hw_device_ctx; + + av_buffer_pool_uninit(&ctx->decoder_pool); + + av_buffer_unref(&ctx->hw_device_ref); + + av_nvtegra_cmdbuf_deinit(&ctx->cmdbuf); + + if (avctx->hw_frames_ctx) { + frames_ctx = (AVHWFramesContext *)avctx->hw_frames_ctx->data; + hw_device_ctx = (AVHWDeviceContext *)frames_ctx->device_ref->data; + + av_nvtegra_dfs_uninit(hw_device_ctx, ctx->channel); + } + + + return 0; +} + +static void nvtegra_fdd_priv_free(void *priv) { + FFNVTegraDecodeFrame *tf = priv; + FFNVTegraDecodeContext *ctx = tf ? tf->ctx : NULL; + + if (!tf) + return; + + if (tf->in_flight) + av_nvtegra_syncpt_wait(ctx->channel, tf->fence, -1); + + av_buffer_unref(&tf->input_map_ref); + av_freep(&tf); +} + +int ff_nvtegra_wait_decode(void *logctx, AVFrame *frame) { + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + FFNVTegraDecodeContext *ctx = tf->ctx; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + AVHWDeviceContext *hw_device_ctx = (AVHWDeviceContext *)ctx->hw_device_ref->data; + + nvdec_status_s *nvdec_status; + nvjpg_dec_status *nvjpg_status; + uint32_t decode_cycles; + uint8_t *mem; + int err; + + if (!tf->in_flight) + return 0; + + mem = av_nvtegra_map_get_addr(input_map); + + err = 
av_nvtegra_syncpt_wait(ctx->channel, tf->fence, -1); + if (err < 0) + return err; + + tf->in_flight = false; + + if (!ctx->is_nvjpg) { + nvdec_status = (nvdec_status_s *)(mem + ctx->status_off); + if (nvdec_status->error_status != 0 || nvdec_status->mbs_in_error != 0) + return AVERROR_UNKNOWN; + + decode_cycles = nvdec_status->cycle_count * 16; + } else { + nvjpg_status = (nvjpg_dec_status *)(mem + ctx->status_off); + if (nvjpg_status->error_status != 0 || nvjpg_status->bytes_offset == 0) + return AVERROR_UNKNOWN; + + decode_cycles = nvjpg_status->cycle_count; + } + + /* Decode time in µs: decode_cycles * 1000000 / ctx->channel->clock */ + err = av_nvtegra_dfs_update(hw_device_ctx, ctx->channel, tf->bitstream_len, decode_cycles); + if (err < 0) + return err; + + return 0; +} + +int ff_nvtegra_start_frame(AVCodecContext *avctx, AVFrame *frame, FFNVTegraDecodeContext *ctx) { + AVHWFramesContext *frames_ctx = (AVHWFramesContext *)avctx->hw_frames_ctx->data; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + + FFNVTegraDecodeFrame *tf = NULL; + int err; + + /* Abort on resolution changes that wouldn't fit into the frame */ + if ((frame->width > frames_ctx->width) || (frame->height > frames_ctx->height)) + return AVERROR(EINVAL); + + ctx->bitstream_len = ctx->num_slices = 0; + + if (fdd->hwaccel_priv) { + /* + * For interlaced video, both fields use the same fdd, + * however by proceeding we might overwrite the input buffer + * during the decoding, so wait for the previous operation to complete. 
+ */ + err = ff_nvtegra_wait_decode(avctx, frame); + if (err < 0) + return err; + } else { + tf = av_mallocz(sizeof(*tf)); + if (!tf) + return AVERROR(ENOMEM); + + fdd->hwaccel_priv = tf; + fdd->hwaccel_priv_free = nvtegra_fdd_priv_free; + fdd->post_process = ff_nvtegra_wait_decode; + + tf->ctx = ctx; + + tf->input_map_ref = av_buffer_pool_get(ctx->decoder_pool); + if (!tf->input_map_ref) { + err = AVERROR(ENOMEM); + goto fail; + } + } + + tf = fdd->hwaccel_priv; + tf->in_flight = false; + + err = av_nvtegra_cmdbuf_add_memory(&ctx->cmdbuf, (AVNVTegraMap *)tf->input_map_ref->data, + ctx->cmdbuf_off, ctx->max_cmdbuf_size); + if (err < 0) + return err; + + err = av_nvtegra_cmdbuf_clear(&ctx->cmdbuf); + if (err < 0) + return err; + + return 0; + +fail: + nvtegra_fdd_priv_free(tf); + return err; +} + +int ff_nvtegra_decode_slice(AVCodecContext *avctx, AVFrame *frame, + const uint8_t *buf, uint32_t buf_size, bool add_startcode) +{ + FFNVTegraDecodeContext *ctx = avctx->internal->hwaccel_priv_data; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + bool need_bitstream_move = false; + uint32_t old_bitstream_off, startcode_size; + uint8_t *mem; + int err; + + mem = av_nvtegra_map_get_addr(input_map); + + startcode_size = add_startcode ? 
3 : 0; + + /* Reserve 16 bytes for the termination sequence */ + if (ctx->bitstream_len + buf_size + startcode_size >= ctx->max_bitstream_size - 16) { + ctx->input_map_size += ctx->max_bitstream_size + buf_size; + ctx->input_map_size = FFALIGN(ctx->input_map_size, 0x1000); + + ctx->max_bitstream_size = ctx->input_map_size - ctx->bitstream_off; + + need_bitstream_move = false; + } + + /* Reserve 4 bytes for the bitstream size */ + if (ctx->max_num_slices && ctx->num_slices >= ctx->max_num_slices - 1) { + ctx->input_map_size += ctx->max_num_slices * sizeof(uint32_t); + ctx->input_map_size = FFALIGN(ctx->input_map_size, 0x1000); + + ctx->max_num_slices *= 2; + + old_bitstream_off = ctx->bitstream_off; + ctx->bitstream_off = ctx->slice_offsets_off + ctx->max_num_slices * sizeof(uint32_t); + + need_bitstream_move = true; + } + + if (ctx->input_map_size != av_nvtegra_map_get_size(input_map)) { + err = av_nvtegra_map_realloc(input_map, ctx->input_map_size, 0x100, + NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE); + if (err < 0) + return err; + + mem = av_nvtegra_map_get_addr(input_map); + + err = av_nvtegra_cmdbuf_add_memory(&ctx->cmdbuf, input_map, + ctx->cmdbuf_off, ctx->max_cmdbuf_size); + if (err < 0) + return err; + + /* Running out of slice offsets mem shouldn't happen so the extra memmove is fine */ + if (need_bitstream_move) + memmove(mem + ctx->bitstream_off, mem + old_bitstream_off, ctx->bitstream_len); + } + + if (ctx->max_num_slices) + ((uint32_t *)(mem + ctx->slice_offsets_off))[ctx->num_slices] = ctx->bitstream_len; + + /* NAL startcode 000001 */ + if (add_startcode) { + AV_WB24(mem + ctx->bitstream_off + ctx->bitstream_len, 1); + ctx->bitstream_len += 3; + } + + memcpy(mem + ctx->bitstream_off + ctx->bitstream_len, buf, buf_size); + ctx->bitstream_len += buf_size; + + ctx->num_slices++; + + return 0; +} + +int ff_nvtegra_end_frame(AVCodecContext *avctx, AVFrame *frame, FFNVTegraDecodeContext *ctx, + const uint8_t *end_sequence, int end_sequence_size) +{ + 
FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + uint8_t *mem; + int err; + + mem = av_nvtegra_map_get_addr(input_map); + + /* Last slice data range */ + if (ctx->max_num_slices) + ((uint32_t *)(mem + ctx->slice_offsets_off))[ctx->num_slices] = ctx->bitstream_len; + + /* Termination sequence for the bitstream data */ + if (end_sequence_size) + memcpy(mem + ctx->bitstream_off + ctx->bitstream_len, end_sequence, end_sequence_size); + + err = av_nvtegra_cmdbuf_begin(&ctx->cmdbuf, !ctx->is_nvjpg ? HOST1X_CLASS_NVDEC : HOST1X_CLASS_NVJPG); + if (err < 0) + return err; + + err = av_nvtegra_cmdbuf_add_syncpt_incr(&ctx->cmdbuf, ctx->channel->syncpt, 0); + if (err < 0) + return err; + + err = av_nvtegra_cmdbuf_end(&ctx->cmdbuf); + if (err < 0) + return err; + + err = av_nvtegra_channel_submit(ctx->channel, &ctx->cmdbuf, &tf->fence); + if (err < 0) + return err; + + tf->bitstream_len = ctx->bitstream_len; + tf->in_flight = true; + + ctx->frame_idx++; + + ctx->new_input_buffer = false; + + return 0; +} + +static int nvtegra_get_size_constraints(enum AVCodecID codec, + int *min_width, int *min_height, + int *max_width, int *max_height, + int *align, int *max_mbs) +{ + switch (codec) { + case AV_CODEC_ID_MPEG1VIDEO: + case AV_CODEC_ID_MPEG2VIDEO: + *min_width = 48, *min_height = 1; + *max_width = 4096, *max_height = 4096; + *align = 16, *max_mbs = 0x20000; + break; + + case AV_CODEC_ID_MPEG4: + *min_width = 48, *min_height = 1; + *max_width = 2048, *max_height = 2048; + *align = 16, *max_mbs = 0x2000; + break; + + case AV_CODEC_ID_VC1: + case AV_CODEC_ID_WMV3: + *min_width = 48, *min_height = 1; + *max_width = 2048, *max_height = 2048; + *align = 1, *max_mbs = -1; + break; + + case AV_CODEC_ID_H264: + *min_width = 48, *min_height = 1; + *max_width = 4096, *max_height = 4096; + *align = 16, *max_mbs = 0x20000; + break; + + case 
AV_CODEC_ID_HEVC: + /* Note: on nvdec 4.0+ (tegra 194) max dimensions are 8192, and max mbs 0x80000 */ + *min_width = 144, *min_height = 144; + *max_width = 4096, *max_height = 4096; + *align = 64, *max_mbs = 0x20000; + break; + + case AV_CODEC_ID_VP8: + *min_width = 48, *min_height = 1; + *max_width = 4096, *max_height = 4096; + *align = 16, *max_mbs = 0x20000; + break; + + case AV_CODEC_ID_VP9: + /* Note: on nvdec 4.0+ (tegra 194) max dimensions are 8192, and max mbs 0x40000 */ + *min_width = 144, *min_height = 144; + *max_width = 4096, *max_height = 4096; + *align = 16, *max_mbs = 0x10000; + break; + + case AV_CODEC_ID_MJPEG: + *min_width = 1, *min_height = 1; + *max_width = 16384, *max_height = 16384; + *align = 1, *max_mbs = -1; + break; + + #if 0 + case AV_CODEC_ID_AV1: + /* Note: on nvdec 4.0+ (tegra 194) max dimensions are 8192, and max mbs 0x80000 */ + *min_width = 128, *min_height = 128; + *max_width = 4096, *max_height = 4096; + *align = 64, *max_mbs = 0x20000; + break; + #endif + + default: + return AVERROR(EINVAL); + } + + return 0; +} + +int ff_nvtegra_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx) { + AVHWFramesContext *frames_ctx = (AVHWFramesContext *)hw_frames_ctx->data; + const AVPixFmtDescriptor *sw_desc; + + int min_width, min_height, max_width, max_height, align, max_mbs, + aligned_width, aligned_height, num_mbs; + int err; + + err = nvtegra_get_size_constraints(avctx->codec_id, &min_width, &min_height, + &max_width, &max_height, &align, &max_mbs); + if (err < 0) + return err; + + aligned_width = FFALIGN(avctx->coded_width, align); + aligned_height = FFALIGN(avctx->coded_height, align); + num_mbs = (aligned_width / 16) * (aligned_height / 16); + + if ((aligned_width < min_width) || (aligned_width > max_width) || + (aligned_height < min_height) || (aligned_height > max_height)) + { + av_log(avctx, AV_LOG_ERROR, "Dimensions %dx%d (min. %dx%d, max. 
%dx%d) " + "are not supported by the hardware for codec %s\n", + avctx->coded_width, avctx->coded_height, + min_width, min_height, max_width, max_height, + avctx->codec_descriptor->name); + return AVERROR(EINVAL); + } + + if ((max_mbs > 0) && (num_mbs > max_mbs)) { + av_log(avctx, AV_LOG_ERROR, "Number of macroblocks %d exceeds maximum %d " + "for codec %s\n", + num_mbs, max_mbs, avctx->codec_descriptor->name); + return AVERROR(EINVAL); + } + + frames_ctx->format = AV_PIX_FMT_NVTEGRA; + frames_ctx->width = FFALIGN(avctx->coded_width, 2); /* NVDEC only supports even sizes */ + frames_ctx->height = FFALIGN(avctx->coded_height, 2); + + sw_desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt); + if (!sw_desc) + return AVERROR_BUG; + + switch (sw_desc->comp[0].depth) { + case 8: + frames_ctx->sw_format = (sw_desc->nb_components > 1) ? + AV_PIX_FMT_NV12 : AV_PIX_FMT_GRAY8; + break; + case 10: + frames_ctx->sw_format = (sw_desc->nb_components > 1) ? + AV_PIX_FMT_P010 : AV_PIX_FMT_GRAY10; + break; + default: + return AVERROR(EINVAL); + } + + return 0; +} diff --git a/libavcodec/nvtegra_decode.h b/libavcodec/nvtegra_decode.h new file mode 100644 index 0000000000..5260c8b3c5 --- /dev/null +++ b/libavcodec/nvtegra_decode.h @@ -0,0 +1,94 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef AVCODEC_NVTEGRA_DECODE_H +#define AVCODEC_NVTEGRA_DECODE_H + +#include + +#include "avcodec.h" +#include "libavutil/mem.h" +#include "libavutil/hwcontext_nvtegra.h" + +#include "libavutil/nvdec_drv.h" +#include "libavutil/nvjpg_drv.h" +#include "libavutil/clc5b0.h" +#include "libavutil/cle7d0.h" + +typedef struct FFNVTegraDecodeContext { + uint64_t frame_idx; + + AVBufferRef *hw_device_ref; + AVBufferPool *decoder_pool; + + bool is_nvjpg; + AVNVTegraChannel *channel; + + AVNVTegraCmdbuf cmdbuf; + + uint32_t pic_setup_off, status_off, cmdbuf_off, + bitstream_off, slice_offsets_off; + uint32_t input_map_size; + uint32_t max_cmdbuf_size, max_bitstream_size, max_num_slices; + + uint32_t num_slices; + uint32_t bitstream_len; + + bool new_input_buffer; +} FFNVTegraDecodeContext; + +typedef struct FFNVTegraDecodeFrame { + FFNVTegraDecodeContext *ctx; + AVBufferRef *input_map_ref; + uint32_t fence; + uint32_t bitstream_len; + bool in_flight; +} FFNVTegraDecodeFrame; + +static inline size_t ff_nvtegra_decode_pick_bitstream_buffer_size(AVCodecContext *avctx) { + /* + * Official software uses a static map of a predetermined size, usually around 0x600000 (6MiB). + * Our implementation supports dynamically resizing the input map, so be less conservative. + */ + if ((avctx->coded_width >= 3840) || (avctx->coded_height >= 2160)) /* 4k */ + return 0x100000; /* 1MiB */ + if ((avctx->coded_width >= 1920) || (avctx->coded_height >= 1080)) /* 1080p */ + return 0x40000; /* 256KiB */ + else + return 0x10000; /* 64KiB */ +} + +static inline AVFrame *ff_nvtegra_safe_get_ref(AVFrame *ref, AVFrame *fallback) { + return (ref && ref->private_ref) ? 
ref : fallback; +} + +int ff_nvtegra_decode_init(AVCodecContext *avctx, FFNVTegraDecodeContext *ctx); +int ff_nvtegra_decode_uninit(AVCodecContext *avctx, FFNVTegraDecodeContext *ctx); +int ff_nvtegra_start_frame(AVCodecContext *avctx, AVFrame *frame, FFNVTegraDecodeContext *ctx); +int ff_nvtegra_decode_slice(AVCodecContext *avctx, AVFrame *frame, + const uint8_t *buf, uint32_t buf_size, bool add_startcode); +int ff_nvtegra_end_frame(AVCodecContext *avctx, AVFrame *frame, FFNVTegraDecodeContext *ctx, + const uint8_t *end_sequence, int end_sequence_size); + +int ff_nvtegra_wait_decode(void *logctx, AVFrame *frame); + +int ff_nvtegra_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx); + +#endif /* AVCODEC_NVTEGRA_DECODE_H */ From patchwork Thu May 30 19:43:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49422 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9183:0:b0:460:55fa:d5ed with SMTP id s3csp71520vqg; Thu, 30 May 2024 12:55:14 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWK+3f3zwqQtJT9tpyrAHr5gzl/Oeqq/r9FDuI3DGFUwAN+NZzDduYSY4/sXGO0tF5QYa8BZ7JYSl0EJA9TyTmRUUNm8+XuKz7kTQ== X-Google-Smtp-Source: AGHT+IGovnoArphJnxFsrd/IIvctshJE36cROCB8CDb3BiKclN/pwVqfN2SCIguOt5rKiG9HhYoX X-Received: by 2002:a2e:8317:0:b0:2de:d4ef:af19 with SMTP id 38308e7fff4ca-2ea847c7a2amr21003461fa.10.1717098914567; Thu, 30 May 2024 12:55:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1717098914; cv=none; d=google.com; s=arc-20160816; b=IId7vvrH7lLSAdOKXY5q4rAeqDOuTZEAFkdfpCYxFatWoeXzWBTeCOW8ZjJqI0yIU4 VYDKyGipv1lOdvsSP/WxGlfcA3L/Hb5LjRZJ6buNCyzfM3uV0P+HRFbADMKgpiHjcHxO Ztxnrb+H0L0Tkyebra7XyCgWiRZPwIRia3Oqh1lTbj0PVc59UZVBp9y7+v3OuwRGgvJl kkEbBQ6+fhmRFx2KQLOjlz0YiITqxIGg+sQeNW2yF+LbGYD5r043F2syCfoVHXW7u6MK SoAkbwaVIDfMFoIYf8ZmOTS4/2OYK5Zk656OuMQXO7LCHBHi2wLqWIzQfpd8X7g0rS2N MQYA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; 
From: averne
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 30 May 2024 21:43:11 +0200
Message-ID: <97094c8a0d54f2122f7eb9e49d5fd3ca39ac9156.1717083800.git.averne381@gmail.com>
Subject: [FFmpeg-devel] [PATCH 09/16] nvtegra: add mpeg1/2 hardware decoding
Cc: averne

This is probably the most straightforward codec to implement on NVDEC.
Since mpeg2 is a superset of mpeg1, both are supported by the same backend.

Signed-off-by: averne
---
 configure                   |   4 +
 libavcodec/Makefile         |   2 +
 libavcodec/hwaccels.h       |   2 +
 libavcodec/mpeg12dec.c      |  12 ++
 libavcodec/nvtegra_mpeg12.c | 319 ++++++++++++++++++++++++++++++++++++
 5 files changed, 339 insertions(+)
 create mode 100644 libavcodec/nvtegra_mpeg12.c

diff --git a/configure b/configure
index 566bb37b8c..67db4a2ed2 100755
--- a/configure
+++ b/configure
@@ -3221,6 +3221,8 @@ mpeg1_vdpau_hwaccel_deps="vdpau"
 mpeg1_vdpau_hwaccel_select="mpeg1video_decoder"
 mpeg1_videotoolbox_hwaccel_deps="videotoolbox"
 mpeg1_videotoolbox_hwaccel_select="mpeg1video_decoder"
+mpeg1_nvtegra_hwaccel_deps="nvtegra"
+mpeg1_nvtegra_hwaccel_select="mpeg1video_decoder"
 mpeg2_d3d11va_hwaccel_deps="d3d11va"
 mpeg2_d3d11va_hwaccel_select="mpeg2video_decoder"
 mpeg2_d3d11va2_hwaccel_deps="d3d11va"
@@ -3237,6 +3239,8 @@ mpeg2_vdpau_hwaccel_deps="vdpau"
 mpeg2_vdpau_hwaccel_select="mpeg2video_decoder"
 mpeg2_videotoolbox_hwaccel_deps="videotoolbox"
 mpeg2_videotoolbox_hwaccel_select="mpeg2video_decoder"
+mpeg2_nvtegra_hwaccel_deps="nvtegra"
+mpeg2_nvtegra_hwaccel_select="mpeg2video_decoder"
 mpeg4_nvdec_hwaccel_deps="nvdec"
 mpeg4_nvdec_hwaccel_select="mpeg4_decoder"
 mpeg4_vaapi_hwaccel_deps="vaapi"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index f1e2dc6625..e4dfcbce6c 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1026,6 +1026,7 @@ OBJS-$(CONFIG_MJPEG_VAAPI_HWACCEL) += vaapi_mjpeg.o
 OBJS-$(CONFIG_MPEG1_NVDEC_HWACCEL) += nvdec_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VDPAU_HWACCEL) += vdpau_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
+OBJS-$(CONFIG_MPEG1_NVTEGRA_HWACCEL) += nvtegra_mpeg12.o
 OBJS-$(CONFIG_MPEG2_D3D11VA_HWACCEL) += dxva2_mpeg2.o
 OBJS-$(CONFIG_MPEG2_DXVA2_HWACCEL) += dxva2_mpeg2.o
 OBJS-$(CONFIG_MPEG2_D3D12VA_HWACCEL) += dxva2_mpeg2.o d3d12va_mpeg2.o
@@ -1034,6 +1035,7 @@ OBJS-$(CONFIG_MPEG2_QSV_HWACCEL) += qsvdec.o
 OBJS-$(CONFIG_MPEG2_VAAPI_HWACCEL) += vaapi_mpeg2.o
 OBJS-$(CONFIG_MPEG2_VDPAU_HWACCEL) += vdpau_mpeg12.o
 OBJS-$(CONFIG_MPEG2_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
+OBJS-$(CONFIG_MPEG2_NVTEGRA_HWACCEL) += nvtegra_mpeg12.o
 OBJS-$(CONFIG_MPEG4_NVDEC_HWACCEL) += nvdec_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VAAPI_HWACCEL) += vaapi_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VDPAU_HWACCEL) += vdpau_mpeg4.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 5171e4c7d7..ad9e9366f2 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -52,6 +52,7 @@ extern const struct FFHWAccel ff_mjpeg_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_mpeg1_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_d3d12va_hwaccel;
@@ -60,6 +61,7 @@ extern const struct FFHWAccel ff_mpeg2_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_mpeg2_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_vdpau_hwaccel;
diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index 9fd765f030..7d8ecae542 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -835,6 +835,9 @@ static const enum AVPixelFormat mpeg1_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_MPEG1_VDPAU_HWACCEL
     AV_PIX_FMT_VDPAU,
+#endif
+#if CONFIG_MPEG1_NVTEGRA_HWACCEL
+    AV_PIX_FMT_NVTEGRA,
 #endif
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_NONE
@@ -862,6 +865,9 @@ static const enum AVPixelFormat mpeg2_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_MPEG2_VIDEOTOOLBOX_HWACCEL
     AV_PIX_FMT_VIDEOTOOLBOX,
+#endif
+#if CONFIG_MPEG2_NVTEGRA_HWACCEL
+    AV_PIX_FMT_NVTEGRA,
 #endif
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_NONE
@@ -2624,6 +2630,9 @@ const FFCodec ff_mpeg1video_decoder = {
 #endif
 #if CONFIG_MPEG1_VIDEOTOOLBOX_HWACCEL
                                HWACCEL_VIDEOTOOLBOX(mpeg1),
+#endif
+#if CONFIG_MPEG1_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(mpeg1),
 #endif
                                NULL
                            },
@@ -2696,6 +2705,9 @@ const FFCodec ff_mpeg2video_decoder = {
 #endif
 #if CONFIG_MPEG2_VIDEOTOOLBOX_HWACCEL
                                HWACCEL_VIDEOTOOLBOX(mpeg2),
+#endif
+#if CONFIG_MPEG2_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(mpeg2),
 #endif
                                NULL
                            },
diff --git a/libavcodec/nvtegra_mpeg12.c b/libavcodec/nvtegra_mpeg12.c
new file mode 100644
index 0000000000..2206635a7d
--- /dev/null
+++ b/libavcodec/nvtegra_mpeg12.c
@@ -0,0 +1,319 @@
+/*
+ * Copyright (c) 2024 averne
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "mpegvideo.h"
+#include "mpegutils.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraMPEG12DecodeContext {
+    FFNVTegraDecodeContext core;
+
+    AVFrame *prev_frame, *next_frame;
+} NVTegraMPEG12DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+static const uint8_t bitstream_end_sequence[16] = {
+    0x00, 0x00, 0x01, 0xb7, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0xb7, 0x00, 0x00, 0x00, 0x00,
+};
+
+static int nvtegra_mpeg12_decode_uninit(AVCodecContext *avctx) {
+    NVTegraMPEG12DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA MPEG12 decoder\n");
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mpeg12_decode_init(AVCodecContext *avctx) {
+    NVTegraMPEG12DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    uint32_t num_slices;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA MPEG12 decoder\n");
+
+    num_slices = (FFALIGN(avctx->coded_width, MB_SIZE) / MB_SIZE) *
+                 (FFALIGN(avctx->coded_height, MB_SIZE) / MB_SIZE);
+    num_slices = FFMIN(num_slices, 8160);
+
+    /* Ignored: histogram map, size 0x400 */
+    ctx->core.pic_setup_off = 0;
+    ctx->core.status_off = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_mpeg2_pic_s),
+                                   AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off = FFALIGN(ctx->core.status_off + sizeof(nvdec_status_s),
+                                   AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.slice_offsets_off = FFALIGN(ctx->core.cmdbuf_off + AV_NVTEGRA_MAP_ALIGN,
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off = FFALIGN(ctx->core.slice_offsets_off + num_slices * sizeof(uint32_t),
+                                      AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                       0x1000);
+
+    ctx->core.max_cmdbuf_size = ctx->core.slice_offsets_off - ctx->core.cmdbuf_off;
+    ctx->core.max_num_slices = (ctx->core.bitstream_off - ctx->core.slice_offsets_off) / sizeof(uint32_t);
+    ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    return 0;
+
+fail:
+    nvtegra_mpeg12_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_mpeg12_prepare_frame_setup(nvdec_mpeg2_pic_s *setup, MpegEncContext *s,
+                                               NVTegraMPEG12DecodeContext *ctx)
+{
+    *setup = (nvdec_mpeg2_pic_s){
+        .gptimer_timeout_value = 0, /* Default value */
+
+        .FrameWidth = FFALIGN(s->width, MB_SIZE),
+        .FrameHeight = FFALIGN(s->height, MB_SIZE),
+
+        .picture_structure = s->picture_structure,
+        .picture_coding_type = s->pict_type,
+        .intra_dc_precision = s->intra_dc_precision,
+        .frame_pred_frame_dct = s->frame_pred_frame_dct,
+        .concealment_motion_vectors = s->concealment_motion_vectors,
+        .intra_vlc_format = s->intra_vlc_format,
+
+        .tileFormat = 0, /* TBL */
+        .gob_height = 0, /* GOB_2 */
+
+        .f_code = {
+            s->mpeg_f_code[0][0], s->mpeg_f_code[0][1],
+            s->mpeg_f_code[1][0], s->mpeg_f_code[1][1],
+        },
+
+        .PicWidthInMbs = FFALIGN(s->width, MB_SIZE) / MB_SIZE,
+        .FrameHeightInMbs = FFALIGN(s->height, MB_SIZE) / MB_SIZE,
+        .pitch_luma = s->current_picture.f->linesize[0],
+        .pitch_chroma = s->current_picture.f->linesize[1],
+        .luma_top_offset = 0,
+        .luma_bot_offset = 0,
+        .luma_frame_offset = 0,
+        .chroma_top_offset = 0,
+        .chroma_bot_offset = 0,
+        .chroma_frame_offset = 0,
+        .alternate_scan = s->alternate_scan,
+        .secondfield = s->picture_structure != PICT_FRAME && !s->first_field,
+        .rounding_type = 0,
+        .q_scale_type = s->q_scale_type,
+        .top_field_first = s->top_field_first,
+        .full_pel_fwd_vector = (s->codec_id != AV_CODEC_ID_MPEG2VIDEO) ? s->full_pel[0] : 0,
+        .full_pel_bwd_vector = (s->codec_id != AV_CODEC_ID_MPEG2VIDEO) ? s->full_pel[1] : 0,
+        .output_memory_layout = 0, /* NV12 */
+        .ref_memory_layout = { 0, 0 }, /* NV12 */
+    };
+
+    for (int i = 0; i < FF_ARRAY_ELEMS(setup->quant_mat_8x8intra); ++i) {
+        setup->quant_mat_8x8intra   [i] = (NvU8)s->intra_matrix[i];
+        setup->quant_mat_8x8nonintra[i] = (NvU8)s->inter_matrix[i];
+    }
+}
+
+static int nvtegra_mpeg12_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, MpegEncContext *s, NVTegraMPEG12DecodeContext *ctx,
+                                         AVFrame *current_frame, AVFrame *prev_frame, AVFrame *next_frame)
+{
+    FrameDecodeData *fdd = (FrameDecodeData *)current_frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    int err, codec_id;
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC);
+    if (err < 0)
+        return err;
+
+    switch (s->codec_id) {
+        case AV_CODEC_ID_MPEG1VIDEO:
+            codec_id = NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG1;
+            break;
+        case AV_CODEC_ID_MPEG2VIDEO:
+            codec_id = NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG2;
+            break;
+        default:
+            return AVERROR(EINVAL);
+    }
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID,
+                          AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, MPEG12));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS, codec_id |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1) |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON, 1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX,
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx));
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET,
+                          input_map, ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET,
+                          input_map, ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET,
+                          input_map, ctx->core.slice_offsets_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET,
+                          input_map, ctx->core.status_off, NVHOST_RELOC_TYPE_DEFAULT);
+
+#define PUSH_FRAME(fr, offset) ({ \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0 + offset * 4, \
+                          av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4, \
+                          av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0], \
+                          NVHOST_RELOC_TYPE_DEFAULT); \
+})
+
+    PUSH_FRAME(current_frame, 0);
+    PUSH_FRAME(prev_frame, 1);
+    PUSH_FRAME(next_frame, 2);
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mpeg12_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    MpegEncContext *s = avctx->priv_data;
+    AVFrame *frame = s->current_picture.f;
+    FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data;
+    NVTegraMPEG12DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Starting MPEG12-NVTEGRA frame with pixel format %s\n",
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    err = ff_nvtegra_start_frame(avctx, frame, &ctx->core);
+    if (err < 0)
+        return err;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    nvtegra_mpeg12_prepare_frame_setup((nvdec_mpeg2_pic_s *)(mem + ctx->core.pic_setup_off), s, ctx);
+
+    ctx->prev_frame = (s->pict_type != AV_PICTURE_TYPE_I) ? s->last_picture.f : frame;
+    ctx->next_frame = (s->pict_type == AV_PICTURE_TYPE_B) ? s->next_picture.f : frame;
+
+    return 0;
+}
+
+static int nvtegra_mpeg12_end_frame(AVCodecContext *avctx) {
+    MpegEncContext *s = avctx->priv_data;
+    NVTegraMPEG12DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame *frame = s->current_picture.f;
+    FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+
+    nvdec_mpeg2_pic_s *setup;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending MPEG12-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvdec_mpeg2_pic_s *)(mem + ctx->core.pic_setup_off);
+    setup->stream_len = ctx->core.bitstream_len + sizeof(bitstream_end_sequence);
+    setup->slice_count = ctx->core.num_slices;
+
+    err = nvtegra_mpeg12_prepare_cmdbuf(&ctx->core.cmdbuf, s, ctx, frame,
+                                        ctx->prev_frame, ctx->next_frame);
+    if (err < 0)
+        return err;
+
+    return ff_nvtegra_end_frame(avctx, frame, &ctx->core, bitstream_end_sequence,
+                                sizeof(bitstream_end_sequence));
+}
+
+static int nvtegra_mpeg12_decode_slice(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    MpegEncContext *s = avctx->priv_data;
+    AVFrame *frame = s->current_picture.f;
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, false);
+}
+
+#if CONFIG_MPEG1_NVTEGRA_HWACCEL
+const FFHWAccel ff_mpeg1_nvtegra_hwaccel = {
+    .p.name = "mpeg1_nvtegra",
+    .p.type = AVMEDIA_TYPE_VIDEO,
+    .p.id = AV_CODEC_ID_MPEG1VIDEO,
+    .p.pix_fmt = AV_PIX_FMT_NVTEGRA,
+    .start_frame = &nvtegra_mpeg12_start_frame,
+    .end_frame = &nvtegra_mpeg12_end_frame,
+    .decode_slice = &nvtegra_mpeg12_decode_slice,
+    .init = &nvtegra_mpeg12_decode_init,
+    .uninit = &nvtegra_mpeg12_decode_uninit,
+    .frame_params = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraMPEG12DecodeContext),
+    .caps_internal = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
+
+#if CONFIG_MPEG2_NVTEGRA_HWACCEL
+const FFHWAccel ff_mpeg2_nvtegra_hwaccel = {
+    .p.name = "mpeg2_nvtegra",
+    .p.type = AVMEDIA_TYPE_VIDEO,
+    .p.id = AV_CODEC_ID_MPEG2VIDEO,
+    .p.pix_fmt = AV_PIX_FMT_NVTEGRA,
+    .start_frame = &nvtegra_mpeg12_start_frame,
+    .end_frame = &nvtegra_mpeg12_end_frame,
+    .decode_slice = &nvtegra_mpeg12_decode_slice,
+    .init = &nvtegra_mpeg12_decode_init,
+    .uninit = &nvtegra_mpeg12_decode_uninit,
+    .frame_params = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraMPEG12DecodeContext),
+    .caps_internal = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif

From patchwork Thu May 30 19:43:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: averne
X-Patchwork-Id: 49419
From: averne
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 30 May 2024 21:43:12 +0200
Message-ID: <97e067d90ed4c59897a6606aaab6e96384f34aef.1717083800.git.averne381@gmail.com>
Subject: [FFmpeg-devel] [PATCH 10/16] nvtegra: add mpeg4 hardware decoding
Cc: averne

Signed-off-by: averne
---
 configure                  |   2 +
 libavcodec/Makefile        |   1 +
 libavcodec/h263dec.c       |   6 +
 libavcodec/hwaccels.h      |   1 +
 libavcodec/mpeg4videodec.c |   3 +
 libavcodec/nvtegra_mpeg4.c | 344 +++++++++++++++++++++++++++++++++++++
 6 files changed, 357 insertions(+)
 create mode 100644 libavcodec/nvtegra_mpeg4.c

diff --git a/configure b/configure
index 67db4a2ed2..0795f44a1e 100755
--- a/configure
+++ b/configure
@@ -3251,6 +3251,8 @@ mpeg4_videotoolbox_hwaccel_deps="videotoolbox"
 mpeg4_videotoolbox_hwaccel_select="mpeg4_decoder"
 prores_videotoolbox_hwaccel_deps="videotoolbox"
 prores_videotoolbox_hwaccel_select="prores_decoder"
+mpeg4_nvtegra_hwaccel_deps="nvtegra"
+mpeg4_nvtegra_hwaccel_select="mpeg4_decoder"
 vc1_d3d11va_hwaccel_deps="d3d11va"
 vc1_d3d11va_hwaccel_select="vc1_decoder"
 vc1_d3d11va2_hwaccel_deps="d3d11va"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index e4dfcbce6c..1ea9984876 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1040,6 +1040,7 @@ OBJS-$(CONFIG_MPEG4_NVDEC_HWACCEL) += nvdec_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VAAPI_HWACCEL) += vaapi_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VDPAU_HWACCEL) += vdpau_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
+OBJS-$(CONFIG_MPEG4_NVTEGRA_HWACCEL) += nvtegra_mpeg4.o
 OBJS-$(CONFIG_VC1_D3D11VA_HWACCEL) += dxva2_vc1.o
 OBJS-$(CONFIG_VC1_DXVA2_HWACCEL) += dxva2_vc1.o
 OBJS-$(CONFIG_VC1_D3D12VA_HWACCEL) += dxva2_vc1.o d3d12va_vc1.o
diff --git a/libavcodec/h263dec.c b/libavcodec/h263dec.c
index 48bd467f30..db25e09ff3 100644
--- a/libavcodec/h263dec.c
+++ b/libavcodec/h263dec.c
@@ -60,6 +60,9 @@ static const enum AVPixelFormat h263_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_H263_VIDEOTOOLBOX_HWACCEL || CONFIG_MPEG4_VIDEOTOOLBOX_HWACCEL
     AV_PIX_FMT_VIDEOTOOLBOX,
+#endif
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+    AV_PIX_FMT_NVTEGRA,
 #endif
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_NONE
@@ -690,6 +693,9 @@ static const AVCodecHWConfigInternal *const h263_hw_config_list[] = {
 #if CONFIG_MPEG4_VDPAU_HWACCEL
     HWACCEL_VDPAU(mpeg4),
 #endif
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+    HWACCEL_NVTEGRA(mpeg4),
+#endif
 #if CONFIG_H263_VIDEOTOOLBOX_HWACCEL
     HWACCEL_VIDEOTOOLBOX(h263),
 #endif
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index ad9e9366f2..da2b4ae10e 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -67,6 +67,7 @@ extern const struct FFHWAccel ff_mpeg4_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_videotoolbox_hwaccel;
 extern const struct FFHWAccel ff_prores_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_mpeg4_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_vc1_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_vc1_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_vc1_d3d12va_hwaccel;
diff --git a/libavcodec/mpeg4videodec.c b/libavcodec/mpeg4videodec.c
index df1e22207d..15e2da5e88 100644
--- a/libavcodec/mpeg4videodec.c
+++ b/libavcodec/mpeg4videodec.c
@@ -3882,6 +3882,9 @@ const FFCodec ff_mpeg4_decoder = {
 #endif
 #if CONFIG_MPEG4_VIDEOTOOLBOX_HWACCEL
                                HWACCEL_VIDEOTOOLBOX(mpeg4),
+#endif
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(mpeg4),
 #endif
                                NULL
                            },
diff --git a/libavcodec/nvtegra_mpeg4.c b/libavcodec/nvtegra_mpeg4.c
new file mode 100644
index 0000000000..2325380330
--- /dev/null
+++ b/libavcodec/nvtegra_mpeg4.c
@@ -0,0 +1,344 @@
+/*
+ * Copyright (c) 2024 averne
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "mpeg4video.h"
+#include "mpeg4videodec.h"
+#include "mpeg4videodefs.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraMPEG4DecodeContext {
+    FFNVTegraDecodeContext core;
+
+    AVNVTegraMap common_map;
+    uint32_t coloc_off, history_off, scratch_off;
+    uint32_t history_size, scratch_size;
+
+    AVFrame *prev_frame, *next_frame;
+} NVTegraMPEG4DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+static const uint8_t bitstream_end_sequence[16] = {
+    0x00, 0x00, 0x01, 0xb1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0xb1, 0x00, 0x00, 0x00, 0x00,
+};
+
+static int nvtegra_mpeg4_decode_uninit(AVCodecContext *avctx) {
+    NVTegraMPEG4DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA MPEG4 decoder\n");
+
+    err = av_nvtegra_map_destroy(&ctx->common_map);
+    if (err < 0)
+        return err;
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mpeg4_decode_init(AVCodecContext *avctx) {
+    NVTegraMPEG4DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    AVHWDeviceContext *hw_device_ctx;
+    AVNVTegraDeviceContext *device_hwctx;
+    uint32_t width_in_mbs, height_in_mbs,
+             coloc_size, history_size, scratch_size, common_map_size;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA MPEG4 decoder\n");
+
+    /* Ignored: histogram map, size 0x400 */
+    ctx->core.pic_setup_off = 0;
+    ctx->core.status_off = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_mpeg4_pic_s),
+                                   AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off = FFALIGN(ctx->core.status_off + sizeof(nvdec_status_s),
+                                   AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off = FFALIGN(ctx->core.cmdbuf_off + AV_NVTEGRA_MAP_ALIGN,
+                                      AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                       0x1000);
+
+    ctx->core.max_cmdbuf_size = ctx->core.bitstream_off - ctx->core.cmdbuf_off;
+    ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data;
+    device_hwctx = hw_device_ctx->hwctx;
+
+    width_in_mbs = FFALIGN(avctx->coded_width, MB_SIZE) / MB_SIZE;
+    height_in_mbs = FFALIGN(avctx->coded_height, MB_SIZE) / MB_SIZE;
+    coloc_size = FFALIGN(FFALIGN(height_in_mbs, 2) * (width_in_mbs * 64) - 63, 0x100);
+    history_size = FFALIGN(width_in_mbs * 0x100 + 0x1100, 0x100);
+    scratch_size = 0x400;
+
+    ctx->coloc_off = 0;
+    ctx->history_off = FFALIGN(ctx->coloc_off + coloc_size, AV_NVTEGRA_MAP_ALIGN);
+    ctx->scratch_off = FFALIGN(ctx->history_off + history_size, AV_NVTEGRA_MAP_ALIGN);
+    common_map_size = FFALIGN(ctx->scratch_off + scratch_size, 0x1000);
+
+    err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100,
+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+    if (err < 0)
+        goto fail;
+
+    ctx->history_size = history_size;
+    ctx->scratch_size = scratch_size;
+
+    return 0;
+
+fail:
+    nvtegra_mpeg4_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_mpeg4_prepare_frame_setup(nvdec_mpeg4_pic_s *setup, AVCodecContext *avctx,
+                                              NVTegraMPEG4DecodeContext *ctx)
+{
+    Mpeg4DecContext *m = avctx->priv_data;
+    MpegEncContext *s = &m->m;
+
+    int i;
+
+    *setup = (nvdec_mpeg4_pic_s){
+        .scratch_pic_buffer_size = ctx->scratch_size,
+
+        .gptimer_timeout_value = 0, /* Default value */
+
+        .FrameWidth = FFALIGN(s->width, MB_SIZE),
+        .FrameHeight = FFALIGN(s->height, MB_SIZE),
+
+        .vop_time_increment_bitcount = m->time_increment_bits,
+        .resync_marker_disable = !m->resync_marker,
+
+        .tileFormat = 0,
/* TBL */ + .gob_height = 0, /* GOB_2 */ + + .width = FFALIGN(s->width, MB_SIZE), + .height = FFALIGN(s->height, MB_SIZE), + + .FrameStride = { + s->current_picture.f->linesize[0], + s->current_picture.f->linesize[1], + }, + + .luma_top_offset = 0, + .luma_bot_offset = 0, + .luma_frame_offset = 0, + .chroma_top_offset = 0, + .chroma_bot_offset = 0, + .chroma_frame_offset = 0, + + .HistBufferSize = ctx->history_size / 256, + + .trd = { s->pp_time, s->pp_field_time >> 1 }, + .trb = { s->pb_time, s->pb_field_time >> 1 }, + + .vop_fcode_forward = s->f_code, + .vop_fcode_backward = s->b_code, + + .interlaced = s->interlaced_dct, + .quant_type = s->mpeg_quant, + .quarter_sample = s->quarter_sample, + .short_video_header = avctx->codec->id == AV_CODEC_ID_H263, + + .curr_output_memory_layout = 0, /* NV12 */ + + .ptype = s->pict_type - AV_PICTURE_TYPE_I, + .rnd = s->no_rounding, + .alternate_vertical_scan_flag = s->alternate_scan, + + .ref_memory_layout = { 0, 0 }, /* NV12 */ + }; + + for (i = 0; i < 64; ++i) { + setup->intra_quant_mat [i] = s->intra_matrix[i]; + setup->nonintra_quant_mat[i] = s->inter_matrix[i]; + } +} + +static int nvtegra_mpeg4_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, MpegEncContext *s, NVTegraMPEG4DecodeContext *ctx, + AVFrame *cur_frame, AVFrame *prev_frame, AVFrame *next_frame) +{ + FrameDecodeData *fdd = (FrameDecodeData *)cur_frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + int err; + + err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC); + if (err < 0) + return err; + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID, + AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, MPEG4)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS, + AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE, MPEG4) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON, 1)); + 
AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX, + AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx)); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET, + input_map, ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET, + input_map, ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET, + input_map, ctx->core.status_off, NVHOST_RELOC_TYPE_DEFAULT); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_COLOC_DATA_OFFSET, + &ctx->common_map, ctx->coloc_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_HISTORY_OFFSET, + &ctx->common_map, ctx->history_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PIC_SCRATCH_BUF_OFFSET, + &ctx->common_map, ctx->scratch_off, NVHOST_RELOC_TYPE_DEFAULT); + +#define PUSH_FRAME(fr, offset) ({ \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0 + offset * 4, \ + av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4, \ + av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0], \ + NVHOST_RELOC_TYPE_DEFAULT); \ +}) + + PUSH_FRAME(cur_frame, 0); + PUSH_FRAME(prev_frame, 1); + PUSH_FRAME(next_frame, 2); + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE, + AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE)); + + err = av_nvtegra_cmdbuf_end(cmdbuf); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_mpeg4_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) { + Mpeg4DecContext *m = avctx->priv_data; + MpegEncContext *s = &m->m; + AVFrame *frame = s->current_picture.f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + NVTegraMPEG4DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + FFNVTegraDecodeFrame *tf; + AVNVTegraMap *input_map; + uint8_t *mem; + int err; + + 
av_log(avctx, AV_LOG_DEBUG, "Starting MPEG4-NVTEGRA frame with pixel format %s\n", + av_get_pix_fmt_name(avctx->sw_pix_fmt)); + + err = ff_nvtegra_start_frame(avctx, frame, &ctx->core); + if (err < 0) + return err; + + tf = fdd->hwaccel_priv; + input_map = (AVNVTegraMap *)tf->input_map_ref->data; + mem = av_nvtegra_map_get_addr(input_map); + + nvtegra_mpeg4_prepare_frame_setup((nvdec_mpeg4_pic_s *)(mem + ctx->core.pic_setup_off), avctx, ctx); + + ctx->prev_frame = (s->pict_type != AV_PICTURE_TYPE_I) ? s->last_picture.f : frame; + ctx->next_frame = (s->pict_type == AV_PICTURE_TYPE_B) ? s->next_picture.f : frame; + + return 0; +} + +static int nvtegra_mpeg4_end_frame(AVCodecContext *avctx) { + Mpeg4DecContext *m = avctx->priv_data; + MpegEncContext *s = &m->m; + NVTegraMPEG4DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + AVFrame *frame = s->current_picture.f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + + nvdec_mpeg4_pic_s *setup; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Ending MPEG4-NVTEGRA frame with %u slices -> %u bytes\n", + ctx->core.num_slices, ctx->core.bitstream_len); + + if (!tf || !ctx->core.num_slices) + return 0; + + mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data); + + setup = (nvdec_mpeg4_pic_s *)(mem + ctx->core.pic_setup_off); + setup->stream_len = ctx->core.bitstream_len + sizeof(bitstream_end_sequence); + setup->slice_count = ctx->core.num_slices; + + err = nvtegra_mpeg4_prepare_cmdbuf(&ctx->core.cmdbuf, s, ctx, frame, + ctx->prev_frame, ctx->next_frame); + if (err < 0) + return err; + + return ff_nvtegra_end_frame(avctx, frame, &ctx->core, bitstream_end_sequence, + sizeof(bitstream_end_sequence)); +} + +static int nvtegra_mpeg4_decode_slice(AVCodecContext *avctx, const uint8_t *buf, + uint32_t buf_size) +{ + Mpeg4DecContext *m = avctx->priv_data; + AVFrame *frame = m->m.current_picture.f; + + /* Rewind the bitstream 
looking for the VOP start marker */ + while (AV_RB32(buf) != VOP_STARTCODE) + buf -= 1, buf_size += 1; + + return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, false); +} + +#if CONFIG_MPEG4_NVTEGRA_HWACCEL +const FFHWAccel ff_mpeg4_nvtegra_hwaccel = { + .p.name = "mpeg4_nvtegra", + .p.type = AVMEDIA_TYPE_VIDEO, + .p.id = AV_CODEC_ID_MPEG4, + .p.pix_fmt = AV_PIX_FMT_NVTEGRA, + .start_frame = &nvtegra_mpeg4_start_frame, + .end_frame = &nvtegra_mpeg4_end_frame, + .decode_slice = &nvtegra_mpeg4_decode_slice, + .init = &nvtegra_mpeg4_decode_init, + .uninit = &nvtegra_mpeg4_decode_uninit, + .frame_params = &ff_nvtegra_frame_params, + .priv_data_size = sizeof(NVTegraMPEG4DecodeContext), + .caps_internal = HWACCEL_CAP_ASYNC_SAFE, +}; +#endif From patchwork Thu May 30 19:43:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49423 Delivered-To: ffmpegpatchwork2@gmail.com
From: averne
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 30 May 2024 21:43:13 +0200
Message-ID: <135f790a32b417a090ba0fc1227f2ca9d9052758.1717083800.git.averne381@gmail.com>
X-Mailer: git-send-email 2.45.1
Subject: [FFmpeg-devel] [PATCH 11/16] nvtegra: add vc1 hardware decoding
Cc: averne

Since L4T does not hook up the vc1 code to a user-facing library, this was written solely based on static reverse engineering.
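For readers following nvtegra_vc1_decode_slice in the diff, the start-code handling on the first slice can be sketched standalone as below. This is only an illustration under stated assumptions, not the code from the patch: the helper names read_be32, write_be32 and prepend_startcode are hypothetical, and the start-code values are the ones FFmpeg defines in vc1_common.h. The patch itself uses AV_RB32/AV_WB32 and stores the flag in setup->bitstream_offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start-code values as defined in FFmpeg's vc1_common.h */
#define VC1_CODE_FIELD 0x0000010C
#define VC1_CODE_FRAME 0x0000010D

/* Hypothetical helper: read a 32-bit big-endian value byte by byte,
 * which is safe for any alignment of p */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Hypothetical helper: write a 32-bit value in big-endian byte order */
static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >>  8);
    p[3] = (uint8_t)(v);
}

/*
 * Hypothetical helper mirroring the first-slice logic:
 * always writes the start code to dst (4 bytes), and returns 1 when
 * the slice data in buf already begins with the same start code,
 * 0 otherwise. In the patch this return value corresponds to
 * setup->bitstream_offset, i.e. "skip one dword".
 */
static int prepend_startcode(uint8_t *dst, const uint8_t *buf,
                             size_t buf_size, uint32_t startcode)
{
    int already_present;

    assert(dst != NULL && buf != NULL);

    already_present = (buf_size >= 4) && (read_be32(buf) == startcode);
    write_be32(dst, startcode);

    return already_present;
}
```

Reading the start code byte by byte keeps the comparison independent of host endianness and buffer alignment, which is the same property AV_RB32 provides in the patch.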
Signed-off-by: averne --- configure | 3 + libavcodec/Makefile | 1 + libavcodec/hwaccels.h | 2 + libavcodec/nvtegra_vc1.c | 455 +++++++++++++++++++++++++++++++++++++++ libavcodec/vc1dec.c | 9 + 5 files changed, 470 insertions(+) create mode 100644 libavcodec/nvtegra_vc1.c diff --git a/configure b/configure index 0795f44a1e..952e3aef7d 100755 --- a/configure +++ b/configure @@ -3267,6 +3267,8 @@ vc1_vaapi_hwaccel_deps="vaapi" vc1_vaapi_hwaccel_select="vc1_decoder" vc1_vdpau_hwaccel_deps="vdpau" vc1_vdpau_hwaccel_select="vc1_decoder" +vc1_nvtegra_hwaccel_deps="nvtegra" +vc1_nvtegra_hwaccel_select="vc1_decoder" vp8_nvdec_hwaccel_deps="nvdec" vp8_nvdec_hwaccel_select="vp8_decoder" vp8_vaapi_hwaccel_deps="vaapi" @@ -3294,6 +3296,7 @@ wmv3_dxva2_hwaccel_select="vc1_dxva2_hwaccel" wmv3_nvdec_hwaccel_select="vc1_nvdec_hwaccel" wmv3_vaapi_hwaccel_select="vc1_vaapi_hwaccel" wmv3_vdpau_hwaccel_select="vc1_vdpau_hwaccel" +wmv3_nvtegra_hwaccel_select="vc1_nvtegra_hwaccel" # hardware-accelerated codecs mediafoundation_deps="mftransform_h MFCreateAlignedMemoryBuffer" diff --git a/libavcodec/Makefile b/libavcodec/Makefile index 1ea9984876..e102d03e7d 100644 --- a/libavcodec/Makefile +++ b/libavcodec/Makefile @@ -1048,6 +1048,7 @@ OBJS-$(CONFIG_VC1_NVDEC_HWACCEL) += nvdec_vc1.o OBJS-$(CONFIG_VC1_QSV_HWACCEL) += qsvdec.o OBJS-$(CONFIG_VC1_VAAPI_HWACCEL) += vaapi_vc1.o OBJS-$(CONFIG_VC1_VDPAU_HWACCEL) += vdpau_vc1.o +OBJS-$(CONFIG_VC1_NVTEGRA_HWACCEL) += nvtegra_vc1.o OBJS-$(CONFIG_VP8_NVDEC_HWACCEL) += nvdec_vp8.o OBJS-$(CONFIG_VP8_VAAPI_HWACCEL) += vaapi_vp8.o OBJS-$(CONFIG_VP9_D3D11VA_HWACCEL) += dxva2_vp9.o diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h index da2b4ae10e..a69e6a1977 100644 --- a/libavcodec/hwaccels.h +++ b/libavcodec/hwaccels.h @@ -75,6 +75,7 @@ extern const struct FFHWAccel ff_vc1_dxva2_hwaccel; extern const struct FFHWAccel ff_vc1_nvdec_hwaccel; extern const struct FFHWAccel ff_vc1_vaapi_hwaccel; extern const struct FFHWAccel ff_vc1_vdpau_hwaccel; 
+extern const struct FFHWAccel ff_vc1_nvtegra_hwaccel; extern const struct FFHWAccel ff_vp8_nvdec_hwaccel; extern const struct FFHWAccel ff_vp8_vaapi_hwaccel; extern const struct FFHWAccel ff_vp9_d3d11va_hwaccel; @@ -92,5 +93,6 @@ extern const struct FFHWAccel ff_wmv3_dxva2_hwaccel; extern const struct FFHWAccel ff_wmv3_nvdec_hwaccel; extern const struct FFHWAccel ff_wmv3_vaapi_hwaccel; extern const struct FFHWAccel ff_wmv3_vdpau_hwaccel; +extern const struct FFHWAccel ff_wmv3_nvtegra_hwaccel; #endif /* AVCODEC_HWACCELS_H */ diff --git a/libavcodec/nvtegra_vc1.c b/libavcodec/nvtegra_vc1.c new file mode 100644 index 0000000000..b5ee85c9d4 --- /dev/null +++ b/libavcodec/nvtegra_vc1.c @@ -0,0 +1,455 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#include + +#include "config_components.h" + +#include "avcodec.h" +#include "hwaccel_internal.h" +#include "internal.h" +#include "hwconfig.h" +#include "vc1.h" +#include "decode.h" +#include "nvtegra_decode.h" + +#include "libavutil/pixdesc.h" +#include "libavutil/nvtegra_host1x.h" + +typedef struct NVTegraVC1DecodeContext { + FFNVTegraDecodeContext core; + + AVNVTegraMap common_map; + uint32_t coloc_off, history_off, scratch_off; + uint32_t history_size, scratch_size; + + bool is_first_slice; + + AVFrame *prev_frame, *next_frame; +} NVTegraVC1DecodeContext; + +/* Size (width, height) of a macroblock */ +#define MB_SIZE 16 + +static const uint8_t bitstream_end_sequence[] = { + 0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x00, 0x00, +}; + +static int nvtegra_vc1_decode_uninit(AVCodecContext *avctx) { + NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + int err; + + av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA VC1 decoder\n"); + + err = av_nvtegra_map_destroy(&ctx->common_map); + if (err < 0) + return err; + + err = ff_nvtegra_decode_uninit(avctx, &ctx->core); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_vc1_decode_init(AVCodecContext *avctx) { + NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + AVHWDeviceContext *hw_device_ctx; + AVNVTegraDeviceContext *device_hwctx; + uint32_t width_in_mbs, height_in_mbs, num_slices, + coloc_size, history_size, scratch_size, common_map_size; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA VC1 decoder\n"); + + width_in_mbs = FFALIGN(avctx->coded_width, MB_SIZE) / MB_SIZE; + height_in_mbs = FFALIGN(avctx->coded_height, MB_SIZE) / MB_SIZE; + + num_slices = width_in_mbs * height_in_mbs; + + /* Ignored: histogram map, size 0x400 */ + ctx->core.pic_setup_off = 0; + ctx->core.status_off = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_vc1_pic_s), + AV_NVTEGRA_MAP_ALIGN); + 
ctx->core.cmdbuf_off = FFALIGN(ctx->core.status_off + sizeof(nvdec_status_s), + AV_NVTEGRA_MAP_ALIGN); + ctx->core.slice_offsets_off = FFALIGN(ctx->core.cmdbuf_off + AV_NVTEGRA_MAP_ALIGN, + AV_NVTEGRA_MAP_ALIGN); + ctx->core.bitstream_off = FFALIGN(ctx->core.slice_offsets_off + num_slices * sizeof(uint32_t), + AV_NVTEGRA_MAP_ALIGN); + ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx), + 0x1000); + + ctx->core.max_cmdbuf_size = ctx->core.slice_offsets_off - ctx->core.cmdbuf_off; + ctx->core.max_num_slices = (ctx->core.bitstream_off - ctx->core.slice_offsets_off) / sizeof(uint32_t); + ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off; + + err = ff_nvtegra_decode_init(avctx, &ctx->core); + if (err < 0) + goto fail; + + hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data; + device_hwctx = hw_device_ctx->hwctx; + + coloc_size = 3 * FFALIGN(width_in_mbs * FFALIGN(height_in_mbs, 2) * 64 - 63, AV_NVTEGRA_MAP_ALIGN); + history_size = FFALIGN(width_in_mbs, 2) * 0x300; + scratch_size = 0x400; + + ctx->coloc_off = 0; + ctx->history_off = FFALIGN(ctx->coloc_off + coloc_size, AV_NVTEGRA_MAP_ALIGN); + ctx->scratch_off = FFALIGN(ctx->history_off + history_size, AV_NVTEGRA_MAP_ALIGN); + common_map_size = FFALIGN(ctx->scratch_off + scratch_size, 0x1000); + + err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100, + NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE); + if (err < 0) + goto fail; + + mem = av_nvtegra_map_get_addr(&ctx->common_map); + + memset(mem + ctx->coloc_off, 0, coloc_size); + memset(mem + ctx->history_off, 0, history_size); + memset(mem + ctx->scratch_off, 0, scratch_size); + + ctx->history_size = history_size; + ctx->scratch_size = scratch_size; + + return 0; + +fail: + nvtegra_vc1_decode_uninit(avctx); + return err; +} + +static void nvtegra_vc1_prepare_frame_setup(nvdec_vc1_pic_s *setup, AVCodecContext *avctx, + 
NVTegraVC1DecodeContext *ctx) +{ + VC1Context *v = avctx->priv_data; + MpegEncContext *s = &v->s; + AVFrame *frame = s->current_picture_ptr->f; + + /* + * Notes: + * - s->current_picture.f->linesize is inconsistently doubled for interlaced content + * between I-frames and others, so s->current_picture_ptr is used + * - a lot of fields in this structure are unused by official software, + * here we only set the ones it uses + */ + *setup = (nvdec_vc1_pic_s){ + .scratch_pic_buffer_size = ctx->scratch_size, + + .gptimer_timeout_value = 0, /* Default value */ + + .bitstream_offset = 0, + + .FrameStride = { + frame->linesize[0], + frame->linesize[1], + }, + + .luma_top_offset = 0, + .luma_bot_offset = 0, + .luma_frame_offset = 0, + .chroma_top_offset = 0, + .chroma_bot_offset = 0, + .chroma_frame_offset = 0, + + .CodedWidth = FFALIGN(avctx->coded_width, + (v->profile == PROFILE_ADVANCED) ? 1 : MB_SIZE), + .CodedHeight = FFALIGN(avctx->coded_height, + (v->profile == PROFILE_ADVANCED) ? 1 : MB_SIZE), + + .HistBufferSize = ctx->history_size / 256, + + .loopfilter = s->loop_filter, + + .output_memory_layout = 0, /* NV12 */ + .ref_memory_layout = { + 0, 0, /* NV12 */ + }, + + .fastuvmc = v->fastuvmc, + + .FrameWidth = FFALIGN(frame->width, + (v->profile == PROFILE_ADVANCED) ? 1 : MB_SIZE), + .FrameHeight = FFALIGN(frame->height, + (v->profile == PROFILE_ADVANCED) ? 1 : MB_SIZE), + + .profile = (v->profile != PROFILE_ADVANCED) ?
1 : 2, + + .postprocflag = v->postprocflag, + .pulldown = v->broadcast, + .interlace = v->interlace, + + .tfcntrflag = v->tfcntrflag, + .finterpflag = v->finterpflag, + + .tileFormat = 0, /* TBL */ + + .psf = v->psf, + + .multires = v->multires, + .syncmarker = v->resync_marker, + .rangered = v->rangered, + .maxbframes = s->max_b_frames, + .panscan_flag = v->panscanflag, + .dquant = v->dquant, + .refdist_flag = v->refdist_flag, + .quantizer = v->quantizer_mode, + .overlap = v->overlap, + .vstransform = v->vstransform, + .extended_mv = v->extended_mv, + .extended_dmv = v->extended_dmv, + }; + + if (v->profile == PROFILE_ADVANCED) { + setup->displayPara.enableTFOutput = 1; + setup->displayPara.VC1MapYFlag = v->range_mapy_flag; + setup->displayPara.MapYValue = v->range_mapy; + setup->displayPara.VC1MapUVFlag = v->range_mapuv_flag; + setup->displayPara.MapUVValue = v->range_mapuv; + } else if (v->rangered && v->rangeredfrm) { + setup->displayPara.enableTFOutput = 1; + setup->displayPara.VC1MapYFlag = 1; + setup->displayPara.MapYValue = 7; + setup->displayPara.VC1MapUVFlag = 1; + setup->displayPara.MapUVValue = 7; + } + + if (v->range_mapy_flag || v->range_mapuv_flag) { + setup->displayPara.OutputBottom[0] = 0; + setup->displayPara.OutputBottom[1] = 0; + setup->displayPara.OutputStructure = v->interlace & 1; + setup->displayPara.OutStride = frame->linesize[0] & 0xff; + } +} + +static int nvtegra_vc1_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, VC1Context *v, NVTegraVC1DecodeContext *ctx, + AVFrame *cur_frame, AVFrame *prev_frame, AVFrame *next_frame) +{ + FrameDecodeData *fdd = (FrameDecodeData *)cur_frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + int err; + + err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC); + if (err < 0) + return err; + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID, + AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, VC1)); + 
AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS, + AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE, VC1) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON, 1)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX, + AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx)); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET, + input_map, ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET, + input_map, ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET, + input_map, ctx->core.slice_offsets_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET, + input_map, ctx->core.status_off, NVHOST_RELOC_TYPE_DEFAULT); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_COLOC_DATA_OFFSET, + &ctx->common_map, ctx->coloc_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_HISTORY_OFFSET, + &ctx->common_map, ctx->history_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PIC_SCRATCH_BUF_OFFSET, + &ctx->common_map, ctx->scratch_off, NVHOST_RELOC_TYPE_DEFAULT); + +#define PUSH_FRAME(fr, offset) ({ \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0 + offset * 4, \ + av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4, \ + av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0], \ + NVHOST_RELOC_TYPE_DEFAULT); \ +}) + + PUSH_FRAME(cur_frame, 0); + PUSH_FRAME(prev_frame, 1); + PUSH_FRAME(next_frame, 2); + + /* + * TODO: Bind a surface to the postproc output if we need range remapping + if (((v->profile != PROFILE_ADVANCED) && ((v->rangered != 0) || (v->rangeredfrm != 0))) || + ((v->range_mapy_flag != 0) || (v->range_mapuv_flag != 0))) { + 
AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DISPLAY_BUF_LUMA_OFFSET, + &output.luma, 0, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DISPLAY_BUF_CHROMA_OFFSET, + &output.chroma, 0, NVHOST_RELOC_TYPE_DEFAULT); + } + */ + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE, + AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE)); + + err = av_nvtegra_cmdbuf_end(cmdbuf); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_vc1_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) { + VC1Context *v = avctx->priv_data; + MpegEncContext *s = &v->s; + AVFrame *frame = s->current_picture.f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + FFNVTegraDecodeFrame *tf; + AVNVTegraMap *input_map; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Starting VC1-NVTEGRA frame with pixel format %s\n", + av_get_pix_fmt_name(avctx->sw_pix_fmt)); + + ctx->is_first_slice = true; + + err = ff_nvtegra_start_frame(avctx, frame, &ctx->core); + if (err < 0) + return err; + + tf = fdd->hwaccel_priv; + input_map = (AVNVTegraMap *)tf->input_map_ref->data; + mem = av_nvtegra_map_get_addr(input_map); + + nvtegra_vc1_prepare_frame_setup((nvdec_vc1_pic_s *)(mem + ctx->core.pic_setup_off), avctx, ctx); + + ctx->prev_frame = ff_nvtegra_safe_get_ref(s->last_picture.f, frame); + ctx->next_frame = ff_nvtegra_safe_get_ref(s->next_picture.f, frame); + + return 0; +} + +static int nvtegra_vc1_end_frame(AVCodecContext *avctx) { + VC1Context *v = avctx->priv_data; + NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + AVFrame *frame = v->s.current_picture.f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + + nvdec_vc1_pic_s *setup; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Ending VC1-NVTEGRA frame with %u slices -> %u bytes\n", + ctx->core.num_slices, 
ctx->core.bitstream_len); + + if (!tf || !ctx->core.num_slices) + return 0; + + mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data); + + setup = (nvdec_vc1_pic_s *)(mem + ctx->core.pic_setup_off); + setup->stream_len = ctx->core.bitstream_len + sizeof(bitstream_end_sequence); + setup->slice_count = ctx->core.num_slices; + + err = nvtegra_vc1_prepare_cmdbuf(&ctx->core.cmdbuf, v, ctx, frame, + ctx->prev_frame, ctx->next_frame); + if (err < 0) + return err; + + return ff_nvtegra_end_frame(avctx, frame, &ctx->core, bitstream_end_sequence, + sizeof(bitstream_end_sequence)); +} + +static int nvtegra_vc1_decode_slice(AVCodecContext *avctx, const uint8_t *buf, + uint32_t buf_size) +{ + VC1Context *v = avctx->priv_data; + NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + AVFrame *frame = v->s.current_picture.f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + nvdec_vc1_pic_s *setup; + uint8_t *mem; + enum VC1Code startcode; + + mem = av_nvtegra_map_get_addr(input_map); + + setup = (nvdec_vc1_pic_s *)(mem + ctx->core.pic_setup_off); + + if (ctx->is_first_slice) { + startcode = VC1_CODE_FRAME; + + if (v->profile == PROFILE_ADVANCED && + v->fcm == ILACE_FIELD && v->second_field) + startcode = VC1_CODE_FIELD; + + /* + * Skip a dword if the bitstream already contains the startcode + * We could probably just not insert our startcode but this is what official code does + */ + if ((buf_size >= 4) && (AV_RB32(buf) == startcode)) + setup->bitstream_offset = 1; + + AV_WB32(mem + ctx->core.bitstream_off + ctx->core.bitstream_len, startcode); + ctx->core.bitstream_len += 4; + ctx->is_first_slice = false; + } + + return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, false); +} + +#if CONFIG_VC1_NVTEGRA_HWACCEL +const FFHWAccel ff_vc1_nvtegra_hwaccel = { + .p.name = "vc1_nvtegra", + .p.type = 
AVMEDIA_TYPE_VIDEO, + .p.id = AV_CODEC_ID_VC1, + .p.pix_fmt = AV_PIX_FMT_NVTEGRA, + .start_frame = &nvtegra_vc1_start_frame, + .end_frame = &nvtegra_vc1_end_frame, + .decode_slice = &nvtegra_vc1_decode_slice, + .init = &nvtegra_vc1_decode_init, + .uninit = &nvtegra_vc1_decode_uninit, + .frame_params = &ff_nvtegra_frame_params, + .priv_data_size = sizeof(NVTegraVC1DecodeContext), + .caps_internal = HWACCEL_CAP_ASYNC_SAFE, +}; +#endif + +#if CONFIG_WMV3_NVTEGRA_HWACCEL +const FFHWAccel ff_wmv3_nvtegra_hwaccel = { + .p.name = "wmv3_nvtegra", + .p.type = AVMEDIA_TYPE_VIDEO, + .p.id = AV_CODEC_ID_WMV3, + .p.pix_fmt = AV_PIX_FMT_NVTEGRA, + .start_frame = &nvtegra_vc1_start_frame, + .end_frame = &nvtegra_vc1_end_frame, + .decode_slice = &nvtegra_vc1_decode_slice, + .init = &nvtegra_vc1_decode_init, + .uninit = &nvtegra_vc1_decode_uninit, + .frame_params = &ff_nvtegra_frame_params, + .priv_data_size = sizeof(NVTegraVC1DecodeContext), + .caps_internal = HWACCEL_CAP_ASYNC_SAFE, +}; +#endif diff --git a/libavcodec/vc1dec.c b/libavcodec/vc1dec.c index 3b5b016cf9..e907d26e14 100644 --- a/libavcodec/vc1dec.c +++ b/libavcodec/vc1dec.c @@ -71,6 +71,9 @@ static const enum AVPixelFormat vc1_hwaccel_pixfmt_list_420[] = { #endif #if CONFIG_VC1_VDPAU_HWACCEL AV_PIX_FMT_VDPAU, +#endif +#if CONFIG_VC1_NVTEGRA_HWACCEL + AV_PIX_FMT_NVTEGRA, #endif AV_PIX_FMT_YUV420P, AV_PIX_FMT_NONE @@ -1415,6 +1418,9 @@ const FFCodec ff_vc1_decoder = { #endif #if CONFIG_VC1_VDPAU_HWACCEL HWACCEL_VDPAU(vc1), +#endif +#if CONFIG_VC1_NVTEGRA_HWACCEL + HWACCEL_NVTEGRA(vc1), #endif NULL }, @@ -1454,6 +1460,9 @@ const FFCodec ff_wmv3_decoder = { #endif #if CONFIG_WMV3_VDPAU_HWACCEL HWACCEL_VDPAU(wmv3), +#endif +#if CONFIG_WMV3_NVTEGRA_HWACCEL + HWACCEL_NVTEGRA(wmv3), #endif NULL }, From patchwork Thu May 30 19:43:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49421 Delivered-To: ffmpegpatchwork2@gmail.com 
From: averne
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 30 May 2024 21:43:14 +0200
Message-ID: <38a5a4060b25fcfc58b0f98c33b37badb506c144.1717083800.git.averne381@gmail.com>
Subject: [FFmpeg-devel] [PATCH 12/16] nvtegra: add h264 hardware decoding
Cc: averne

Due to
the hardware modus operandi, dpb references must stay at a fixed slot for their entire lifetime. Signed-off-by: averne --- configure | 2 + libavcodec/Makefile | 1 + libavcodec/h264_slice.c | 6 +- libavcodec/h264dec.c | 3 + libavcodec/hwaccels.h | 1 + libavcodec/nvtegra_h264.c | 506 ++++++++++++++++++++++++++++++++++++++ 6 files changed, 518 insertions(+), 1 deletion(-) create mode 100644 libavcodec/nvtegra_h264.c diff --git a/configure b/configure index 952e3aef7d..930cd3c9bd 100755 --- a/configure +++ b/configure @@ -3193,6 +3193,8 @@ h264_videotoolbox_hwaccel_deps="videotoolbox" h264_videotoolbox_hwaccel_select="h264_decoder" h264_vulkan_hwaccel_deps="vulkan" h264_vulkan_hwaccel_select="h264_decoder" +h264_nvtegra_hwaccel_deps="nvtegra" +h264_nvtegra_hwaccel_select="h264_decoder" hevc_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_HEVC" hevc_d3d11va_hwaccel_select="hevc_decoder" hevc_d3d11va2_hwaccel_deps="d3d11va DXVA_PicParams_HEVC" diff --git a/libavcodec/Makefile b/libavcodec/Makefile index e102d03e7d..2cb0ec21a8 100644 --- a/libavcodec/Makefile +++ b/libavcodec/Makefile @@ -1013,6 +1013,7 @@ OBJS-$(CONFIG_H264_VAAPI_HWACCEL) += vaapi_h264.o OBJS-$(CONFIG_H264_VDPAU_HWACCEL) += vdpau_h264.o OBJS-$(CONFIG_H264_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o OBJS-$(CONFIG_H264_VULKAN_HWACCEL) += vulkan_decode.o vulkan_h264.o +OBJS-$(CONFIG_H264_NVTEGRA_HWACCEL) += nvtegra_h264.o OBJS-$(CONFIG_HEVC_D3D11VA_HWACCEL) += dxva2_hevc.o OBJS-$(CONFIG_HEVC_DXVA2_HWACCEL) += dxva2_hevc.o OBJS-$(CONFIG_HEVC_D3D12VA_HWACCEL) += dxva2_hevc.o d3d12va_hevc.o diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c index ce2c4caca1..dc4c5545c8 100644 --- a/libavcodec/h264_slice.c +++ b/libavcodec/h264_slice.c @@ -784,7 +784,8 @@ static enum AVPixelFormat get_pixel_format(H264Context *h, int force_callback) CONFIG_H264_VAAPI_HWACCEL + \ CONFIG_H264_VIDEOTOOLBOX_HWACCEL + \ CONFIG_H264_VDPAU_HWACCEL + \ - CONFIG_H264_VULKAN_HWACCEL) + CONFIG_H264_VULKAN_HWACCEL + \ + 
CONFIG_H264_NVTEGRA_HWACCEL) enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmt = pix_fmts; switch (h->ps.sps->bit_depth_luma) { @@ -888,6 +889,9 @@ static enum AVPixelFormat get_pixel_format(H264Context *h, int force_callback) #endif #if CONFIG_H264_VAAPI_HWACCEL *fmt++ = AV_PIX_FMT_VAAPI; +#endif +#if CONFIG_H264_NVTEGRA_HWACCEL + *fmt++ = AV_PIX_FMT_NVTEGRA; #endif if (h->avctx->color_range == AVCOL_RANGE_JPEG) *fmt++ = AV_PIX_FMT_YUVJ420P; diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c index fd23e367b4..51f53f07a9 100644 --- a/libavcodec/h264dec.c +++ b/libavcodec/h264dec.c @@ -1160,6 +1160,9 @@ const FFCodec ff_h264_decoder = { #endif #if CONFIG_H264_VULKAN_HWACCEL HWACCEL_VULKAN(h264), +#endif +#if CONFIG_H264_NVTEGRA_HWACCEL + HWACCEL_NVTEGRA(h264), #endif NULL }, diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h index a69e6a1977..463fd333a1 100644 --- a/libavcodec/hwaccels.h +++ b/libavcodec/hwaccels.h @@ -37,6 +37,7 @@ extern const struct FFHWAccel ff_h264_nvdec_hwaccel; extern const struct FFHWAccel ff_h264_vaapi_hwaccel; extern const struct FFHWAccel ff_h264_vdpau_hwaccel; extern const struct FFHWAccel ff_h264_videotoolbox_hwaccel; +extern const struct FFHWAccel ff_h264_nvtegra_hwaccel; extern const struct FFHWAccel ff_h264_vulkan_hwaccel; extern const struct FFHWAccel ff_hevc_d3d11va_hwaccel; extern const struct FFHWAccel ff_hevc_d3d11va2_hwaccel; diff --git a/libavcodec/nvtegra_h264.c b/libavcodec/nvtegra_h264.c new file mode 100644 index 0000000000..63073c44a6 --- /dev/null +++ b/libavcodec/nvtegra_h264.c @@ -0,0 +1,506 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include +#include + +#include "config_components.h" + +#include "avcodec.h" +#include "hwaccel_internal.h" +#include "internal.h" +#include "hwconfig.h" +#include "h264dec.h" +#include "decode.h" +#include "nvtegra_decode.h" + +#include "libavutil/pixdesc.h" +#include "libavutil/nvtegra_host1x.h" + +typedef struct NVTegraH264DecodeContext { + FFNVTegraDecodeContext core; + + AVNVTegraMap common_map; + uint32_t coloc_off, mbhist_off, history_off; + uint32_t mbhist_size, history_size; + + struct NVTegraH264RefFrame { + AVNVTegraMap *map; + uint32_t chroma_off; + int16_t frame_num; + int16_t pic_id; + } refs[16+1]; + + uint8_t ordered_dpb_map[16+1], + pic_id_map[16+1], scratch_ref, cur_frame; + + uint64_t refs_mask, ordered_dpb_mask, pic_id_mask; +} NVTegraH264DecodeContext; + +/* Size (width, height) of a macroblock */ +#define MB_SIZE 16 + +static const uint8_t bitstream_end_sequence[16] = { + 0x00, 0x00, 0x01, 0x0b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x0b, 0x00, 0x00, 0x00, 0x00, +}; + +static int nvtegra_h264_decode_uninit(AVCodecContext *avctx) { + NVTegraH264DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + int err; + + av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA H264 decoder\n"); + + err = av_nvtegra_map_destroy(&ctx->common_map); + if (err < 0) + return err; + + err = ff_nvtegra_decode_uninit(avctx, &ctx->core); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_h264_decode_init(AVCodecContext *avctx) { + H264Context *h = avctx->priv_data; + const SPS *sps 
= h->ps.sps; + NVTegraH264DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + AVHWDeviceContext *hw_device_ctx; + AVNVTegraDeviceContext *device_hwctx; + uint32_t aligned_width, aligned_height, + width_in_mbs, height_in_mbs, num_slices, + coloc_size, mbhist_size, history_size, common_map_size; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA H264 decoder\n"); + + aligned_width = FFALIGN(avctx->coded_width, MB_SIZE); + aligned_height = FFALIGN(avctx->coded_height, MB_SIZE); + width_in_mbs = aligned_width / MB_SIZE; + height_in_mbs = aligned_height / MB_SIZE; + + num_slices = width_in_mbs * height_in_mbs; + + /* Ignored: histogram map, size 0x400 */ + ctx->core.pic_setup_off = 0; + ctx->core.status_off = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_h264_pic_s), + AV_NVTEGRA_MAP_ALIGN); + ctx->core.cmdbuf_off = FFALIGN(ctx->core.status_off + sizeof(nvdec_status_s), + AV_NVTEGRA_MAP_ALIGN); + ctx->core.slice_offsets_off = FFALIGN(ctx->core.cmdbuf_off + 3*AV_NVTEGRA_MAP_ALIGN, + AV_NVTEGRA_MAP_ALIGN); + ctx->core.bitstream_off = FFALIGN(ctx->core.slice_offsets_off + num_slices * sizeof(uint32_t), + AV_NVTEGRA_MAP_ALIGN); + ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx), + 0x1000); + + ctx->core.max_cmdbuf_size = ctx->core.slice_offsets_off - ctx->core.cmdbuf_off; + ctx->core.max_num_slices = (ctx->core.bitstream_off - ctx->core.slice_offsets_off) / sizeof(uint32_t); + ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off; + + err = ff_nvtegra_decode_init(avctx, &ctx->core); + if (err < 0) + goto fail; + + hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data; + device_hwctx = hw_device_ctx->hwctx; + + coloc_size = FFALIGN(FFALIGN(height_in_mbs, 2) * (width_in_mbs * 64) - 63, 0x100); + coloc_size *= sps->ref_frame_count + 1; /* Max number of reference frames, plus current frame */ + mbhist_size = FFALIGN(width_in_mbs * 104, 0x100); +
history_size = FFALIGN(width_in_mbs * 0x200 + 0x1100, 0x200); + + ctx->coloc_off = 0; + ctx->mbhist_off = FFALIGN(ctx->coloc_off + coloc_size, AV_NVTEGRA_MAP_ALIGN); + ctx->history_off = FFALIGN(ctx->mbhist_off + mbhist_size, AV_NVTEGRA_MAP_ALIGN); + common_map_size = FFALIGN(ctx->history_off + history_size, 0x1000); + + err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100, + NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE); + if (err < 0) + goto fail; + + ctx->mbhist_size = mbhist_size; + ctx->history_size = history_size; + + memset(ctx->ordered_dpb_map, -1, sizeof(ctx->ordered_dpb_map)); + memset(ctx->pic_id_map, -1, sizeof(ctx->pic_id_map)); + + return 0; + +fail: + nvtegra_h264_decode_uninit(avctx); + return err; +} + +static inline int field_poc(int poc[2], bool top) { + return (poc[!top] != INT_MAX) ? poc[!top] : 0; +} + +static void dpb_add(H264Context *h, nvdec_dpb_entry_s *dst, + H264Picture *src, int pic_id) +{ + int marking; + + marking = src->long_ref ? 2 : 1; + *dst = (nvdec_dpb_entry_s){ + .index = pic_id, + .col_idx = pic_id, + .state = src->reference, + .is_long_term = src->long_ref, + .not_existing = src->invalid_gap, + .is_field = src->field_picture, + .top_field_marking = (src->reference & PICT_TOP_FIELD) ? marking : 0, + .bottom_field_marking = (src->reference & PICT_BOTTOM_FIELD) ? marking : 0, + .output_memory_layout = 0, /* NV12 */ + .FieldOrderCnt = { + field_poc(src->field_poc, true), + field_poc(src->field_poc, false), + }, + .FrameIdx = src->long_ref ? 
src->pic_id : src->frame_num, + }; +} + +static inline int find_slot(uint64_t *mask) { + int slot = ff_ctzll(~*mask); + *mask |= (1 << slot); + return slot; +} + +static void nvtegra_h264_prepare_frame_setup(nvdec_h264_pic_s *setup, H264Context *h, + NVTegraH264DecodeContext *ctx) +{ + const PPS *pps = h->ps.pps; + const SPS *sps = h->ps.sps; + + int dpb_size, i, j, diff; + H264Picture *refs [16+1] = {0}; + uint8_t dpb_to_ref[16+1] = {0}; + + *setup = (nvdec_h264_pic_s){ + .mbhist_buffer_size = ctx->mbhist_size, + + .gptimer_timeout_value = 0, /* Default value */ + + .log2_max_pic_order_cnt_lsb_minus4 = FFMAX(sps->log2_max_poc_lsb - 4, 0), + .delta_pic_order_always_zero_flag = sps->delta_pic_order_always_zero_flag, + .frame_mbs_only_flag = sps->frame_mbs_only_flag, + + .PicWidthInMbs = h->mb_width, + .FrameHeightInMbs = h->mb_height, + + .tileFormat = 0, /* TBL */ + .gob_height = 0, /* GOB_2 */ + + .entropy_coding_mode_flag = pps->cabac, + .pic_order_present_flag = pps->pic_order_present, + .num_ref_idx_l0_active_minus1 = pps->ref_count[0] - 1, + .num_ref_idx_l1_active_minus1 = pps->ref_count[1] - 1, + .deblocking_filter_control_present_flag = pps->deblocking_filter_parameters_present, + .redundant_pic_cnt_present_flag = pps->redundant_pic_cnt_present, + .transform_8x8_mode_flag = pps->transform_8x8_mode, + + .pitch_luma = h->cur_pic_ptr->f->linesize[0], + .pitch_chroma = h->cur_pic_ptr->f->linesize[1], + + .luma_top_offset = 0, + .luma_bot_offset = 0, + .luma_frame_offset = 0, + .chroma_top_offset = 0, + .chroma_bot_offset = 0, + .chroma_frame_offset = 0, + + .HistBufferSize = ctx->history_size / 256, + + .MbaffFrameFlag = sps->mb_aff && !FIELD_PICTURE(h), + .direct_8x8_inference_flag = sps->direct_8x8_inference_flag, + .weighted_pred_flag = pps->weighted_pred, + .constrained_intra_pred_flag = pps->constrained_intra_pred, + .ref_pic_flag = h->nal_ref_idc != 0, + .field_pic_flag = FIELD_PICTURE(h), + .bottom_field_flag = h->picture_structure == PICT_BOTTOM_FIELD, + 
.second_field = FIELD_PICTURE(h) && !h->first_field, + .log2_max_frame_num_minus4 = sps->log2_max_frame_num - 4, + .chroma_format_idc = sps->chroma_format_idc, + .pic_order_cnt_type = sps->poc_type, + .pic_init_qp_minus26 = pps->init_qp - 26, + .chroma_qp_index_offset = pps->chroma_qp_index_offset[0], + .second_chroma_qp_index_offset = pps->chroma_qp_index_offset[1], + + .weighted_bipred_idc = pps->weighted_bipred_idc, + .frame_num = h->cur_pic_ptr->frame_num, + .output_memory_layout = 0, /* NV12 */ + + .CurrFieldOrderCnt = { + field_poc(h->cur_pic_ptr->field_poc, true), + field_poc(h->cur_pic_ptr->field_poc, false), + }, + + .lossless_ipred8x8_filter_enable = true, + .qpprime_y_zero_transform_bypass_flag = sps->transform_bypass, + }; + + /* Build concatenated ref list for this frame */ + dpb_size = 0; + for (i = 0; i < h->short_ref_count; ++i) + refs[dpb_size++] = h->short_ref[i]; + + for (i = 0; i < 16; ++i) + if (h->long_ref[i]) + refs[dpb_size++] = h->long_ref[i]; + + /* Remove stale references from our ref list */ + for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) { + if (!(ctx->refs_mask & (1 << i))) + continue; + + for (j = 0; j < dpb_size; ++j) { + if (av_nvtegra_frame_get_fbuf_map(refs[j]->f) == ctx->refs[i].map) + break; + } + + if (j == dpb_size) { + ctx->pic_id_mask &= ~(1 << ctx->refs[i].pic_id); + ctx->pic_id_map[ctx->refs[i].pic_id] = -1; + + ctx->refs_mask &= ~(1 << i); + ctx->refs[i].map = NULL; + } else { + dpb_to_ref[i] = j; + } + } + + /* Update the ordered DPB mask */ + for (i = 0; i < FF_ARRAY_ELEMS(ctx->ordered_dpb_map); ++i) { + if (!(ctx->ordered_dpb_mask & (1 << i))) + continue; + if (!ctx->refs[ctx->ordered_dpb_map[i]].map) { + ctx->ordered_dpb_mask &= ~(1 << i); + ctx->ordered_dpb_map[i] = -1; + } + } + + /* Add new frames to the ordered DPB */ + for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) { + if (!(ctx->refs_mask & (1 << i))) + continue; + + for (j = 0; j < FF_ARRAY_ELEMS(ctx->ordered_dpb_map); ++j) { + if (ctx->ordered_dpb_map[j] == 
i) + break; + } + + if (j == FF_ARRAY_ELEMS(ctx->ordered_dpb_map)) + ctx->ordered_dpb_map[find_slot(&ctx->ordered_dpb_mask)] = i; + } + + /* + * Add the current frame to our ref list + * In the case of interlaced video, the new frame can be the same as the last + */ + if (ctx->refs[ctx->cur_frame].map != av_nvtegra_frame_get_fbuf_map(h->cur_pic_ptr->f)) { + /* Allocate a pic id for the current frame */ + i = find_slot(&ctx->pic_id_mask); + + /* Insert it in our ref list */ + ctx->cur_frame = find_slot(&ctx->refs_mask); + ctx->pic_id_map[i] = ctx->cur_frame; + ctx->refs[ctx->cur_frame] = (struct NVTegraH264RefFrame){ + .map = av_nvtegra_frame_get_fbuf_map(h->cur_pic_ptr->f), + .chroma_off = h->cur_pic_ptr->f->data[1] - h->cur_pic_ptr->f->data[0], + .frame_num = h->cur_pic_ptr->frame_num, + .pic_id = i, + }; + } + + setup->CurrPicIdx = setup->CurrColIdx = ctx->refs[ctx->cur_frame].pic_id; + + /* Find the temporally closest frame to be used as a scratch ref, or use the current one */ + diff = INT_MAX; + ctx->scratch_ref = ctx->cur_frame; + for (i = 0; i < FF_ARRAY_ELEMS(ctx->ordered_dpb_map); ++i) { + j = ctx->ordered_dpb_map[i]; + if ((ctx->ordered_dpb_mask & (1 << i)) && + FFABS(h->cur_pic_ptr->frame_num - refs[dpb_to_ref[j]]->frame_num) < diff) { + diff = FFABS(h->cur_pic_ptr->frame_num - refs[dpb_to_ref[j]]->frame_num); + ctx->scratch_ref = j; + } + } + + /* Build the NVDEC DPB */ + for (i = 0; i < FF_ARRAY_ELEMS(setup->dpb); ++i) { + if (ctx->ordered_dpb_mask & (1 << i)) { + j = ctx->ordered_dpb_map[i]; + dpb_add(h, &setup->dpb[i], refs[dpb_to_ref[j]], ctx->refs[j].pic_id); + } + } + + memcpy(setup->WeightScale, pps->scaling_matrix4, sizeof(setup->WeightScale)); + memcpy(setup->WeightScale8x8[0], pps->scaling_matrix8[0], sizeof(setup->WeightScale8x8[0])); + memcpy(setup->WeightScale8x8[1], pps->scaling_matrix8[3], sizeof(setup->WeightScale8x8[1])); +} + +static int nvtegra_h264_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, H264Context *h, + AVFrame *cur_frame, NVTegraH264DecodeContext *ctx) +{ + FrameDecodeData *fdd = (FrameDecodeData
*)cur_frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + int err, i; + + err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC); + if (err < 0) + return err; + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID, + AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, H264)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS, + AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE, H264) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON, 1)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX, + AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx)); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET, + input_map, ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET, + input_map, ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET, + input_map, ctx->core.slice_offsets_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET, + input_map, ctx->core.status_off, NVHOST_RELOC_TYPE_DEFAULT); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_COLOC_DATA_OFFSET, + &ctx->common_map, ctx->coloc_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_H264_SET_MBHIST_BUF_OFFSET, + &ctx->common_map, ctx->mbhist_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_HISTORY_OFFSET, + &ctx->common_map, ctx->history_off, NVHOST_RELOC_TYPE_DEFAULT); + +#define PUSH_FRAME(ref, offset) ({ \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0 + offset * 4, \ + ref.map, 0, NVHOST_RELOC_TYPE_DEFAULT); \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4, \ + ref.map, ref.chroma_off, NVHOST_RELOC_TYPE_DEFAULT); \ +}) + + for (i = 0; 
i < 16 + 1; ++i) { + if (i == ctx->cur_frame) + PUSH_FRAME(ctx->refs[i], i); + else if (ctx->pic_id_mask & (1 << i)) + PUSH_FRAME(ctx->refs[ctx->pic_id_map[i]], i); + else + PUSH_FRAME(ctx->refs[ctx->scratch_ref], i); + } + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE, + AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE)); + + err = av_nvtegra_cmdbuf_end(cmdbuf); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_h264_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) { + H264Context *h = avctx->priv_data; + AVFrame *frame = h->cur_pic_ptr->f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + NVTegraH264DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + FFNVTegraDecodeFrame *tf; + AVNVTegraMap *input_map; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Starting H264-NVTEGRA frame with pixel format %s\n", + av_get_pix_fmt_name(avctx->sw_pix_fmt)); + + err = ff_nvtegra_start_frame(avctx, frame, &ctx->core); + if (err < 0) + return err; + + tf = fdd->hwaccel_priv; + input_map = (AVNVTegraMap *)tf->input_map_ref->data; + mem = av_nvtegra_map_get_addr(input_map); + + nvtegra_h264_prepare_frame_setup((nvdec_h264_pic_s *)(mem + ctx->core.pic_setup_off), h, ctx); + + return 0; +} + +static int nvtegra_h264_end_frame(AVCodecContext *avctx) { + H264Context *h = avctx->priv_data; + NVTegraH264DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + AVFrame *frame = h->cur_pic_ptr->f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + + nvdec_h264_pic_s *setup; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Ending H264-NVTEGRA frame with %u slices -> %u bytes\n", + ctx->core.num_slices, ctx->core.bitstream_len); + + if (!tf || !ctx->core.num_slices) + return 0; + + mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data); + + setup = (nvdec_h264_pic_s *)(mem + ctx->core.pic_setup_off); + 
setup->stream_len = ctx->core.bitstream_len + sizeof(bitstream_end_sequence); + setup->slice_count = ctx->core.num_slices; + + err = nvtegra_h264_prepare_cmdbuf(&ctx->core.cmdbuf, h, frame, ctx); + if (err < 0) + return err; + + return ff_nvtegra_end_frame(avctx, frame, &ctx->core, bitstream_end_sequence, + sizeof(bitstream_end_sequence)); +} + +static int nvtegra_h264_decode_slice(AVCodecContext *avctx, const uint8_t *buf, + uint32_t buf_size) +{ + H264Context *h = avctx->priv_data; + AVFrame *frame = h->cur_pic_ptr->f; + + return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, true); +} + +#if CONFIG_H264_NVTEGRA_HWACCEL +const FFHWAccel ff_h264_nvtegra_hwaccel = { + .p.name = "h264_nvtegra", + .p.type = AVMEDIA_TYPE_VIDEO, + .p.id = AV_CODEC_ID_H264, + .p.pix_fmt = AV_PIX_FMT_NVTEGRA, + .start_frame = &nvtegra_h264_start_frame, + .end_frame = &nvtegra_h264_end_frame, + .decode_slice = &nvtegra_h264_decode_slice, + .init = &nvtegra_h264_decode_init, + .uninit = &nvtegra_h264_decode_uninit, + .frame_params = &ff_nvtegra_frame_params, + .priv_data_size = sizeof(NVTegraH264DecodeContext), + .caps_internal = HWACCEL_CAP_ASYNC_SAFE, +}; +#endif From patchwork Thu May 30 19:43:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49432 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9183:0:b0:460:55fa:d5ed with SMTP id s3csp118783vqg; Thu, 30 May 2024 14:40:25 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUEvEBeN+TiHHTi+6Nox8htAMOK54ERolk1djl1K/bnVEV5jb9U3qA+daiJa/oCut9UAsFmWvxXILHNpHgozW7MNX0HGfl0iQ6Fxw== X-Google-Smtp-Source: AGHT+IEaIcCLxZ/4/tzZgfcvIpYzVWffQXEuHZgF+kXZXwJkyc9B/2hZk9Q8ZFUwjte7bgFEvwNB X-Received: by 2002:a17:906:554:b0:a59:be21:3577 with SMTP id a640c23a62f3a-a6820be92b5mr7673166b.43.1717105225249; Thu, 30 May 2024 14:40:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1717105225; cv=none; d=google.com; s=arc-20160816; 
From: averne
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 30 May 2024 21:43:15 +0200
Subject: [FFmpeg-devel] [PATCH 13/16] nvtegra: add hevc hardware decoding
Cc: averne

Same remark as for h264. In addition, a number of bits to be skipped must be calculated. This is done in the main header parsing routine, instead of reimplementing it in the hwaccel backend. On the tegra 210, this is the only hardware codec that can output 10-bit data.
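
As a rough illustration of the skip-length measurement described above, the sketch below records how many bits remain before a run of optional header fields, parses them, and takes the difference. It is only a standalone model of the idea: the `BitReader` type, its helpers and the toy header layout are hypothetical and are neither FFmpeg's GetBitContext API nor the real HEVC slice header syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy MSB-first bit reader (hypothetical, for illustration only). */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits; /* total number of bits in buf */
    size_t pos_bits;  /* number of bits already consumed */
} BitReader;

static void br_init(BitReader *br, const uint8_t *buf, size_t size_bytes)
{
    br->buf       = buf;
    br->size_bits = size_bytes * 8;
    br->pos_bits  = 0;
}

static size_t br_bits_left(const BitReader *br)
{
    return br->size_bits - br->pos_bits;
}

/* Read n bits (n <= 32), most significant bit first. */
static unsigned br_get_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;

    assert(n <= 32 && n <= br_bits_left(br));
    while (n--) {
        unsigned bit = (br->buf[br->pos_bits >> 3] >> (7 - (br->pos_bits & 7))) & 1;
        v = (v << 1) | bit;
        br->pos_bits++;
    }
    return v;
}

/*
 * Toy header: 2 leading bits, then a 1-bit flag, then a 4-bit field that
 * is only present when the flag is set. Returns the number of bits used
 * by the flag and the optional field, i.e. the "skip length".
 */
static int parse_and_measure(BitReader *br)
{
    int start, skip;

    br_get_bits(br, 2);                   /* fields before the measured region */
    start = (int)br_bits_left(br);        /* remember where the region starts */

    if (br_get_bits(br, 1))               /* flag */
        br_get_bits(br, 4);               /* optional field */

    skip = start - (int)br_bits_left(br); /* bits consumed by the region */
    return skip;
}
```

With a one-byte buffer of 0x20 (flag set), parse_and_measure() reports 5 bits; with 0x00 (flag clear) it reports 1 bit. Measuring inside the parser keeps the result in sync with whatever branches the parser actually took, which is presumably why the patch does it there rather than re-parsing the header in the hwaccel backend.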
Signed-off-by: averne --- configure | 2 + libavcodec/Makefile | 1 + libavcodec/hevcdec.c | 17 +- libavcodec/hevcdec.h | 2 + libavcodec/hwaccels.h | 1 + libavcodec/nvtegra_hevc.c | 633 ++++++++++++++++++++++++++++++++++++++ 6 files changed, 655 insertions(+), 1 deletion(-) create mode 100644 libavcodec/nvtegra_hevc.c diff --git a/configure b/configure index 930cd3c9bd..ba4c5287e3 100755 --- a/configure +++ b/configure @@ -3213,6 +3213,8 @@ hevc_videotoolbox_hwaccel_deps="videotoolbox" hevc_videotoolbox_hwaccel_select="hevc_decoder" hevc_vulkan_hwaccel_deps="vulkan" hevc_vulkan_hwaccel_select="hevc_decoder" +hevc_nvtegra_hwaccel_deps="nvtegra" +hevc_nvtegra_hwaccel_select="hevc_decoder" mjpeg_nvdec_hwaccel_deps="nvdec" mjpeg_nvdec_hwaccel_select="mjpeg_decoder" mjpeg_vaapi_hwaccel_deps="vaapi" diff --git a/libavcodec/Makefile b/libavcodec/Makefile index 2cb0ec21a8..de667b8a4b 100644 --- a/libavcodec/Makefile +++ b/libavcodec/Makefile @@ -1022,6 +1022,7 @@ OBJS-$(CONFIG_HEVC_QSV_HWACCEL) += qsvdec.o OBJS-$(CONFIG_HEVC_VAAPI_HWACCEL) += vaapi_hevc.o h265_profile_level.o OBJS-$(CONFIG_HEVC_VDPAU_HWACCEL) += vdpau_hevc.o h265_profile_level.o OBJS-$(CONFIG_HEVC_VULKAN_HWACCEL) += vulkan_decode.o vulkan_hevc.o +OBJS-$(CONFIG_HEVC_NVTEGRA_HWACCEL) += nvtegra_hevc.o OBJS-$(CONFIG_MJPEG_NVDEC_HWACCEL) += nvdec_mjpeg.o OBJS-$(CONFIG_MJPEG_VAAPI_HWACCEL) += vaapi_mjpeg.o OBJS-$(CONFIG_MPEG1_NVDEC_HWACCEL) += nvdec_mpeg12.o diff --git a/libavcodec/hevcdec.c b/libavcodec/hevcdec.c index b41dc46053..41bde57920 100644 --- a/libavcodec/hevcdec.c +++ b/libavcodec/hevcdec.c @@ -406,7 +406,8 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps) CONFIG_HEVC_VAAPI_HWACCEL + \ CONFIG_HEVC_VIDEOTOOLBOX_HWACCEL + \ CONFIG_HEVC_VDPAU_HWACCEL + \ - CONFIG_HEVC_VULKAN_HWACCEL) + CONFIG_HEVC_VULKAN_HWACCEL + \ + CONFIG_HEVC_NVTEGRA_HWACCEL) enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmt = pix_fmts; switch (sps->pix_fmt) { @@ -436,6 +437,9 @@ static enum AVPixelFormat 
get_format(HEVCContext *s, const HEVCSPS *sps) #endif #if CONFIG_HEVC_VULKAN_HWACCEL *fmt++ = AV_PIX_FMT_VULKAN; +#endif +#if CONFIG_HEVC_NVTEGRA_HWACCEL + *fmt++ = AV_PIX_FMT_NVTEGRA; #endif break; case AV_PIX_FMT_YUV420P10: @@ -463,6 +467,9 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps) #endif #if CONFIG_HEVC_NVDEC_HWACCEL *fmt++ = AV_PIX_FMT_CUDA; +#endif +#if CONFIG_HEVC_NVTEGRA_HWACCEL + *fmt++ = AV_PIX_FMT_NVTEGRA; #endif break; case AV_PIX_FMT_YUV444P: @@ -598,6 +605,7 @@ static int hls_slice_header(HEVCContext *s) GetBitContext *gb = &s->HEVClc->gb; SliceHeader *sh = &s->sh; int i, ret; + int nvidia_skip_len_start; // Coded parameters sh->first_slice_in_pic_flag = get_bits1(gb); @@ -700,6 +708,8 @@ static int hls_slice_header(HEVCContext *s) return AVERROR_INVALIDDATA; } + nvidia_skip_len_start = get_bits_left(gb); + // when flag is not present, picture is inferred to be output sh->pic_output_flag = 1; if (s->ps.pps->output_flag_present_flag) @@ -753,6 +763,7 @@ static int hls_slice_header(HEVCContext *s) } sh->long_term_ref_pic_set_size = pos - get_bits_left(gb); + sh->nvidia_skip_length = nvidia_skip_len_start - get_bits_left(gb); if (s->ps.sps->sps_temporal_mvp_enabled_flag) sh->slice_temporal_mvp_enabled_flag = get_bits1(gb); else @@ -765,6 +776,7 @@ static int hls_slice_header(HEVCContext *s) sh->short_term_rps = NULL; sh->long_term_ref_pic_set_size = 0; sh->slice_temporal_mvp_enabled_flag = 0; + sh->nvidia_skip_length = nvidia_skip_len_start - get_bits_left(gb); } /* 8.3.1 */ @@ -3743,6 +3755,9 @@ const FFCodec ff_hevc_decoder = { #endif #if CONFIG_HEVC_VULKAN_HWACCEL HWACCEL_VULKAN(hevc), +#endif +#if CONFIG_HEVC_NVTEGRA_HWACCEL + HWACCEL_NVTEGRA(hevc), #endif NULL }, diff --git a/libavcodec/hevcdec.h b/libavcodec/hevcdec.h index e82daf6679..2df96ed629 100644 --- a/libavcodec/hevcdec.h +++ b/libavcodec/hevcdec.h @@ -277,6 +277,8 @@ typedef struct SliceHeader { int16_t chroma_offset_l1[16][2]; int slice_ctb_addr_rs; + + 
int nvidia_skip_length; } SliceHeader; typedef struct CodingUnit { diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h index 463fd333a1..77892dc2b2 100644 --- a/libavcodec/hwaccels.h +++ b/libavcodec/hwaccels.h @@ -47,6 +47,7 @@ extern const struct FFHWAccel ff_hevc_nvdec_hwaccel; extern const struct FFHWAccel ff_hevc_vaapi_hwaccel; extern const struct FFHWAccel ff_hevc_vdpau_hwaccel; extern const struct FFHWAccel ff_hevc_videotoolbox_hwaccel; +extern const struct FFHWAccel ff_hevc_nvtegra_hwaccel; extern const struct FFHWAccel ff_hevc_vulkan_hwaccel; extern const struct FFHWAccel ff_mjpeg_nvdec_hwaccel; extern const struct FFHWAccel ff_mjpeg_vaapi_hwaccel; diff --git a/libavcodec/nvtegra_hevc.c b/libavcodec/nvtegra_hevc.c new file mode 100644 index 0000000000..97c585d755 --- /dev/null +++ b/libavcodec/nvtegra_hevc.c @@ -0,0 +1,633 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#include "config_components.h" + +#include "avcodec.h" +#include "hwaccel_internal.h" +#include "internal.h" +#include "hwconfig.h" +#include "hevcdec.h" +#include "hevc_data.h" +#include "decode.h" +#include "nvtegra_decode.h" + +#include "libavutil/pixdesc.h" +#include "libavutil/nvtegra_host1x.h" + +typedef struct NVTegraHEVCDecodeContext { + FFNVTegraDecodeContext core; + + AVNVTegraMap common_map; + uint32_t tile_sizes_off, scaling_list_off, + coloc_off, filter_off; + + unsigned int colmv_size, sao_offset, bsd_offset; + uint8_t pattern_id; + + struct NVTegraHEVCRefFrame { + AVNVTegraMap *map; + uint32_t chroma_off; + int poc; + } refs[16+1]; + + uint64_t refs_mask; + int8_t scratch_ref; +} NVTegraHEVCDecodeContext; + +/* Size (width, height) of a macroblock */ +#define MB_SIZE 16 + +/* Maximum size (width, height) of a coding tree unit */ +#define CTU_SIZE 64 + +#define FILTER_SIZE 480 +#define SAO_SIZE 3840 +#define BSD_SIZE 60 + +static int nvtegra_hevc_decode_uninit(AVCodecContext *avctx) { + NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + int err; + + av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA HEVC decoder\n"); + + err = av_nvtegra_map_destroy(&ctx->common_map); + if (err < 0) + return err; + + err = ff_nvtegra_decode_uninit(avctx, &ctx->core); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_hevc_decode_init(AVCodecContext *avctx) { + NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + AVHWDeviceContext *hw_device_ctx; + AVNVTegraDeviceContext *device_hwctx; + uint32_t aligned_width, aligned_height, + coloc_size, filter_buffer_size, common_map_size; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA HEVC decoder\n"); + + ctx->core.pic_setup_off = 0; + ctx->core.status_off = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_hevc_pic_s), + AV_NVTEGRA_MAP_ALIGN); + ctx->core.cmdbuf_off = FFALIGN(ctx->core.status_off + sizeof(nvdec_status_s), + AV_NVTEGRA_MAP_ALIGN); + 
ctx->tile_sizes_off = FFALIGN(ctx->core.cmdbuf_off + 3*AV_NVTEGRA_MAP_ALIGN, + AV_NVTEGRA_MAP_ALIGN); + ctx->scaling_list_off = FFALIGN(ctx->tile_sizes_off + 0x900, + AV_NVTEGRA_MAP_ALIGN); + ctx->core.bitstream_off = FFALIGN(ctx->scaling_list_off + 0x400, + AV_NVTEGRA_MAP_ALIGN); + ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx), + 0x1000); + + ctx->core.max_cmdbuf_size = ctx->tile_sizes_off - ctx->core.cmdbuf_off; + ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off; + + err = ff_nvtegra_decode_init(avctx, &ctx->core); + if (err < 0) + goto fail; + + hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data; + device_hwctx = hw_device_ctx->hwctx; + + aligned_width = FFALIGN(avctx->coded_width, CTU_SIZE); + aligned_height = FFALIGN(avctx->coded_height, CTU_SIZE); + coloc_size = (aligned_width * aligned_height) + (aligned_width * aligned_height / MB_SIZE); + filter_buffer_size = (FILTER_SIZE + SAO_SIZE + BSD_SIZE) * aligned_height; + + ctx->coloc_off = 0; + ctx->filter_off = FFALIGN(ctx->coloc_off + coloc_size, AV_NVTEGRA_MAP_ALIGN); + common_map_size = FFALIGN(ctx->filter_off + filter_buffer_size, 0x1000); + + err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100, + NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE); + if (err < 0) + goto fail; + + ctx->colmv_size = aligned_width * aligned_height / 16; + ctx->sao_offset = FILTER_SIZE * aligned_height; + ctx->bsd_offset = (FILTER_SIZE + SAO_SIZE) * aligned_height; + + return 0; + +fail: + nvtegra_hevc_decode_uninit(avctx); + return err; +} + +static void nvtegra_hevc_set_scaling_list(nvdec_hevc_scaling_list_s *list, HEVCContext *s) { + const ScalingList *sl = s->ps.pps->scaling_list_data_present_flag ? 
+ &s->ps.pps->scaling_list : &s->ps.sps->scaling_list; + + int i; + + for (i = 0; i < FF_ARRAY_ELEMS(list->ScalingListDCCoeff16x16); ++i) + list->ScalingListDCCoeff16x16[i] = sl->sl_dc[0][i]; + for (i = 0; i < FF_ARRAY_ELEMS(list->ScalingListDCCoeff32x32); ++i) + list->ScalingListDCCoeff32x32[i] = sl->sl_dc[1][i * 3]; + + for (i = 0; i < 6; ++i) + memcpy(list->ScalingList4x4[i], sl->sl[0][i], 16); + for (i = 0; i < 6; ++i) + memcpy(list->ScalingList8x8[i], sl->sl[1][i], 64); + for (i = 0; i < 6; ++i) + memcpy(list->ScalingList16x16[i], sl->sl[2][i], 64); + memcpy(list->ScalingList32x32[0], sl->sl[3][0], 64); + memcpy(list->ScalingList32x32[1], sl->sl[3][3], 64); +} + +static void nvtegra_hevc_set_tile_sizes(uint16_t *sizes, HEVCContext *s) { + const HEVCPPS *pps = s->ps.pps; + const HEVCSPS *sps = s->ps.sps; + + int i, j, sum; + + uint16_t *tile_thing = sizes + 0x380; + if (pps->uniform_spacing_flag) { + for (i = 0; i < pps->num_tile_columns; ++i) + *tile_thing++ = (i + 1) * sps->ctb_width / pps->num_tile_columns << + (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4); + for (i = 0; i < pps->num_tile_rows; ++i) + *tile_thing++ = (i + 1) * sps->ctb_height / pps->num_tile_rows << + (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4); + } else { + sum = 0; + for (i = 0; i < pps->num_tile_columns; ++i) + *tile_thing++ = (sum += pps->column_width[i]) << + (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4); + sum = 0; + for (i = 0; i < pps->num_tile_rows; ++i) + *tile_thing++ = (sum += pps->row_height[i]) << + (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4); + } + + for (i = 0; i < pps->num_tile_rows; ++i) { + for (j = 0; j < pps->num_tile_columns; ++j) { + sizes[0] = pps->column_width[j]; + sizes[1] = pps->row_height [i]; + sizes += 2; + } + } +} + +static enum RPSType find_ref_rps_type(HEVCContext *s, HEVCFrame *f) { + int i; + +#define CHECK_SET(set) ({ \ + for (i = 0; i < 
s->rps[set].nb_refs; ++i) { \ + if (s->rps[set].ref[i] == f) \ + return set; \ + } \ +}) + + CHECK_SET(ST_CURR_BEF); + CHECK_SET(ST_CURR_AFT); + CHECK_SET(ST_FOLL); + CHECK_SET(LT_CURR); + CHECK_SET(LT_FOLL); + + return -1; +} + +static inline int find_slot(uint64_t *mask) { + int slot = ff_ctzll(~*mask); + *mask |= (1 << slot); + return slot; +} + +static void nvtegra_hevc_prepare_frame_setup(nvdec_hevc_pic_s *setup, AVCodecContext *avctx, + AVFrame *frame, NVTegraHEVCDecodeContext *ctx) +{ + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + AVHWFramesContext *frames_ctx = (AVHWFramesContext *)avctx->hw_frames_ctx->data; + HEVCContext *s = avctx->priv_data; + SliceHeader *sh = &s->sh; + const HEVCPPS *pps = s->ps.pps; + const HEVCSPS *sps = s->ps.sps; + + HEVCFrame *fr; + enum RPSType st; + uint8_t *mem; + uint16_t *tile_sizes; + int output_mode, cur_frame, scratch_ref_diff_poc, i, j; + int8_t dpb_to_ref[16+1] = {0}, ref_to_dpb[16+1] = {0}; + int8_t rps_stcurrbef[8], rps_stcurraft[8], rps_ltcurr[8]; + + mem = av_nvtegra_map_get_addr(input_map); + + /* Match source color depth regardless of colorspace */ + /* TODO: Dithered down 8-bit post-processing (needs DISPLAY_BUF mappings) */ + if (frames_ctx->sw_format == AV_PIX_FMT_P010 && sps->bit_depth == 10) { + output_mode = 1; /* 10-bit bt709 */ + } else { + if (sps->bit_depth == 8) { + output_mode = 0; /* 8-bit bt709 */ + } else { + switch (avctx->colorspace) { + default: + case AVCOL_SPC_BT709: + output_mode = 2; /* 10-bit bt709 truncated to 8-bit */ + break; + case AVCOL_SPC_BT2020_CL: + case AVCOL_SPC_BT2020_NCL: + output_mode = 3; /* 10-bit bt2020 truncated to 8-bit */ + break; + } + } + } + + *setup = (nvdec_hevc_pic_s){ + .gptimer_timeout_value = 0, /* Default value */ + + .tileformat = 0, /* TBL */ + .gob_height = 0, /* GOB_2 */ + + .sw_start_code_e = 1, + .disp_output_mode = 
output_mode, + + /* Divide by two if we are decoding to a 2bpp surface */ + .framestride = { + s->frame->linesize[0] / ((output_mode == 1) ? 2 : 1), + s->frame->linesize[1] / ((output_mode == 1) ? 2 : 1), + }, + + .colMvBuffersize = ctx->colmv_size / 256, + .HevcSaoBufferOffset = ctx->sao_offset / 256, + .HevcBsdCtrlOffset = ctx->bsd_offset / 256, + + .pic_width_in_luma_samples = sps->width, + .pic_height_in_luma_samples = sps->height, + + .chroma_format_idc = 1, /* 4:2:0 */ + .bit_depth_luma = sps->bit_depth, + .bit_depth_chroma = sps->bit_depth, + .log2_min_luma_coding_block_size = sps->log2_min_cb_size, + .log2_max_luma_coding_block_size = sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size, + .log2_min_transform_block_size = sps->log2_min_tb_size, + .log2_max_transform_block_size = sps->log2_max_trafo_size, + + .max_transform_hierarchy_depth_inter = sps->max_transform_hierarchy_depth_inter, + .max_transform_hierarchy_depth_intra = sps->max_transform_hierarchy_depth_intra, + .scalingListEnable = sps->scaling_list_enable_flag, + .amp_enable_flag = sps->amp_enabled_flag, + .sample_adaptive_offset_enabled_flag = sps->sao_enabled, + .pcm_enabled_flag = sps->pcm_enabled_flag, + .pcm_sample_bit_depth_luma = sps->pcm_enabled_flag ? sps->pcm.bit_depth : 0, + .pcm_sample_bit_depth_chroma = sps->pcm_enabled_flag ? sps->pcm.bit_depth_chroma : 0, + .log2_min_pcm_luma_coding_block_size = sps->pcm_enabled_flag ? sps->pcm.log2_min_pcm_cb_size : 0, + .log2_max_pcm_luma_coding_block_size = sps->pcm_enabled_flag ? sps->pcm.log2_max_pcm_cb_size : 0, + .pcm_loop_filter_disabled_flag = sps->pcm_enabled_flag ? 
sps->pcm.loop_filter_disable_flag : 0, + .sps_temporal_mvp_enabled_flag = sps->sps_temporal_mvp_enabled_flag, + .strong_intra_smoothing_enabled_flag = sps->sps_strong_intra_smoothing_enable_flag, + + .dependent_slice_segments_enabled_flag = pps->dependent_slice_segments_enabled_flag, + .output_flag_present_flag = pps->output_flag_present_flag, + .num_extra_slice_header_bits = pps->num_extra_slice_header_bits, + .sign_data_hiding_enabled_flag = pps->sign_data_hiding_flag, + .cabac_init_present_flag = pps->cabac_init_present_flag, + .num_ref_idx_l0_default_active = pps->num_ref_idx_l0_default_active, + .num_ref_idx_l1_default_active = pps->num_ref_idx_l1_default_active, + .init_qp = pps->pic_init_qp_minus26 + 26 + (sps->bit_depth - 8) * 6, + .constrained_intra_pred_flag = pps->constrained_intra_pred_flag, + .transform_skip_enabled_flag = pps->transform_skip_enabled_flag, + .cu_qp_delta_enabled_flag = pps->cu_qp_delta_enabled_flag, + .diff_cu_qp_delta_depth = pps->diff_cu_qp_delta_depth, + + .pps_cb_qp_offset = pps->cb_qp_offset, + .pps_cr_qp_offset = pps->cr_qp_offset, + .pps_beta_offset = pps->beta_offset, + .pps_tc_offset = pps->tc_offset, + .pps_slice_chroma_qp_offsets_present_flag = pps->pic_slice_level_chroma_qp_offsets_present_flag, + .weighted_pred_flag = pps->weighted_pred_flag, + .weighted_bipred_flag = pps->weighted_bipred_flag, + .transquant_bypass_enabled_flag = pps->transquant_bypass_enable_flag, + .tiles_enabled_flag = pps->tiles_enabled_flag, + .entropy_coding_sync_enabled_flag = pps->entropy_coding_sync_enabled_flag, + .num_tile_columns = pps->tiles_enabled_flag ? pps->num_tile_columns : 0, + .num_tile_rows = pps->tiles_enabled_flag ? pps->num_tile_rows : 0, + .loop_filter_across_tiles_enabled_flag = pps->tiles_enabled_flag ? 
pps->loop_filter_across_tiles_enabled_flag : 0, + .loop_filter_across_slices_enabled_flag = pps->seq_loop_filter_across_slices_enabled_flag, + .deblocking_filter_control_present_flag = pps->deblocking_filter_control_present_flag, + .deblocking_filter_override_enabled_flag = pps->deblocking_filter_override_enabled_flag, + .pps_deblocking_filter_disabled_flag = pps->disable_dbf, + .lists_modification_present_flag = pps->lists_modification_present_flag, + .log2_parallel_merge_level = pps->log2_parallel_merge_level, + .slice_segment_header_extension_present_flag = pps->slice_header_extension_present_flag, + + .num_ref_frames = ff_hevc_frame_nb_refs(s), + + .IDR_picture_flag = IS_IDR(s), + .RAP_picture_flag = IS_IRAP(s), + .pattern_id = ((output_mode == 0) || (output_mode == 1)) ? 2 : ctx->pattern_id, /* Disable/enable dithering */ + .sw_hdr_skip_length = sh->nvidia_skip_length, + + /* + * Ignored in official code + .separate_colour_plane_flag = sps->separate_colour_plane_flag, + .log2_max_pic_order_cnt_lsb_minus4 = sps->log2_max_poc_lsb - 4, + .num_short_term_ref_pic_sets = sps->nb_st_rps, + .num_long_term_ref_pics_sps = sps->num_long_term_ref_pics_sps, + .num_delta_pocs_of_rps_idx = s->sh.short_term_rps ? 
s->sh.short_term_rps->rps_idx_num_delta_pocs : 0, + .long_term_ref_pics_present_flag = sps->long_term_ref_pics_present_flag, + .num_bits_short_term_ref_pics_in_slice = sh->short_term_ref_pic_set_size; + */ + }; + + /* Remove stale references from our ref list */ + for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) { + if (!(ctx->refs_mask & (1 << i))) + continue; + + for (j = 0; j < FF_ARRAY_ELEMS(s->DPB); ++j) { + if (s->DPB[j].frame && s->DPB[j].poc == ctx->refs[i].poc) + break; + } + + if (j == FF_ARRAY_ELEMS(s->DPB) || s->DPB[j].poc == s->ref->poc || + !(s->DPB[j].flags & (HEVC_FRAME_FLAG_SHORT_REF | HEVC_FRAME_FLAG_LONG_REF))) + ctx->refs_mask &= ~(1 << i), ctx->refs[i].map = NULL; + else + dpb_to_ref[i] = j, ref_to_dpb[j] = i; + } + + /* Try to find a valid reference */ + ctx->scratch_ref = -1, scratch_ref_diff_poc = 0; + for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) { + if (!(ctx->refs_mask & (1 << i)) || + (ctx->refs[i].map == av_nvtegra_frame_get_fbuf_map(s->frame))) + continue; + + st = find_ref_rps_type(s, &s->DPB[dpb_to_ref[i]]); + if ((st != ST_CURR_BEF) && (st != ST_CURR_AFT) && (st != LT_CURR)) + continue; + + ctx->scratch_ref = i; + scratch_ref_diff_poc = av_clip_int8(s->ref->poc - s->DPB[dpb_to_ref[i]].poc); + break; + } + + /* Add the current frame to our ref list */ + setup->curr_pic_idx = cur_frame = find_slot(&ctx->refs_mask); + ctx->refs[cur_frame] = (struct NVTegraHEVCRefFrame){ + .map = av_nvtegra_frame_get_fbuf_map(s->frame), + .chroma_off = s->frame->data[1] - s->frame->data[0], + .poc = s->ref->poc, + }; + + /* If there were no valid references, use the current frame */ + if (ctx->scratch_ref == -1) + ctx->scratch_ref = cur_frame; + + /* Fill the POC metadata */ + for (i = 0; i < FF_ARRAY_ELEMS(setup->RefDiffPicOrderCnts); ++i) { + if (i == cur_frame) + continue; + + if (ctx->refs_mask & (1 << i)) { + fr = &s->DPB[dpb_to_ref[i]]; + setup->RefDiffPicOrderCnts[i] = av_clip_int8(s->ref->poc - fr->poc); + setup->longtermflag |= !!(fr->flags & 
HEVC_FRAME_FLAG_LONG_REF) << (15 - i); + } else { + setup->RefDiffPicOrderCnts[i] = scratch_ref_diff_poc; + } + } + +#define RPS_TO_DPB_IDX(set, array) ({ \ + for (i = 0; i < s->rps[set].nb_refs; ++i) { \ + for (j = 0; j < FF_ARRAY_ELEMS(s->DPB); ++j) { \ + if (s->rps[set].ref[i] == &s->DPB[j]) { \ + array[i] = ref_to_dpb[j]; \ + break; \ + } \ + } \ + } \ +}) + + RPS_TO_DPB_IDX(ST_CURR_BEF, rps_stcurrbef); + RPS_TO_DPB_IDX(ST_CURR_AFT, rps_stcurraft); + RPS_TO_DPB_IDX(LT_CURR, rps_ltcurr); + +#define FILL_REFLIST(list, set, array) ({ \ + int len = FFMIN(s->rps[set].nb_refs, 16 - i); \ + memcpy(&setup->list[i], array, len); \ + i += len; \ +}) + + /* Fill the RPS metadata */ + if (s->rps[ST_CURR_BEF].nb_refs + s->rps[ST_CURR_AFT].nb_refs + s->rps[LT_CURR].nb_refs) { + for (i = 0; i < 16;) { + FILL_REFLIST(initreflistidxl0, ST_CURR_BEF, rps_stcurrbef); + FILL_REFLIST(initreflistidxl0, ST_CURR_AFT, rps_stcurraft); + FILL_REFLIST(initreflistidxl0, LT_CURR, rps_ltcurr); + } + + for (i = 0; i < 16;) { + FILL_REFLIST(initreflistidxl1, ST_CURR_AFT, rps_stcurraft); + FILL_REFLIST(initreflistidxl1, ST_CURR_BEF, rps_stcurrbef); + FILL_REFLIST(initreflistidxl1, LT_CURR, rps_ltcurr); + } + } + + ctx->pattern_id ^= 1; + + if (sps->scaling_list_enable_flag) + nvtegra_hevc_set_scaling_list((nvdec_hevc_scaling_list_s *)(mem + ctx->scaling_list_off), s); + + tile_sizes = (uint16_t *)(mem + ctx->tile_sizes_off); + if (pps->tiles_enabled_flag) { + nvtegra_hevc_set_tile_sizes(tile_sizes, s); + } else { + tile_sizes[0] = pps->column_width[0]; + tile_sizes[1] = pps->row_height [0]; + } +} + +static int nvtegra_hevc_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, HEVCContext *s, + NVTegraHEVCDecodeContext *ctx, AVFrame *cur_frame) +{ + FrameDecodeData *fdd = (FrameDecodeData *)cur_frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + int i; + int err; + + err = av_nvtegra_cmdbuf_begin(cmdbuf, 
HOST1X_CLASS_NVDEC); + if (err < 0) + return err; + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID, + AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, HEVC)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS, + AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE, HEVC) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON, 1)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX, + AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx)); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET, + input_map, ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET, + input_map, ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET, + input_map, ctx->core.status_off, NVHOST_RELOC_TYPE_DEFAULT); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_HEVC_SET_SCALING_LIST_OFFSET, + input_map, ctx->scaling_list_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_HEVC_SET_TILE_SIZES_OFFSET, + input_map, ctx->tile_sizes_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_HEVC_SET_FILTER_BUFFER_OFFSET, + &ctx->common_map, ctx->filter_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_COLOC_DATA_OFFSET, + &ctx->common_map, ctx->coloc_off, NVHOST_RELOC_TYPE_DEFAULT); + +#define PUSH_FRAME(ref, offset) ({ \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0 + offset * 4, \ + ref.map, 0, NVHOST_RELOC_TYPE_DEFAULT); \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4, \ + ref.map, ref.chroma_off, NVHOST_RELOC_TYPE_DEFAULT); \ +}) + + for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) { + if (ctx->refs_mask & (1 << i)) + PUSH_FRAME(ctx->refs[i], i); + else + PUSH_FRAME(ctx->refs[ctx->scratch_ref], i); + } + + /* TODO: Dithered 8-bit post-processing, 
binding to DISPLAY_BUF */ + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE, + AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE)); + + err = av_nvtegra_cmdbuf_end(cmdbuf); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_hevc_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) { + HEVCContext *s = avctx->priv_data; + AVFrame *frame = s->frame; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + FFNVTegraDecodeFrame *tf; + AVNVTegraMap *input_map; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Starting HEVC-NVTEGRA frame with pixel format %s\n", + av_get_pix_fmt_name(avctx->sw_pix_fmt)); + + err = ff_nvtegra_start_frame(avctx, frame, &ctx->core); + if (err < 0) + return err; + + tf = fdd->hwaccel_priv; + input_map = (AVNVTegraMap *)tf->input_map_ref->data; + mem = av_nvtegra_map_get_addr(input_map); + + nvtegra_hevc_prepare_frame_setup((nvdec_hevc_pic_s *)(mem + ctx->core.pic_setup_off), + avctx, frame, ctx); + + return 0; +} + +static int nvtegra_hevc_end_frame(AVCodecContext *avctx) { + HEVCContext *s = avctx->priv_data; + NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data; + AVFrame *frame = s->ref->frame; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + + nvdec_hevc_pic_s *setup; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Ending HEVC-NVTEGRA frame with %u slices -> %u bytes\n", + ctx->core.num_slices, ctx->core.bitstream_len); + + if (!tf || !ctx->core.num_slices) + return 0; + + mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data); + + setup = (nvdec_hevc_pic_s *)(mem + ctx->core.pic_setup_off); + setup->stream_len = ctx->core.bitstream_len; + + err = nvtegra_hevc_prepare_cmdbuf(&ctx->core.cmdbuf, s, ctx, frame); + if (err < 0) + return err; + + return ff_nvtegra_end_frame(avctx, frame, &ctx->core, 
NULL, 0); +} + +static int nvtegra_hevc_decode_slice(AVCodecContext *avctx, const uint8_t *buf, + uint32_t buf_size) +{ + HEVCContext *s = avctx->priv_data; + AVFrame *frame = s->ref->frame; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + uint8_t *mem; + + mem = av_nvtegra_map_get_addr(input_map); + + /* + * Official code adds a 4-byte 00000001 startcode, + * though decoding was observed to work without it + */ + AV_WB8(mem + ctx->core.bitstream_off + ctx->core.bitstream_len, 0); + ctx->core.bitstream_len += 1; + + return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, AV_RB24(buf) != 1); +} + +#if CONFIG_HEVC_NVTEGRA_HWACCEL +const FFHWAccel ff_hevc_nvtegra_hwaccel = { + .p.name = "hevc_nvtegra", + .p.type = AVMEDIA_TYPE_VIDEO, + .p.id = AV_CODEC_ID_HEVC, + .p.pix_fmt = AV_PIX_FMT_NVTEGRA, + .start_frame = &nvtegra_hevc_start_frame, + .end_frame = &nvtegra_hevc_end_frame, + .decode_slice = &nvtegra_hevc_decode_slice, + .init = &nvtegra_hevc_decode_init, + .uninit = &nvtegra_hevc_decode_uninit, + .frame_params = &ff_nvtegra_frame_params, + .priv_data_size = sizeof(NVTegraHEVCDecodeContext), + .caps_internal = HWACCEL_CAP_ASYNC_SAFE, +}; +#endif From patchwork Thu May 30 19:43:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49426 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:9183:0:b0:460:55fa:d5ed with SMTP id s3csp79508vqg; Thu, 30 May 2024 13:10:51 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXsyVWlV7ikaI7SlK9i6fUjSgKyKQGgLbAMf0mhol4iU/Aq3En6PmleQQxeNC3qZzI+aiKRBIs8DF98e2wCG9fN8f7jY5hACujCZw== X-Google-Smtp-Source: AGHT+IHt117YcdFpfeAgR+r4ftLwpNt02dLTs7PIQ+uxDdqV98uajCmrLec+emGtBvXezTbYCs6k X-Received: by 
2002:a05:6512:158f:b0:520:77e7:79d0 with SMTP id 2adb3069b0e04-52b7e0eba5dmr1300388e87.4.1717099850903; Thu, 30 May 2024 13:10:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1717099850; cv=none; d=google.com; s=arc-20160816; b=ccpVXh2VsQVkEJ/eH2jkHhFEZQNIqXOaiwpvxcam85F26IQ2t1noLkJVBXikrgXpwa a/1DzdD7aNRuZpwpmxjsjspqLztNwNvjW9sL09ai/NDaI2K9I5vy4Q+HqJh2glUC9Njj VxRQXs7nFQbL2ltv3L60rls+uL5MdKkC/DaNJeF8MXgeYPZ+ovcrtw5I+jDyJp5ppvLE 3M9cLZkDo8KouE0Gql1qYFO+2OGAkqizhZpQb7dveVIWtxJ/xSC35gArfq8uHprnFwm8 l2Y8McqZ5m9x+jYlgPcOna+0RJn+KVfyAKuD/JGBX9vuG96FM1DzBjo62x2AcBOmHzwq jxTg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=B8K1sAkz0kGIH2CY0VfjgsQsGJxp9HUC8mxZtAsIb+M=; fh=o4ZBG0WnuIFUfokYFX1900fRPFIkFoDCXPv5+z2b8Jo=; b=rNCsqjo4f/xQDvBLtt/tloAFhXfJ3BeaovO0QjWDyxJUz7ELw4eO9xd/LneKhhqzL0 fYzHXacW2HCfM4YAqpas5gMa0JS3lZFISbQd7o+46mcfX63ZKFJ3RpXj4zYzgdCD3H4M ggIR7XqWlSGlj33sr0YckFhXKRRdDoBs58oa7WfjMmvEb18Juhn4nDI8872CJkx7sqvM CFZB72DUjuMMbR3puy1icgKxJi3+y/nC3Rj1AMLIckysOY9nkt2twd/FLMLqVX3qOUXE medkeTAW0TLBS1gebEaaE0XK+NugBwPSsfpj8TD755YhbjE5XGeNUxopCB9rU6zIbvCl KJAg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=kBbwTPMk; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id 2adb3069b0e04-52b84d7ec4esi110686e87.403.2024.05.30.13.10.50; Thu, 30 May 2024 13:10:50 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=kBbwTPMk; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 4EFC668D60C; Thu, 30 May 2024 22:44:56 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wm1-f45.google.com (mail-wm1-f45.google.com [209.85.128.45]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 4C96C68D5EE for ; Thu, 30 May 2024 22:44:45 +0300 (EEST) Received: by mail-wm1-f45.google.com with SMTP id 5b1f17b1804b1-4212a3e82b6so3326445e9.0 for ; Thu, 30 May 2024 12:44:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717098284; x=1717703084; darn=ffmpeg.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=HCYeLWDZlCT0usEJj9CbbdeFQ3MK2DceI0o6M6Av1z0=; b=kBbwTPMkJCjAZEDZrLzVzJyZQIhYZC5b6eEgw/Jubuz7ZweiindblL+usNTS8lVJlw 100fyM8Qwxja2nNWwUdmY6SHC4VUGyQDqOVgjFMcUH5BC5g3T/JKWdDbFB6Jbhjjm98O 4SxhmuLucbI9R19kA2Wh9G16TNu4tfWXyptgZlNAqgbjJpa4hFfrZ2oE6fc2kq8EMs8r jWcz05H1l2GLgs1G9lRK+HNoyseY/liVQVhH7Tu5zCquJFiL6Y8DbpFGMv0kKxI6AzrP WuaYFXoOEH8sUoM2IJZQ+3GrRANIOBczTosIAJOr24HQRcdQ/BE5V0yh67zGhdJ3bXDK +QqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717098284; x=1717703084; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HCYeLWDZlCT0usEJj9CbbdeFQ3MK2DceI0o6M6Av1z0=; b=BuJrA1ty1CQzbZxkTxM+jcDC2B/fN+diOq0LFlUACl6QAdZU+tHxKcj6n24q+CeeoO XDOdCcMMAmn7MFYfSUF2sNUNYlIlr1iCYzyZcQ7pbROeB2UCFFsAvtGXXHgsuVozwmMI h1m34DGWjEbDFx9yGsZwkeQYT/GXuIJbNhsANDBnOi18CwWHswFjsjzzSPeFmokJ7fEU CrTdurMc8C+qaJVjupj7MTorjtoq3BWUzwBAXiXvqekBzcqEb0VT//qHcwUaarpoxZLE dGldt02VRgC0SMliYUToHro9qtxwg6wGsH2dAby7y0Ju1dLh5oNQ0kNp76SxE4Zt5lpv ZXNw== X-Gm-Message-State: AOJu0YyXdv/9B7Tr0dwq9v7n56LNbuXS7Tkpjp3DVNrlzBF4qKvyFyDi 9/4HmZxmcY4FJDEdLmaebTSZXdmnIswAFKC/LCEjJi4/5Ffkc0mnIM9SCQ== X-Received: by 2002:adf:f846:0:b0:34d:b0bf:f1b5 with SMTP id ffacd0b85a97d-35dc7ecdd1bmr1923231f8f.35.1717098284198; Thu, 30 May 2024 12:44:44 -0700 (PDT) Received: from fractale.lan ([2001:861:5102:3290:f88d:fc8b:a14:3fcb]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-35dd04c0de3sm225126f8f.9.2024.05.30.12.44.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 May 2024 12:44:44 -0700 (PDT) From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:16 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 14/16] nvtegra: add vp8 hardware decoding X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: averne Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 1J+51Z/NnKSu Signed-off-by: averne --- configure | 2 + libavcodec/Makefile | 1 + libavcodec/hwaccels.h | 1 + libavcodec/nvtegra_vp8.c | 334 +++++++++++++++++++++++++++++++++++++++ libavcodec/vp8.c | 6 + 5 files changed, 344 insertions(+) create mode 100644 libavcodec/nvtegra_vp8.c diff --git 
a/configure b/configure index ba4c5287e3..a347337dd4 100755 --- a/configure +++ b/configure @@ -3277,6 +3277,8 @@ vp8_nvdec_hwaccel_deps="nvdec" vp8_nvdec_hwaccel_select="vp8_decoder" vp8_vaapi_hwaccel_deps="vaapi" vp8_vaapi_hwaccel_select="vp8_decoder" +vp8_nvtegra_hwaccel_deps="nvtegra" +vp8_nvtegra_hwaccel_select="vp8_decoder" vp9_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_VP9" vp9_d3d11va_hwaccel_select="vp9_decoder" vp9_d3d11va2_hwaccel_deps="d3d11va DXVA_PicParams_VP9" diff --git a/libavcodec/Makefile b/libavcodec/Makefile index de667b8a4b..89c5986aab 100644 --- a/libavcodec/Makefile +++ b/libavcodec/Makefile @@ -1053,6 +1053,7 @@ OBJS-$(CONFIG_VC1_VDPAU_HWACCEL) += vdpau_vc1.o OBJS-$(CONFIG_VC1_NVTEGRA_HWACCEL) += nvtegra_vc1.o OBJS-$(CONFIG_VP8_NVDEC_HWACCEL) += nvdec_vp8.o OBJS-$(CONFIG_VP8_VAAPI_HWACCEL) += vaapi_vp8.o +OBJS-$(CONFIG_VP8_NVTEGRA_HWACCEL) += nvtegra_vp8.o OBJS-$(CONFIG_VP9_D3D11VA_HWACCEL) += dxva2_vp9.o OBJS-$(CONFIG_VP9_DXVA2_HWACCEL) += dxva2_vp9.o OBJS-$(CONFIG_VP9_D3D12VA_HWACCEL) += dxva2_vp9.o d3d12va_vp9.o diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h index 77892dc2b2..7d43aeccec 100644 --- a/libavcodec/hwaccels.h +++ b/libavcodec/hwaccels.h @@ -80,6 +80,7 @@ extern const struct FFHWAccel ff_vc1_vdpau_hwaccel; extern const struct FFHWAccel ff_vc1_nvtegra_hwaccel; extern const struct FFHWAccel ff_vp8_nvdec_hwaccel; extern const struct FFHWAccel ff_vp8_vaapi_hwaccel; +extern const struct FFHWAccel ff_vp8_nvtegra_hwaccel; extern const struct FFHWAccel ff_vp9_d3d11va_hwaccel; extern const struct FFHWAccel ff_vp9_d3d11va2_hwaccel; extern const struct FFHWAccel ff_vp9_d3d12va_hwaccel; diff --git a/libavcodec/nvtegra_vp8.c b/libavcodec/nvtegra_vp8.c new file mode 100644 index 0000000000..a3aa69fe62 --- /dev/null +++ b/libavcodec/nvtegra_vp8.c @@ -0,0 +1,334 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. 
+ * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include "config_components.h" + +#include "avcodec.h" +#include "hwaccel_internal.h" +#include "internal.h" +#include "hwconfig.h" +#include "vp8.h" +#include "vp8data.h" +#include "decode.h" +#include "nvtegra_decode.h" + +#include "libavutil/pixdesc.h" +#include "libavutil/nvtegra_host1x.h" + +typedef struct NVTegraVP8DecodeContext { + FFNVTegraDecodeContext core; + + AVNVTegraMap common_map; + uint32_t prob_data_off, history_off; + uint32_t history_size; + + AVFrame *golden_frame, *altref_frame, + *previous_frame; +} NVTegraVP8DecodeContext; + +/* Size (width, height) of a macroblock */ +#define MB_SIZE 16 + +static int nvtegra_vp8_decode_uninit(AVCodecContext *avctx) { + NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + int err; + + av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA VP8 decoder\n"); + + err = av_nvtegra_map_destroy(&ctx->common_map); + if (err < 0) + return err; + + err = ff_nvtegra_decode_uninit(avctx, &ctx->core); + if (err < 0) + return err; + + return 0; +} + +static void nvtegra_vp8_init_probs(void *p) { + int i, j, k; + uint8_t *ptr = p; + + memset(p, 0, 0x4cc); + + for (i = 0; i < 4; ++i) { + for (j = 0; j < 8; ++j) { + for (k = 0; k < 3; ++k) { + memcpy(ptr, vp8_token_default_probs[i][j][k], 
NUM_DCT_TOKENS - 1); + ptr += NUM_DCT_TOKENS; + } + } + } + + memcpy(ptr, vp8_pred16x16_prob_inter, sizeof(vp8_pred16x16_prob_inter)); + ptr += 4; + + memcpy(ptr, vp8_pred8x8c_prob_inter, sizeof(vp8_pred8x8c_prob_inter)); + ptr += 4; + + for (i = 0; i < 2; ++i) { + memcpy(ptr, vp8_mv_default_prob[i], 19); + ptr += 20; + } +} + +static int nvtegra_vp8_decode_init(AVCodecContext *avctx) { + NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + AVHWDeviceContext *hw_device_ctx; + AVNVTegraDeviceContext *device_hwctx; + uint32_t width_in_mbs, common_map_size; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA VP8 decoder\n"); + + /* Ignored: histogram map, size 0x400 */ + ctx->core.pic_setup_off = 0; + ctx->core.status_off = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_vp8_pic_s), + AV_NVTEGRA_MAP_ALIGN); + ctx->core.cmdbuf_off = FFALIGN(ctx->core.status_off + sizeof(nvdec_status_s), + AV_NVTEGRA_MAP_ALIGN); + ctx->core.bitstream_off = FFALIGN(ctx->core.cmdbuf_off + AV_NVTEGRA_MAP_ALIGN, + AV_NVTEGRA_MAP_ALIGN); + ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx), + 0x1000); + + ctx->core.max_cmdbuf_size = ctx->core.bitstream_off - ctx->core.cmdbuf_off; + ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off; + + err = ff_nvtegra_decode_init(avctx, &ctx->core); + if (err < 0) + goto fail; + + hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data; + device_hwctx = hw_device_ctx->hwctx; + + width_in_mbs = FFALIGN(avctx->coded_width, MB_SIZE) / MB_SIZE; + ctx->history_size = width_in_mbs * 0x200; + + ctx->prob_data_off = 0; + ctx->history_off = FFALIGN(ctx->prob_data_off + 0x4b00, AV_NVTEGRA_MAP_ALIGN); + common_map_size = FFALIGN(ctx->history_off + ctx->history_size, 0x1000); + + err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100, + NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE); + if (err 
< 0) + goto fail; + + nvtegra_vp8_init_probs((uint8_t *)av_nvtegra_map_get_addr(&ctx->common_map) + ctx->prob_data_off); + + return 0; + +fail: + nvtegra_vp8_decode_uninit(avctx); + return err; +} + +static void nvtegra_vp8_prepare_frame_setup(nvdec_vp8_pic_s *setup, VP8Context *h, + NVTegraVP8DecodeContext *ctx) +{ + *setup = (nvdec_vp8_pic_s){ + .gptimer_timeout_value = 0, /* Default value */ + + .FrameWidth = FFALIGN(h->framep[VP8_FRAME_CURRENT]->tf.f->width, MB_SIZE), + .FrameHeight = FFALIGN(h->framep[VP8_FRAME_CURRENT]->tf.f->height, MB_SIZE), + + .keyFrame = h->keyframe, + .version = h->profile, + + .tileFormat = 0, /* TBL */ + .gob_height = 0, /* GOB_2 */ + + .errorConcealOn = 1, + + .firstPartSize = h->header_partition_size, + + .HistBufferSize = ctx->history_size / 256, + + .FrameStride = { + h->framep[VP8_FRAME_CURRENT]->tf.f->linesize[0] / MB_SIZE, + h->framep[VP8_FRAME_CURRENT]->tf.f->linesize[1] / MB_SIZE, + }, + + .luma_top_offset = 0, + .luma_bot_offset = 0, + .luma_frame_offset = 0, + .chroma_top_offset = 0, + .chroma_bot_offset = 0, + .chroma_frame_offset = 0, + + .current_output_memory_layout = 0, /* NV12 */ + .output_memory_layout = { 0, 0, 0 }, /* NV12 */ + + /* ???: Official code sets this value at 0x8d (reserved1[0]), so just set both */ + .segmentation_feature_data_update = h->segmentation.enabled ? h->segmentation.update_feature_data : 0, + .reserved1[0] = h->segmentation.enabled ? 
h->segmentation.update_feature_data : 0, + + .resultValue = 0, + }; +} + +static int nvtegra_vp8_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, VP8Context *h, + NVTegraVP8DecodeContext *ctx, AVFrame *cur_frame) +{ + FrameDecodeData *fdd = (FrameDecodeData *)cur_frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + int err; + + err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC); + if (err < 0) + return err; + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID, + AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, VP8)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS, + AV_NVTEGRA_ENUM(NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE, VP8) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON, 1)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX, + AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx)); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET, + input_map, ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET, + input_map, ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET, + input_map, ctx->core.status_off, NVHOST_RELOC_TYPE_DEFAULT); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP8_SET_PROB_DATA_OFFSET, + &ctx->common_map, ctx->prob_data_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_HISTORY_OFFSET, + &ctx->common_map, ctx->history_off, NVHOST_RELOC_TYPE_DEFAULT); + +#define PUSH_FRAME(fr, offset) ({ \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0 + offset * 4, \ + av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4, \ + av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0], \ + 
NVHOST_RELOC_TYPE_DEFAULT); \ +}) + + PUSH_FRAME(ctx->golden_frame, 0); + PUSH_FRAME(ctx->altref_frame, 1); + PUSH_FRAME(ctx->previous_frame, 2); + PUSH_FRAME(cur_frame, 3); + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE, + AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE)); + + err = av_nvtegra_cmdbuf_end(cmdbuf); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_vp8_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) { + VP8Context *h = avctx->priv_data; + AVFrame *frame = h->framep[VP8_FRAME_CURRENT]->tf.f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + FFNVTegraDecodeFrame *tf; + AVNVTegraMap *input_map; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Starting VP8-NVTEGRA frame with pixel format %s\n", + av_get_pix_fmt_name(avctx->sw_pix_fmt)); + + err = ff_nvtegra_start_frame(avctx, frame, &ctx->core); + if (err < 0) + return err; + + tf = fdd->hwaccel_priv; + input_map = (AVNVTegraMap *)tf->input_map_ref->data; + mem = av_nvtegra_map_get_addr(input_map); + + nvtegra_vp8_prepare_frame_setup((nvdec_vp8_pic_s *)(mem + ctx->core.pic_setup_off), h, ctx); + +#define SAFE_REF(type) (h->framep[(type)] ?: h->framep[VP8_FRAME_CURRENT]) + ctx->golden_frame = ff_nvtegra_safe_get_ref(SAFE_REF(VP8_FRAME_GOLDEN) ->tf.f, frame); + ctx->altref_frame = ff_nvtegra_safe_get_ref(SAFE_REF(VP8_FRAME_ALTREF) ->tf.f, frame); + ctx->previous_frame = ff_nvtegra_safe_get_ref(SAFE_REF(VP8_FRAME_PREVIOUS)->tf.f, frame); + + return 0; +} + +static int nvtegra_vp8_end_frame(AVCodecContext *avctx) { + VP8Context *h = avctx->priv_data; + NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + AVFrame *frame = h->framep[VP8_FRAME_CURRENT]->tf.f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + + nvdec_vp8_pic_s *setup; + uint8_t *mem; + int err; + + av_log(avctx, 
AV_LOG_DEBUG, "Ending VP8-NVTEGRA frame with %u slices -> %u bytes\n", + ctx->core.num_slices, ctx->core.bitstream_len); + + if (!tf || !ctx->core.num_slices) + return 0; + + mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data); + + setup = (nvdec_vp8_pic_s *)(mem + ctx->core.pic_setup_off); + setup->VLDBufferSize = ctx->core.bitstream_len; + + err = nvtegra_vp8_prepare_cmdbuf(&ctx->core.cmdbuf, h, ctx, frame); + if (err < 0) + return err; + + return ff_nvtegra_end_frame(avctx, frame, &ctx->core, NULL, 0); +} + +static int nvtegra_vp8_decode_slice(AVCodecContext *avctx, const uint8_t *buf, + uint32_t buf_size) +{ + VP8Context *h = avctx->priv_data; + AVFrame *frame = h->framep[VP8_FRAME_CURRENT]->tf.f; + + int offset = h->keyframe ? 10 : 3; + + return ff_nvtegra_decode_slice(avctx, frame, buf + offset, buf_size - offset, false); +} + +#if CONFIG_VP8_NVTEGRA_HWACCEL +const FFHWAccel ff_vp8_nvtegra_hwaccel = { + .p.name = "vp8_nvtegra", + .p.type = AVMEDIA_TYPE_VIDEO, + .p.id = AV_CODEC_ID_VP8, + .p.pix_fmt = AV_PIX_FMT_NVTEGRA, + .start_frame = &nvtegra_vp8_start_frame, + .end_frame = &nvtegra_vp8_end_frame, + .decode_slice = &nvtegra_vp8_decode_slice, + .init = &nvtegra_vp8_decode_init, + .uninit = &nvtegra_vp8_decode_uninit, + .frame_params = &ff_nvtegra_frame_params, + .priv_data_size = sizeof(NVTegraVP8DecodeContext), + .caps_internal = HWACCEL_CAP_ASYNC_SAFE, +}; +#endif diff --git a/libavcodec/vp8.c b/libavcodec/vp8.c index 8e91613068..8b4676e3ff 100644 --- a/libavcodec/vp8.c +++ b/libavcodec/vp8.c @@ -184,6 +184,9 @@ static enum AVPixelFormat get_pixel_format(VP8Context *s) #endif #if CONFIG_VP8_NVDEC_HWACCEL AV_PIX_FMT_CUDA, +#endif +#if CONFIG_VP8_NVTEGRA_HWACCEL + AV_PIX_FMT_NVTEGRA, #endif AV_PIX_FMT_YUV420P, AV_PIX_FMT_NONE, @@ -2972,6 +2975,9 @@ const FFCodec ff_vp8_decoder = { #endif #if CONFIG_VP8_NVDEC_HWACCEL HWACCEL_NVDEC(vp8), +#endif +#if CONFIG_VP8_NVTEGRA_HWACCEL + HWACCEL_NVTEGRA(vp8), #endif NULL }, From patchwork Thu May 30 
19:43:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: averne X-Patchwork-Id: 49425 From: averne To: ffmpeg-devel@ffmpeg.org Date: Thu, 30 May 2024 21:43:17 +0200 Message-ID: <133e86925f3ee08ef79496a0cbbd70834d487ec4.1717083800.git.averne381@gmail.com> X-Mailer: git-send-email 2.45.1 Subject: [FFmpeg-devel] [PATCH 15/16] nvtegra: add vp9 hardware decoding 
Cc: averne This hardware block was based on/licensed from the hantro implementation (as evidenced by the identical structures). Relevant V4L2 kernel code was referenced when implementing backward entropy updates. Signed-off-by: averne --- configure | 2 + libavcodec/Makefile | 1 + libavcodec/hwaccels.h | 1 + libavcodec/nvtegra_vp9.c | 665 +++++++++++++++++++++++++++++++++++++++ libavcodec/vp9.c | 10 +- 5 files changed, 678 insertions(+), 1 deletion(-) create mode 100644 libavcodec/nvtegra_vp9.c diff --git a/configure b/configure index a347337dd4..3fe948d9ab 100755 --- a/configure +++ b/configure @@ -3295,6 +3295,8 @@ vp9_vdpau_hwaccel_deps="vdpau VdpPictureInfoVP9" vp9_vdpau_hwaccel_select="vp9_decoder" vp9_videotoolbox_hwaccel_deps="videotoolbox" vp9_videotoolbox_hwaccel_select="vp9_decoder" +vp9_nvtegra_hwaccel_deps="nvtegra" +vp9_nvtegra_hwaccel_select="vp9_decoder" wmv3_d3d11va_hwaccel_select="vc1_d3d11va_hwaccel" wmv3_d3d11va2_hwaccel_select="vc1_d3d11va2_hwaccel" wmv3_d3d12va_hwaccel_select="vc1_d3d12va_hwaccel" diff --git a/libavcodec/Makefile b/libavcodec/Makefile index 89c5986aab..914995558e 100644 --- a/libavcodec/Makefile +++ b/libavcodec/Makefile @@ -1061,6 +1061,7 @@ OBJS-$(CONFIG_VP9_NVDEC_HWACCEL) += nvdec_vp9.o OBJS-$(CONFIG_VP9_VAAPI_HWACCEL) += vaapi_vp9.o OBJS-$(CONFIG_VP9_VDPAU_HWACCEL) += vdpau_vp9.o OBJS-$(CONFIG_VP9_VIDEOTOOLBOX_HWACCEL) += videotoolbox_vp9.o +OBJS-$(CONFIG_VP9_NVTEGRA_HWACCEL) += nvtegra_vp9.o OBJS-$(CONFIG_VP8_QSV_HWACCEL) += qsvdec.o # Objects duplicated from other libraries for shared builds diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h index 7d43aeccec..a3babfc309 100644 --- a/libavcodec/hwaccels.h +++ b/libavcodec/hwaccels.h @@ -89,6 +89,7 @@ extern const struct FFHWAccel ff_vp9_nvdec_hwaccel; extern const struct 
FFHWAccel ff_vp9_vaapi_hwaccel; extern const struct FFHWAccel ff_vp9_vdpau_hwaccel; extern const struct FFHWAccel ff_vp9_videotoolbox_hwaccel; +extern const struct FFHWAccel ff_vp9_nvtegra_hwaccel; extern const struct FFHWAccel ff_wmv3_d3d11va_hwaccel; extern const struct FFHWAccel ff_wmv3_d3d11va2_hwaccel; extern const struct FFHWAccel ff_wmv3_d3d12va_hwaccel; diff --git a/libavcodec/nvtegra_vp9.c b/libavcodec/nvtegra_vp9.c new file mode 100644 index 0000000000..a0cca1a5a4 --- /dev/null +++ b/libavcodec/nvtegra_vp9.c @@ -0,0 +1,665 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#include + +#include "config_components.h" + +#include "avcodec.h" +#include "hwaccel_internal.h" +#include "internal.h" +#include "hwconfig.h" +#include "vp9data.h" +#include "vp9dec.h" +#include "decode.h" +#include "nvtegra_decode.h" + +#include "libavutil/pixdesc.h" +#include "libavutil/nvtegra_host1x.h" + +typedef struct NVTegraVP9DecodeContext { + FFNVTegraDecodeContext core; + + uint32_t prob_tab_off; + + AVNVTegraMap common_map; + uint32_t segment_rw1_off, segment_rw2_off, tile_sizes_off, filter_off, + col_mvrw1_off, col_mvrw2_off, ctx_counter_off; + + bool prev_show_frame; + + AVFrame *refs[3]; +} NVTegraVP9DecodeContext; + +/* Size (width, height) of a macroblock */ +#define MB_SIZE 16 + +/* Maximum size (width, height) of a superblock */ +#define SB_SIZE 64 + +#define CEILDIV(a, b) (((a) + (b) - 1) / (b)) + +/* Prediction modes aren't laid out in the same order in ffmpeg's defaults as in hardware */ +static const uint8_t pmconv[] = { 2, 0, 1, 3, 4, 5, 6, 8, 7, 9 }; + +static int nvtegra_vp9_decode_uninit(AVCodecContext *avctx) { + NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + int err; + + av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA VP9 decoder\n"); + + err = av_nvtegra_map_destroy(&ctx->common_map); + if (err < 0) + return err; + + err = ff_nvtegra_decode_uninit(avctx, &ctx->core); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_vp9_decode_init(AVCodecContext *avctx) { + NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + AVHWDeviceContext *hw_device_ctx; + AVNVTegraDeviceContext *device_hwctx; + uint32_t aligned_width, aligned_height, max_sb_size, + segment_rw_size, filter_size, col_mvrw_size, ctx_counter_size, + common_map_size; + uint8_t *mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA VP9 decoder\n"); + + ctx->core.pic_setup_off = 0; + ctx->core.status_off = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_vp9_pic_s), + AV_NVTEGRA_MAP_ALIGN); + 
ctx->core.cmdbuf_off = FFALIGN(ctx->core.status_off + sizeof(nvdec_status_s), + AV_NVTEGRA_MAP_ALIGN); + ctx->prob_tab_off = FFALIGN(ctx->core.cmdbuf_off + 2*AV_NVTEGRA_MAP_ALIGN, + AV_NVTEGRA_MAP_ALIGN); + ctx->core.bitstream_off = FFALIGN(ctx->prob_tab_off + sizeof(nvdec_vp9EntropyProbs_t), + AV_NVTEGRA_MAP_ALIGN); + ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx), + 0x1000); + + ctx->core.max_cmdbuf_size = ctx->prob_tab_off - ctx->core.cmdbuf_off; + ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off; + + err = ff_nvtegra_decode_init(avctx, &ctx->core); + if (err < 0) + goto fail; + + hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data; + device_hwctx = hw_device_ctx->hwctx; + + aligned_width = FFALIGN(avctx->coded_width, MB_SIZE); + aligned_height = FFALIGN(avctx->coded_height, MB_SIZE); + max_sb_size = CEILDIV(aligned_width, 64) * CEILDIV(aligned_height, 64); + segment_rw_size = FFALIGN(max_sb_size * 32, 0x100); + filter_size = FFALIGN(avctx->height, 64) * 988; + col_mvrw_size = max_sb_size * 1024; + ctx_counter_size = FFALIGN(sizeof(nvdec_vp9EntropyCounts_t), 0x100); + + ctx->segment_rw1_off = 0; + ctx->segment_rw2_off = FFALIGN(ctx->segment_rw1_off + segment_rw_size, AV_NVTEGRA_MAP_ALIGN); + ctx->tile_sizes_off = FFALIGN(ctx->segment_rw2_off + segment_rw_size, AV_NVTEGRA_MAP_ALIGN); + ctx->filter_off = FFALIGN(ctx->tile_sizes_off + 0x700, AV_NVTEGRA_MAP_ALIGN); + ctx->col_mvrw1_off = FFALIGN(ctx->filter_off + filter_size, AV_NVTEGRA_MAP_ALIGN); + ctx->col_mvrw2_off = FFALIGN(ctx->col_mvrw1_off + col_mvrw_size, AV_NVTEGRA_MAP_ALIGN); + ctx->ctx_counter_off = FFALIGN(ctx->col_mvrw2_off + col_mvrw_size, AV_NVTEGRA_MAP_ALIGN); + common_map_size = FFALIGN(ctx->ctx_counter_off + ctx_counter_size, 0x1000); + + err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100, + NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE); 
+ if (err < 0) + goto fail; + + mem = av_nvtegra_map_get_addr(&ctx->common_map); + + memset(mem + ctx->segment_rw1_off, 0, segment_rw_size); + memset(mem + ctx->segment_rw2_off, 0, segment_rw_size); + + memset(mem + ctx->tile_sizes_off, 0, 0x700); + ((uint16_t *)(mem + ctx->tile_sizes_off))[0x37a] = 9; + ((uint16_t *)(mem + ctx->tile_sizes_off))[0x37b] = 1; + + memset(mem + ctx->col_mvrw1_off, 0, col_mvrw_size); + memset(mem + ctx->col_mvrw2_off, 0, col_mvrw_size); + + memset(mem + ctx->ctx_counter_off, 0, sizeof(nvdec_vp9EntropyCounts_t)); + + return 0; + +fail: + nvtegra_vp9_decode_uninit(avctx); + return err; +} + +static void nvtegra_vp9_init_probs(nvdec_vp9EntropyProbs_t *probs) { + int i, j; + + for (i = 0; i < FF_ARRAY_ELEMS(probs->kf_bmode_prob); ++i) { + for (j = 0; j < FF_ARRAY_ELEMS(probs->kf_bmode_prob[0]); ++j) { + memcpy(probs->kf_bmode_prob[i][j], ff_vp9_default_kf_ymode_probs[pmconv[i]][pmconv[j]], 8); + probs->kf_bmode_probB[i][j][0] = ff_vp9_default_kf_ymode_probs[pmconv[i]][pmconv[j]][8]; + } + memcpy(probs->kf_uv_mode_prob[i], ff_vp9_default_kf_uvmode_probs[pmconv[i]], 8); + probs->kf_uv_mode_probB[i][0] = ff_vp9_default_kf_uvmode_probs[pmconv[i]][8]; + } +} + +static void nvtegra_vp9_update_probs(nvdec_vp9EntropyProbs_t *probs, + VP9Context *s, bool init) +{ + ProbContext *p = &s->prob.p; + + int i, j, k, l; + + if (init) { + memset(probs, 0, sizeof(nvdec_vp9EntropyProbs_t)); + nvtegra_vp9_init_probs(probs); + } + + for (i = 0; i < FF_ARRAY_ELEMS(probs->ref_pred_probs); ++i) + probs->ref_pred_probs[i] = *s->intra_pred_data[i]; + + memcpy(probs->mb_segment_tree_probs, s->s.h.segmentation.prob, sizeof(probs->mb_segment_tree_probs)); + if (s->s.h.segmentation.temporal) + memcpy(probs->segment_pred_probs, s->s.h.segmentation.pred_prob, sizeof(probs->segment_pred_probs)); + else + memset(probs->segment_pred_probs, 0xff, sizeof(probs->segment_pred_probs)); + + /* Ignored by official software: ref_scores, prob_comppred */ + + for (i = 0; i < 
FF_ARRAY_ELEMS(probs->a.inter_mode_prob); ++i) + memcpy(probs->a.inter_mode_prob[i], p->mv_mode[i], 3); + + memcpy(probs->a.intra_inter_prob, p->intra, sizeof(probs->a.intra_inter_prob)); + + for (i = 0; i < FF_ARRAY_ELEMS(probs->a.uv_mode_prob); ++i) { + memcpy(probs->a.uv_mode_prob[i], p->uv_mode[pmconv[i]], 8); + probs->a.uv_mode_probB[i][0] = p->uv_mode[pmconv[i]][8]; + } + + for (i = 0; i < FF_ARRAY_ELEMS(probs->a.tx8x8_prob); ++i) { + memcpy(probs->a.tx8x8_prob [i], &p->tx8p [i], 1); + memcpy(probs->a.tx16x16_prob[i], p->tx16p[i], 2); + memcpy(probs->a.tx32x32_prob[i], p->tx32p[i], 3); + } + + for (i = 0; i < FF_ARRAY_ELEMS(probs->a.sb_ymode_prob); ++i) { + memcpy(probs->a.sb_ymode_prob[i], p->y_mode[i], 8); + probs->a.sb_ymode_probB[i][0] = p->y_mode[i][8]; + } + + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + memcpy(probs->a.partition_prob[0][4*(3-i)+j], + &ff_vp9_default_kf_partition_probs[i][j], 3); + memcpy(probs->a.partition_prob[1][4*(3-i)+j], &p->partition[i][j], 3); + } + } + + memcpy(probs->a.switchable_interp_prob, p->filter, sizeof(probs->a.switchable_interp_prob)); + memcpy(probs->a.comp_inter_prob, p->comp, sizeof(probs->a.comp_inter_prob)); + memcpy(probs->a.mbskip_probs, p->skip, sizeof(probs->a.mbskip_probs)); + + memcpy(probs->a.nmvc.joints, p->mv_joint, 3); + for (i = 0; i < FF_ARRAY_ELEMS(p->mv_comp); ++i) { + probs->a.nmvc.sign [i] = p->mv_comp[i].sign; + probs->a.nmvc.class0 [i][0] = p->mv_comp[i].class0; + probs->a.nmvc.class0_hp[i] = p->mv_comp[i].class0_hp; + probs->a.nmvc.hp [i] = p->mv_comp[i].hp; + memcpy(probs->a.nmvc.fp [i], p->mv_comp[i].fp, 3); + memcpy(probs->a.nmvc.classes [i], p->mv_comp[i].classes, 10); + memcpy(probs->a.nmvc.class0_fp[i], p->mv_comp[i].class0_fp, 2 * 3); + memcpy(probs->a.nmvc.bits [i], p->mv_comp[i].bits, 10); + } + + memcpy(probs->a.single_ref_prob, p->single_ref, sizeof(probs->a.single_ref_prob)); + memcpy(probs->a.comp_ref_prob, p->comp_ref, sizeof(probs->a.comp_ref_prob)); + + for (i = 0; i 
< FF_ARRAY_ELEMS(probs->a.probCoeffs); ++i) { + for (j = 0; j < FF_ARRAY_ELEMS(probs->a.probCoeffs[0]); ++j) { + for (k = 0; k < FF_ARRAY_ELEMS(probs->a.probCoeffs[0][0]); ++k) { + for (l = 0; l < FF_ARRAY_ELEMS(probs->a.probCoeffs[0][0][0]); ++l) { + memcpy(probs->a.probCoeffs [i][j][k][l], s->prob.coef[0][i][j][k][l], 3); + memcpy(probs->a.probCoeffs8x8 [i][j][k][l], s->prob.coef[1][i][j][k][l], 3); + memcpy(probs->a.probCoeffs16x16[i][j][k][l], s->prob.coef[2][i][j][k][l], 3); + memcpy(probs->a.probCoeffs32x32[i][j][k][l], s->prob.coef[3][i][j][k][l], 3); + } + } + } + } +} + +static void nvtegra_vp9_set_tile_sizes(uint16_t *sizes, VP9Context *s) { + int i, j; + + for (i = 0; i < s->s.h.tiling.tile_rows; ++i) { + for (j = 0; j < s->s.h.tiling.tile_cols; ++j) { + sizes[0] = (s->sb_cols * (j + 1) >> s->s.h.tiling.log2_tile_cols) - + (s->sb_cols * j >> s->s.h.tiling.log2_tile_cols); + sizes[1] = (s->sb_rows * (i + 1) >> s->s.h.tiling.log2_tile_rows) - + (s->sb_rows * i >> s->s.h.tiling.log2_tile_rows); + sizes += 2; + } + } +} + +static void nvtegra_vp9_update_counts(nvdec_vp9EntropyCounts_t *cts, + VP9TileData *td) +{ + int i, j, k, l; + + for (i = 0; i < FF_ARRAY_ELEMS(td->counts.y_mode); ++i) { + for (j = 0; j < FF_ARRAY_ELEMS(td->counts.y_mode[0]); ++j) { + td->counts.y_mode[i][pmconv[j]] = cts->sb_ymode_counts[i][j]; + } + } + + for (i = 0; i < FF_ARRAY_ELEMS(td->counts.uv_mode); ++i) { + for (j = 0; j < FF_ARRAY_ELEMS(td->counts.uv_mode[0]); ++j) { + td->counts.uv_mode[pmconv[i]][pmconv[j]] = cts->uv_mode_counts[i][j]; + } + } + + memcpy(td->counts.filter, cts->switchable_interp_counts, sizeof(td->counts.filter)); + memcpy(td->counts.intra, cts->intra_inter_count, sizeof(td->counts.intra)); + memcpy(td->counts.comp, cts->comp_inter_count, sizeof(td->counts.comp)); + memcpy(td->counts.single_ref, cts->single_ref_count, sizeof(td->counts.single_ref)); + memcpy(td->counts.tx32p, cts->tx32x32_count, sizeof(td->counts.tx32p)); + memcpy(td->counts.tx16p, 
cts->tx16x16_count, sizeof(td->counts.tx16p)); + memcpy(td->counts.tx8p, cts->tx8x8_count, sizeof(td->counts.tx8p)); + memcpy(td->counts.skip, cts->mbskip_count, sizeof(td->counts.skip)); + + for (i = 0; i < FF_ARRAY_ELEMS(td->counts.mv_mode); ++i) { + td->counts.mv_mode[i][0] = cts->inter_mode_counts[i][1][0]; + td->counts.mv_mode[i][1] = cts->inter_mode_counts[i][2][0]; + td->counts.mv_mode[i][2] = cts->inter_mode_counts[i][0][0]; + td->counts.mv_mode[i][3] = cts->inter_mode_counts[i][2][1]; + } + + memcpy(td->counts.mv_joint, cts->nmvcount.joints, sizeof(td->counts.mv_joint)); + for (i = 0; i < FF_ARRAY_ELEMS(td->counts.mv_comp); ++i) { + memcpy(td->counts.mv_comp[i].sign, cts->nmvcount.sign [i], sizeof(td->counts.mv_comp[i].sign)); + memcpy(td->counts.mv_comp[i].classes, cts->nmvcount.classes [i], sizeof(td->counts.mv_comp[i].classes)); + memcpy(td->counts.mv_comp[i].class0, cts->nmvcount.class0 [i], sizeof(td->counts.mv_comp[i].class0)); + memcpy(td->counts.mv_comp[i].bits, cts->nmvcount.bits [i], sizeof(td->counts.mv_comp[i].bits)); + memcpy(td->counts.mv_comp[i].class0_fp, cts->nmvcount.class0_fp[i], sizeof(td->counts.mv_comp[i].class0_fp)); + memcpy(td->counts.mv_comp[i].fp, cts->nmvcount.fp [i], sizeof(td->counts.mv_comp[i].fp)); + memcpy(td->counts.mv_comp[i].class0_hp, cts->nmvcount.class0_hp[i], sizeof(td->counts.mv_comp[i].class0_hp)); + memcpy(td->counts.mv_comp[i].hp, cts->nmvcount.hp [i], sizeof(td->counts.mv_comp[i].hp)); + } + + memcpy(td->counts.partition[0], cts->partition_counts[12], sizeof(td->counts.partition[0])); + memcpy(td->counts.partition[1], cts->partition_counts[ 8], sizeof(td->counts.partition[1])); + memcpy(td->counts.partition[2], cts->partition_counts[ 4], sizeof(td->counts.partition[2])); + memcpy(td->counts.partition[3], cts->partition_counts[ 0], sizeof(td->counts.partition[3])); + + for (i = 0; i < FF_ARRAY_ELEMS(td->counts.coef[0]); ++i) { + for (j = 0; j < FF_ARRAY_ELEMS(td->counts.coef[0][0]); ++j) { + for (k = 0; k < 
FF_ARRAY_ELEMS(td->counts.coef[0][0][0]); ++k) { + for (l = 0; l < FF_ARRAY_ELEMS(td->counts.coef[0][0][0][0]); ++l) { + memcpy(td->counts.coef[0][i][j][k][l], cts->countCoeffs [i][j][k][l], + sizeof(td->counts.coef[0][i][j][k][l])); + memcpy(td->counts.coef[1][i][j][k][l], cts->countCoeffs8x8 [i][j][k][l], + sizeof(td->counts.coef[1][i][j][k][l])); + memcpy(td->counts.coef[2][i][j][k][l], cts->countCoeffs16x16[i][j][k][l], + sizeof(td->counts.coef[2][i][j][k][l])); + memcpy(td->counts.coef[3][i][j][k][l], cts->countCoeffs32x32[i][j][k][l], + sizeof(td->counts.coef[3][i][j][k][l])); + td->counts.eob[0][i][j][k][l][0] = cts->countCoeffs [i][j][k][l][3]; + td->counts.eob[0][i][j][k][l][1] = cts->countEobs[0][i][j][k][l] - td->counts.eob[0][i][j][k][l][0]; + td->counts.eob[1][i][j][k][l][0] = cts->countCoeffs8x8 [i][j][k][l][3]; + td->counts.eob[1][i][j][k][l][1] = cts->countEobs[1][i][j][k][l] - td->counts.eob[1][i][j][k][l][0]; + td->counts.eob[2][i][j][k][l][0] = cts->countCoeffs16x16[i][j][k][l][3]; + td->counts.eob[2][i][j][k][l][1] = cts->countEobs[2][i][j][k][l] - td->counts.eob[2][i][j][k][l][0]; + td->counts.eob[3][i][j][k][l][0] = cts->countCoeffs32x32[i][j][k][l][3]; + td->counts.eob[3][i][j][k][l][1] = cts->countEobs[3][i][j][k][l] - td->counts.eob[3][i][j][k][l][0]; + } + } + } + } +} + +static void nvtegra_vp9_prepare_frame_setup(nvdec_vp9_pic_s *setup, AVCodecContext *avctx, + NVTegraVP9DecodeContext *ctx) +{ + VP9Context *s = avctx->priv_data; + VP9SharedContext *h = &s->s; + + int i; + + /* Note: the stride is divided by 2 when the depth is > 8 (not supported on T210) */ +#define FWIDTH(f) ((f && f->private_ref) ? f->width : 0) +#define FHEIGHT(f) ((f && f->private_ref) ? f->height : 0) +#define FSTRIDE(f, c) ((f && f->private_ref) ? 
f->linesize[c] : 0)
+
+    /* Note: the v1 substructure isn't filled out on T210 */
+    *setup = (nvdec_vp9_pic_s){
+        .gptimer_timeout_value    = 0, /* Default value */
+
+        .tileformat               = 0, /* TBL */
+        .gob_height               = 0, /* GOB_2 */
+
+        .Vp9BsdCtrlOffset         = FFALIGN(avctx->height, 64) * 912 / 256,
+
+        .ref0_width               = FWIDTH (h->refs[h->h.refidx[0]].f),
+        .ref0_height              = FHEIGHT(h->refs[h->h.refidx[0]].f),
+        .ref0_stride              = {
+            FSTRIDE(h->refs[h->h.refidx[0]].f, 0),
+            FSTRIDE(h->refs[h->h.refidx[0]].f, 1),
+        },
+
+        .ref1_width               = FWIDTH (h->refs[h->h.refidx[1]].f),
+        .ref1_height              = FHEIGHT(h->refs[h->h.refidx[1]].f),
+        .ref1_stride              = {
+            FSTRIDE(h->refs[h->h.refidx[1]].f, 0),
+            FSTRIDE(h->refs[h->h.refidx[1]].f, 1),
+        },
+
+        .ref2_width               = FWIDTH (h->refs[h->h.refidx[2]].f),
+        .ref2_height              = FHEIGHT(h->refs[h->h.refidx[2]].f),
+        .ref2_stride              = {
+            FSTRIDE(h->refs[h->h.refidx[2]].f, 0),
+            FSTRIDE(h->refs[h->h.refidx[2]].f, 1),
+        },
+
+        .width                    = FWIDTH (h->frames[CUR_FRAME].tf.f),
+        .height                   = FHEIGHT(h->frames[CUR_FRAME].tf.f),
+        .framestride              = {
+            FSTRIDE(h->frames[CUR_FRAME].tf.f, 0),
+            FSTRIDE(h->frames[CUR_FRAME].tf.f, 1),
+        },
+
+        .keyFrame                 = h->h.keyframe,
+        .prevIsKeyFrame           = s->last_keyframe,
+        .errorResilient           = h->h.errorres,
+        .prevShowFrame            = ctx->prev_show_frame,
+        .intraOnly                = h->h.intraonly,
+
+        .refFrameSignBias         = {
+            0,
+            h->h.signbias[0], h->h.signbias[1], h->h.signbias[2],
+        },
+
+        .loopFilterLevel          = h->h.filter.level,
+        .loopFilterSharpness      = h->h.filter.sharpness,
+
+        .qpYAc                    = h->h.yac_qi,
+        .qpYDc                    = h->h.ydc_qdelta,
+        .qpChAc                   = h->h.uvac_qdelta,
+        .qpChDc                   = h->h.uvdc_qdelta,
+
+        .lossless                 = h->h.lossless,
+        .transform_mode           = h->h.txfmmode,
+        .allow_high_precision_mv  = h->h.keyframe ? 0 : h->h.highprecisionmvs,
+        .mcomp_filter_type        = h->h.filtermode,
+        .comp_pred_mode           = h->h.comppredmode,
+        .comp_fixed_ref           = h->h.allowcompinter ? h->h.fixcompref + 1 : 0,
+        .comp_var_ref             = {
+            h->h.allowcompinter ? h->h.varcompref[0] + 1 : 0,
+            h->h.allowcompinter ?
h->h.varcompref[1] + 1 : 0, + }, + + .log2_tile_columns = h->h.tiling.log2_tile_cols, + .log2_tile_rows = h->h.tiling.log2_tile_rows, + + .segmentEnabled = h->h.segmentation.enabled, + .segmentMapUpdate = h->h.segmentation.update_map, + .segmentMapTemporalUpdate = h->h.segmentation.temporal, + .segmentFeatureMode = h->h.segmentation.absolute_vals, + .modeRefLfEnabled = h->h.lf_delta.enabled, + .mbRefLfDelta = { + h->h.lf_delta.ref[0], h->h.lf_delta.ref[1], + h->h.lf_delta.ref[2], h->h.lf_delta.ref[3], + }, + .mbModeLfDelta = { + h->h.lf_delta.mode[0], h->h.lf_delta.mode[1], + }, + }; + + for (i = 0; i < 8; ++i) { + setup->segmentFeatureEnable[i][0] = h->h.segmentation.feat[i].q_enabled; + setup->segmentFeatureEnable[i][1] = h->h.segmentation.feat[i].lf_enabled; + setup->segmentFeatureEnable[i][2] = h->h.segmentation.feat[i].ref_enabled; + setup->segmentFeatureEnable[i][3] = h->h.segmentation.feat[i].skip_enabled; + + setup->segmentFeatureData[i][0] = h->h.segmentation.feat[i].q_val; + setup->segmentFeatureData[i][1] = h->h.segmentation.feat[i].lf_val; + setup->segmentFeatureData[i][2] = h->h.segmentation.feat[i].ref_val; + setup->segmentFeatureData[i][3] = 0; + } + + ctx->prev_show_frame = !h->h.invisible; +} + +static int nvtegra_vp9_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, VP9SharedContext *h, + NVTegraVP9DecodeContext *ctx, AVFrame *cur_frame) +{ + FrameDecodeData *fdd = (FrameDecodeData *)cur_frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + uint32_t col_mvwrite_off, col_mvread_off; + int err; + + if (ctx->core.frame_idx % 2 == 0) + col_mvwrite_off = ctx->col_mvrw1_off, col_mvread_off = ctx->col_mvrw2_off; + else + col_mvwrite_off = ctx->col_mvrw2_off, col_mvread_off = ctx->col_mvrw1_off; + + err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC); + if (err < 0) + return err; + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID, + 
AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, VP9)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS, + AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE, VP9) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1) | + AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON, 1)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX, + AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx)); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET, + input_map, ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET, + input_map, ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET, + input_map, ctx->core.status_off, NVHOST_RELOC_TYPE_DEFAULT); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_PROB_TAB_BUF_OFFSET, + input_map, ctx->prob_tab_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_CTX_COUNTER_BUF_OFFSET, + &ctx->common_map, ctx->ctx_counter_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_TILE_SIZE_BUF_OFFSET, + &ctx->common_map, ctx->tile_sizes_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_COL_MVWRITE_BUF_OFFSET, + &ctx->common_map, col_mvwrite_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_COL_MVREAD_BUF_OFFSET, + &ctx->common_map, col_mvread_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_SEGMENT_READ_BUF_OFFSET, + &ctx->common_map, ctx->segment_rw1_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_SEGMENT_WRITE_BUF_OFFSET, + &ctx->common_map, ctx->segment_rw2_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_FILTER_BUFFER_OFFSET, + &ctx->common_map, ctx->filter_off, NVHOST_RELOC_TYPE_DEFAULT); + +#define PUSH_FRAME(fr, offset) ({ \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, 
NVC5B0_SET_PICTURE_LUMA_OFFSET0 + offset * 4, \ + av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \ + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4, \ + av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0], \ + NVHOST_RELOC_TYPE_DEFAULT); \ +}) + + PUSH_FRAME(ctx->refs[0], 0); + PUSH_FRAME(ctx->refs[1], 1); + PUSH_FRAME(ctx->refs[2], 2); + PUSH_FRAME(cur_frame, 3); + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE, + AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE)); + + err = av_nvtegra_cmdbuf_end(cmdbuf); + if (err < 0) + return err; + + if (h->h.segmentation.update_map) + FFSWAP(uint32_t, ctx->segment_rw1_off, ctx->segment_rw2_off); + + return 0; +} + +static int nvtegra_vp9_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) { + VP9Context *s = avctx->priv_data; + VP9SharedContext *h = &s->s; + AVFrame *frame = h->frames[CUR_FRAME].tf.f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + FFNVTegraDecodeFrame *tf; + AVNVTegraMap *input_map; + uint8_t *mem, *common_mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Starting VP9-NVTEGRA frame with pixel format %s\n", + av_get_pix_fmt_name(avctx->sw_pix_fmt)); + + if (s->s.h.refreshctx && s->s.h.parallelmode) { + int i, j, k, l, m; + + for (i = 0; i < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef); i++) { + for (j = 0; j < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0]); j++) + for (k = 0; k < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0][0]); k++) + for (l = 0; l < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0][0][0]); l++) + for (m = 0; m < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0][0][0][0]); m++) + memcpy(s->prob_ctx[s->s.h.framectxid].coef[i][j][k][l][m], + s->prob.coef[i][j][k][l][m], + FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0][0][0][0][0])); + if (s->s.h.txfmmode == i) + break; + } + + 
s->prob_ctx[s->s.h.framectxid].p = s->prob.p; + } + + err = ff_nvtegra_start_frame(avctx, frame, &ctx->core); + if (err < 0) + return err; + + tf = fdd->hwaccel_priv; + input_map = (AVNVTegraMap *)tf->input_map_ref->data; + mem = av_nvtegra_map_get_addr(input_map), common_mem = av_nvtegra_map_get_addr(&ctx->common_map); + + nvtegra_vp9_prepare_frame_setup((nvdec_vp9_pic_s *)(mem + ctx->core.pic_setup_off), avctx, ctx); + nvtegra_vp9_set_tile_sizes((uint16_t *)(common_mem + ctx->tile_sizes_off), s); + nvtegra_vp9_update_probs((nvdec_vp9EntropyProbs_t *)(mem + ctx->prob_tab_off), s, ctx->core.new_input_buffer); + + ctx->refs[0] = ff_nvtegra_safe_get_ref(h->refs[h->h.refidx[0]].f, h->frames[CUR_FRAME].tf.f); + ctx->refs[1] = ff_nvtegra_safe_get_ref(h->refs[h->h.refidx[1]].f, h->frames[CUR_FRAME].tf.f); + ctx->refs[2] = ff_nvtegra_safe_get_ref(h->refs[h->h.refidx[2]].f, h->frames[CUR_FRAME].tf.f); + + return 0; +} + +static int nvtegra_vp9_end_frame(AVCodecContext *avctx) { + VP9Context *s = avctx->priv_data; + VP9SharedContext *h = avctx->priv_data; + NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data; + AVFrame *frame = h->frames[CUR_FRAME].tf.f; + FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + + nvdec_vp9_pic_s *setup; + uint8_t *mem, *common_mem; + int err; + + av_log(avctx, AV_LOG_DEBUG, "Ending VP9-NVTEGRA frame with %u slices -> %u bytes\n", + ctx->core.num_slices, ctx->core.bitstream_len); + + if (!tf || !ctx->core.num_slices) + return 0; + + mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data); + + setup = (nvdec_vp9_pic_s *)(mem + ctx->core.pic_setup_off); + setup->stream_len = ctx->core.bitstream_len; + + err = nvtegra_vp9_prepare_cmdbuf(&ctx->core.cmdbuf, h, ctx, frame); + if (err < 0) + return err; + + err = ff_nvtegra_end_frame(avctx, frame, &ctx->core, NULL, 0); + if (err < 0) + return err; + + /* + * Perform backward probability updates if necessary. 
+ * Since it depends on entropy counts calculated by the hardware, + * we need to wait for the decode operation to complete. + */ + if (!s->s.h.errorres && !s->s.h.parallelmode) { + err = ff_nvtegra_wait_decode(avctx, frame); + if (err < 0) + return err; + + common_mem = av_nvtegra_map_get_addr(&ctx->common_map); + + nvtegra_vp9_update_counts((nvdec_vp9EntropyCounts_t *)(common_mem + ctx->ctx_counter_off), + s->td); + ff_vp9_adapt_probs(s); + } + + return 0; +} + +static int nvtegra_vp9_decode_slice(AVCodecContext *avctx, const uint8_t *buf, + uint32_t buf_size) +{ + VP9SharedContext *h = avctx->priv_data; + AVFrame *frame = h->frames[CUR_FRAME].tf.f; + + int offset = h->h.uncompressed_header_size + h->h.compressed_header_size; + + return ff_nvtegra_decode_slice(avctx, frame, buf + offset, buf_size - offset, false); +} + +#if CONFIG_VP9_NVTEGRA_HWACCEL +const FFHWAccel ff_vp9_nvtegra_hwaccel = { + .p.name = "vp9_nvtegra", + .p.type = AVMEDIA_TYPE_VIDEO, + .p.id = AV_CODEC_ID_VP9, + .p.pix_fmt = AV_PIX_FMT_NVTEGRA, + .start_frame = &nvtegra_vp9_start_frame, + .end_frame = &nvtegra_vp9_end_frame, + .decode_slice = &nvtegra_vp9_decode_slice, + .init = &nvtegra_vp9_decode_init, + .uninit = &nvtegra_vp9_decode_uninit, + .frame_params = &ff_nvtegra_frame_params, + .priv_data_size = sizeof(NVTegraVP9DecodeContext), + .caps_internal = HWACCEL_CAP_ASYNC_SAFE, +}; +#endif diff --git a/libavcodec/vp9.c b/libavcodec/vp9.c index 8ede2e2eb3..6f2b6f5241 100644 --- a/libavcodec/vp9.c +++ b/libavcodec/vp9.c @@ -165,7 +165,8 @@ static int update_size(AVCodecContext *avctx, int w, int h) CONFIG_VP9_NVDEC_HWACCEL + \ CONFIG_VP9_VAAPI_HWACCEL + \ CONFIG_VP9_VDPAU_HWACCEL + \ - CONFIG_VP9_VIDEOTOOLBOX_HWACCEL) + CONFIG_VP9_VIDEOTOOLBOX_HWACCEL + \ + CONFIG_VP9_NVTEGRA_HWACCEL) enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmtp = pix_fmts; VP9Context *s = avctx->priv_data; uint8_t *p; @@ -180,6 +181,10 @@ static int update_size(AVCodecContext *avctx, int w, int h) switch (s->pix_fmt) { 
     case AV_PIX_FMT_YUV420P:
+#if CONFIG_VP9_NVTEGRA_HWACCEL
+        *fmtp++ = AV_PIX_FMT_NVTEGRA;
+#endif
+        /* fallthrough */
     case AV_PIX_FMT_YUV420P10:
 #if CONFIG_VP9_DXVA2_HWACCEL
         *fmtp++ = AV_PIX_FMT_DXVA2_VLD;
@@ -1870,6 +1875,9 @@ const FFCodec ff_vp9_decoder = {
 #endif
 #if CONFIG_VP9_VIDEOTOOLBOX_HWACCEL
                                HWACCEL_VIDEOTOOLBOX(vp9),
+#endif
+#if CONFIG_VP9_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(vp9),
 #endif
                                NULL
                            },

From patchwork Thu May 30 19:43:18 2024
X-Patchwork-Submitter: averne
X-Patchwork-Id: 49420
From: averne
To: ffmpeg-devel@ffmpeg.org
Cc: averne
Date: Thu, 30 May 2024 21:43:18 +0200
Message-ID: <9fd169524a21cb2a4bef2673147b791c7cbc2209.1717083800.git.averne381@gmail.com>
X-Mailer: git-send-email 2.45.1
Subject: [FFmpeg-devel] [PATCH 16/16] nvtegra: add mjpeg hardware decoding

This uses NVJPG, a hardware engine separate from NVDEC.
On the Tegra 210 (and possibly later hardware), it cannot decode to
tiled surfaces, and it has some quirks that have been observed to hang
the hardware.

Signed-off-by: averne
---
 configure                  |   2 +
 libavcodec/Makefile        |   1 +
 libavcodec/hwaccels.h      |   1 +
 libavcodec/mjpegdec.c      |   6 +
 libavcodec/nvtegra_mjpeg.c | 336 +++++++++++++++++++++++++++++++++++++
 5 files changed, 346 insertions(+)
 create mode 100644 libavcodec/nvtegra_mjpeg.c

diff --git a/configure b/configure
index 3fe948d9ab..1d885ed655 100755
--- a/configure
+++ b/configure
@@ -3219,6 +3219,8 @@ mjpeg_nvdec_hwaccel_deps="nvdec"
 mjpeg_nvdec_hwaccel_select="mjpeg_decoder"
 mjpeg_vaapi_hwaccel_deps="vaapi"
 mjpeg_vaapi_hwaccel_select="mjpeg_decoder"
+mjpeg_nvtegra_hwaccel_deps="nvtegra"
+mjpeg_nvtegra_hwaccel_select="mjpeg_decoder"
 mpeg1_nvdec_hwaccel_deps="nvdec"
 mpeg1_nvdec_hwaccel_select="mpeg1video_decoder"
 mpeg1_vdpau_hwaccel_deps="vdpau"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 914995558e..6a773f8d3e 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1025,6 +1025,7 @@ OBJS-$(CONFIG_HEVC_VULKAN_HWACCEL) += vulkan_decode.o vulkan_hevc.o
 OBJS-$(CONFIG_HEVC_NVTEGRA_HWACCEL) += nvtegra_hevc.o
 OBJS-$(CONFIG_MJPEG_NVDEC_HWACCEL) +=
nvdec_mjpeg.o OBJS-$(CONFIG_MJPEG_VAAPI_HWACCEL) += vaapi_mjpeg.o +OBJS-$(CONFIG_MJPEG_NVTEGRA_HWACCEL) += nvtegra_mjpeg.o OBJS-$(CONFIG_MPEG1_NVDEC_HWACCEL) += nvdec_mpeg12.o OBJS-$(CONFIG_MPEG1_VDPAU_HWACCEL) += vdpau_mpeg12.o OBJS-$(CONFIG_MPEG1_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h index a3babfc309..f5a121d23f 100644 --- a/libavcodec/hwaccels.h +++ b/libavcodec/hwaccels.h @@ -51,6 +51,7 @@ extern const struct FFHWAccel ff_hevc_nvtegra_hwaccel; extern const struct FFHWAccel ff_hevc_vulkan_hwaccel; extern const struct FFHWAccel ff_mjpeg_nvdec_hwaccel; extern const struct FFHWAccel ff_mjpeg_vaapi_hwaccel; +extern const struct FFHWAccel ff_mjpeg_nvtegra_hwaccel; extern const struct FFHWAccel ff_mpeg1_nvdec_hwaccel; extern const struct FFHWAccel ff_mpeg1_vdpau_hwaccel; extern const struct FFHWAccel ff_mpeg1_videotoolbox_hwaccel; diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c index 1481a7f285..f8b00a92d6 100644 --- a/libavcodec/mjpegdec.c +++ b/libavcodec/mjpegdec.c @@ -733,6 +733,9 @@ int ff_mjpeg_decode_sof(MJpegDecodeContext *s) #endif #if CONFIG_MJPEG_VAAPI_HWACCEL AV_PIX_FMT_VAAPI, +#endif +#if CONFIG_MJPEG_NVTEGRA_HWACCEL + AV_PIX_FMT_NVTEGRA, #endif s->avctx->pix_fmt, AV_PIX_FMT_NONE, @@ -3021,6 +3024,9 @@ const FFCodec ff_mjpeg_decoder = { #endif #if CONFIG_MJPEG_VAAPI_HWACCEL HWACCEL_VAAPI(mjpeg), +#endif +#if CONFIG_MJPEG_NVTEGRA_HWACCEL + HWACCEL_NVTEGRA(mjpeg), #endif NULL }, diff --git a/libavcodec/nvtegra_mjpeg.c b/libavcodec/nvtegra_mjpeg.c new file mode 100644 index 0000000000..9139116159 --- /dev/null +++ b/libavcodec/nvtegra_mjpeg.c @@ -0,0 +1,336 @@ +/* + * Copyright (c) 2024 averne + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include "config_components.h" + +#include "avcodec.h" +#include "hwaccel_internal.h" +#include "internal.h" +#include "hwconfig.h" +#include "mjpegdec.h" +#include "decode.h" +#include "nvtegra_decode.h" + +#include "libavutil/pixdesc.h" +#include "libavutil/nvtegra_host1x.h" + +typedef struct NVTegraMJPEGDecodeContext { + FFNVTegraDecodeContext core; +} NVTegraMJPEGDecodeContext; + +static int nvtegra_mjpeg_decode_uninit(AVCodecContext *avctx) { + NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + int err; + + av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA MJPEG decoder\n"); + + err = ff_nvtegra_decode_uninit(avctx, &ctx->core); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_mjpeg_decode_init(AVCodecContext *avctx) { + MJpegDecodeContext *s = avctx->priv_data; + NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + + enum AVPixelFormat fmt; + int luma, err; + + av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA MJPEG decoder\n"); + + /* Reject encodes with known hardware issues */ + if (avctx->profile != AV_PROFILE_MJPEG_HUFFMAN_BASELINE_DCT) { + av_log(avctx, AV_LOG_ERROR, "Non-baseline encoded jpegs are not supported by NVJPG\n"); + return AVERROR(EINVAL); + } + + fmt = s->avctx->pix_fmt, luma = s->comp_index[0]; + if ((fmt == AV_PIX_FMT_YUV444P || fmt == AV_PIX_FMT_YUVJ444P) + && (s->h_count[luma] != 1 || s->v_count[luma] != 1)) { + av_log(avctx, AV_LOG_ERROR, "Subsampled YUV444 is not supported by NVJPG\n"); + return AVERROR(EINVAL); 
+    }
+
+    ctx->core.pic_setup_off  = 0;
+    ctx->core.status_off     = FFALIGN(ctx->core.pic_setup_off + sizeof(nvjpg_dec_drv_pic_setup_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off     = FFALIGN(ctx->core.status_off    + sizeof(nvjpg_dec_status),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off  = FFALIGN(ctx->core.cmdbuf_off    + AV_NVTEGRA_MAP_ALIGN,
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                       0x1000);
+
+    ctx->core.max_cmdbuf_size    = ctx->core.bitstream_off  - ctx->core.cmdbuf_off;
+    ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off;
+
+    ctx->core.is_nvjpg = true;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    return 0;
+
+fail:
+    nvtegra_mjpeg_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_mjpeg_prepare_frame_setup(nvjpg_dec_drv_pic_setup_s *setup, MJpegDecodeContext *s,
+                                              NVTegraMJPEGDecodeContext *ctx)
+{
+    int input_chroma_mode, output_chroma_mode, memory_mode;
+    int i, j;
+
+    switch (s->hwaccel_sw_pix_fmt) {
+    case AV_PIX_FMT_GRAY8:
+        input_chroma_mode  = 0; /* Monochrome */
+        output_chroma_mode = 0; /* Monochrome */
+        memory_mode        = 3; /* YUV420, for some reason decoding fails with NV12 */
+        break;
+    default:
+    case AV_PIX_FMT_YUV420P:
+    case AV_PIX_FMT_YUVJ420P:
+        input_chroma_mode  = 1; /* YUV420 */
+        output_chroma_mode = 1; /* YUV420 */
+        memory_mode        = 0; /* NV12 */
+        break;
+    case AV_PIX_FMT_YUV422P:
+    case AV_PIX_FMT_YUVJ422P:
+        input_chroma_mode  = 2; /* YUV422H (not sure what nvidia means by that) */
+        output_chroma_mode = 1; /* YUV420 */
+        memory_mode        = 0; /* NV12 */
+        break;
+    case AV_PIX_FMT_YUV440P:
+    case AV_PIX_FMT_YUVJ440P:
+        input_chroma_mode  = 3; /* YUV422V (ditto) */
+        output_chroma_mode = 1; /* YUV420 */
+        memory_mode        = 0; /* NV12 */
+        break;
+    case AV_PIX_FMT_YUV444P:
+    case AV_PIX_FMT_YUVJ444P:
+        input_chroma_mode  = 4; /* YUV444 */
+        output_chroma_mode = 1; /* YUV420
*/ + memory_mode = 0; /* NV12 */ + break; + } + + *setup = (nvjpg_dec_drv_pic_setup_s){ + .restart_interval = s->restart_interval, + .frame_width = s->width, + .frame_height = s->height, + .mcu_width = s->mb_width, + .mcu_height = s->mb_height, + .comp = s->nb_components, + + .stream_chroma_mode = input_chroma_mode, + .output_chroma_mode = output_chroma_mode, + .output_pixel_format = 0, /* YUV */ + .output_stride_luma = s->picture->linesize[0], + .output_stride_chroma = s->picture->linesize[1], + + .tile_mode = 0, /* Pitch linear (tiled formats are unsupported by the T210) */ + .memory_mode = memory_mode, + .power2_downscale = 0, + .motion_jpeg_type = 0, /* Type A */ + + .start_mcu_x = 0, + .start_mcu_y = 0, + }; + + for (i = 0; i < 4; ++i) { + for (j = 0; j < 16; ++j) { + setup->huffTab[0][i].codeNum[j] = s->raw_huffman_lengths[0][i][j]; + setup->huffTab[1][i].codeNum[j] = s->raw_huffman_lengths[1][i][j]; + } + + memcpy(setup->huffTab[0][i].symbol, s->raw_huffman_values[0][i], sizeof(setup->huffTab[0][i].symbol)); + memcpy(setup->huffTab[1][i].symbol, s->raw_huffman_values[1][i], sizeof(setup->huffTab[1][i].symbol)); + } + + for (i = 0; i < s->nb_components; ++i) { + j = s->comp_index[i]; + setup->blkPar[j].ac = s->ac_index [i]; + setup->blkPar[j].dc = s->dc_index [i]; + setup->blkPar[j].hblock = s->h_count [i]; + setup->blkPar[j].vblock = s->v_count [i]; + setup->blkPar[j].quant = s->quant_index[i]; + } + + for (i = 0; i < 4; ++i) { + for (j = 0; j < 64; ++j) + setup->quant[i][j] = s->quant_matrixes[i][j]; + } +} + +static int nvtegra_mjpeg_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, MJpegDecodeContext *s, + NVTegraMJPEGDecodeContext *ctx, AVFrame *current_frame) +{ + FrameDecodeData *fdd = (FrameDecodeData *)current_frame->private_ref->data; + FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv; + AVNVTegraMap *input_map = (AVNVTegraMap *)tf->input_map_ref->data; + + int err; + + err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVJPG); + if (err < 0) + return err; + + 
AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVE7D0_SET_APPLICATION_ID, + AV_NVTEGRA_ENUM(NVE7D0_SET_APPLICATION_ID, ID, NVJPG_DECODER)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVE7D0_SET_CONTROL_PARAMS, + AV_NVTEGRA_VALUE(NVE7D0_SET_CONTROL_PARAMS, DUMP_CYCLE_COUNT, 1) | + AV_NVTEGRA_VALUE(NVE7D0_SET_CONTROL_PARAMS, GPTIMER_ON, 1)); + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVE7D0_SET_PICTURE_INDEX, + AV_NVTEGRA_VALUE(NVE7D0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx)); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_IN_DRV_PIC_SETUP, + input_map, ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_BITSTREAM, + input_map, ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_OUT_STATUS, + input_map, ctx->core.status_off, NVHOST_RELOC_TYPE_DEFAULT); + + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_CUR_PIC, av_nvtegra_frame_get_fbuf_map(current_frame), + 0, NVHOST_RELOC_TYPE_DEFAULT); + AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_CUR_PIC_CHROMA_U, av_nvtegra_frame_get_fbuf_map(current_frame), + current_frame->data[1] - current_frame->data[0], NVHOST_RELOC_TYPE_DEFAULT); + + AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVE7D0_EXECUTE, + AV_NVTEGRA_ENUM(NVE7D0_EXECUTE, AWAKEN, ENABLE)); + + err = av_nvtegra_cmdbuf_end(cmdbuf); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_mjpeg_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) { + MJpegDecodeContext *s = avctx->priv_data; + AVFrame *frame = s->picture; + NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data; + + int err; + + av_log(avctx, AV_LOG_DEBUG, "Starting MJPEG-NVTEGRA frame with pixel format %s\n", + av_get_pix_fmt_name(avctx->sw_pix_fmt)); + + err = ff_nvtegra_start_frame(avctx, frame, &ctx->core); + if (err < 0) + return err; + + return 0; +} + +static int nvtegra_mjpeg_end_frame(AVCodecContext *avctx) { + MJpegDecodeContext *s = avctx->priv_data; + NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data; 
+    AVFrame *frame = s->picture;
+    FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+
+    nvjpg_dec_drv_pic_setup_s *setup;
+    uint8_t *mem;
+    AVNVTegraMap *output_map;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending MJPEG-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvjpg_dec_drv_pic_setup_s *)(mem + ctx->core.pic_setup_off);
+    setup->bitstream_offset = 0;
+    setup->bitstream_size = ctx->core.bitstream_len;
+
+    err = nvtegra_mjpeg_prepare_cmdbuf(&ctx->core.cmdbuf, s, ctx, frame);
+    if (err < 0)
+        return err;
+
+    output_map = av_nvtegra_frame_get_fbuf_map(frame);
+    output_map->is_linear = true;
+
+    return ff_nvtegra_end_frame(avctx, frame, &ctx->core, NULL, 0);
+}
+
+static int nvtegra_mjpeg_decode_slice(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    MJpegDecodeContext *s = avctx->priv_data;
+    NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame *frame = s->picture;
+    FrameDecodeData *fdd = (FrameDecodeData *)frame->private_ref->data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    /* In nvtegra_mjpeg_start_frame the JFIF headers haven't been entirely parsed yet */
+    nvtegra_mjpeg_prepare_frame_setup((nvjpg_dec_drv_pic_setup_s *)(mem + ctx->core.pic_setup_off), s, ctx);
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, false);
+}
+
+static int nvtegra_mjpeg_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx) {
+    AVHWFramesContext *frames_ctx = (AVHWFramesContext *)hw_frames_ctx->data;
+
+    int err;
+
+    err = ff_nvtegra_frame_params(avctx, hw_frames_ctx);
+    if (err < 0)
+        return err;
+
+    /*
+     * NVJPG1 can only decode to pitch linear surfaces, which have a
+     * 256b alignment requirement in VIC.
+     */
+    frames_ctx->width = FFALIGN(frames_ctx->width, 256);
+    frames_ctx->height = FFALIGN(frames_ctx->height, 4);
+
+    return 0;
+}
+
+#if CONFIG_MJPEG_NVTEGRA_HWACCEL
+const FFHWAccel ff_mjpeg_nvtegra_hwaccel = {
+    .p.name         = "mjpeg_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_MJPEG,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_mjpeg_start_frame,
+    .end_frame      = &nvtegra_mjpeg_end_frame,
+    .decode_slice   = &nvtegra_mjpeg_decode_slice,
+    .init           = &nvtegra_mjpeg_decode_init,
+    .uninit         = &nvtegra_mjpeg_decode_uninit,
+    .frame_params   = &nvtegra_mjpeg_frame_params,
+    .priv_data_size = sizeof(NVTegraMJPEGDecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
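
Note on the padding in nvtegra_mjpeg_frame_params: the effect of the two FFALIGN calls on typical frame sizes can be checked in isolation. The following is only a minimal sketch and not part of the patch. It assumes FFALIGN keeps its usual libavutil round-up-to-a-power-of-two behaviour, and align_up, PaddedSize and padded_surface_size are hypothetical names introduced here for illustration.

```c
#include <assert.h>

/*
 * Local stand-in for libavutil's FFALIGN(x, a): round x up to the next
 * multiple of a. Only valid when a is a power of two.
 */
static int align_up(int x, int a)
{
    assert(a > 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Hypothetical container for the padded dimensions (illustration only). */
typedef struct PaddedSize {
    int width;
    int height;
} PaddedSize;

/*
 * Mirrors the two alignment steps from nvtegra_mjpeg_frame_params:
 * width rounded up to a multiple of 256, height to a multiple of 4.
 */
static PaddedSize padded_surface_size(int width, int height)
{
    PaddedSize p = { align_up(width, 256), align_up(height, 4) };
    return p;
}
```

Under these assumptions padded_surface_size(1920, 1080) gives 2048x1080, while 1280x720 is left unchanged because both dimensions are already aligned.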