From patchwork Mon Jan 2 23:21:32 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philip Langdale
X-Patchwork-Id: 39831
From: Philip Langdale
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 2 Jan 2023 15:21:32 -0800
Message-Id: <20230102232133.729217-2-philipl@overt.org>
In-Reply-To: <20230102232133.729217-1-philipl@overt.org>
References: <20230102232133.729217-1-philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH 1/2] lavu/hwcontext_cuda: declare support for argb/abgr/rgba/bgra
Cc: Philip Langdale

These can be useful.

Signed-off-by: Philip Langdale
---
 libavutil/hwcontext_cuda.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index 5ae7711c94..22eb9f5513 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -45,6 +45,10 @@ static const enum AVPixelFormat supported_formats[] = {
     AV_PIX_FMT_YUV444P16,
     AV_PIX_FMT_0RGB32,
     AV_PIX_FMT_0BGR32,
+    AV_PIX_FMT_ARGB,
+    AV_PIX_FMT_ABGR,
+    AV_PIX_FMT_RGBA,
+    AV_PIX_FMT_BGRA,
 #if CONFIG_VULKAN
     AV_PIX_FMT_VULKAN,
 #endif

From patchwork Mon Jan 2 23:21:33 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philip Langdale
X-Patchwork-Id: 39832
From: Philip Langdale
To: ffmpeg-devel@ffmpeg.org
Date: Mon,
 2 Jan 2023 15:21:33 -0800
Message-Id: <20230102232133.729217-3-philipl@overt.org>
In-Reply-To: <20230102232133.729217-1-philipl@overt.org>
References: <20230102232133.729217-1-philipl@overt.org>
Subject: [FFmpeg-devel] [PATCH 2/2] avfilter/vf_nvoffruc: Add filter for nvidia's Optical Flow FRUC library
Cc: Philip Langdale

The NvOFFRUC library provides a GPU accelerated interpolation feature
based on nvidia's Optical Flow functionality. It's able to provide
reasonably high quality realtime interpolation to increase the frame
rate of a video stream - as opposed to vf_framerate that just does a
linear blend or vf_minterpolate that is anything but realtime.

As interesting as that sounds, there are a lot of limitations that mean
this filter is mostly just a toy.

1. The licensing is useless. The library and header are distributed as
   part of the Optical Flow SDK which has a proprietary EULA, so anyone
   wanting to build the filter must obtain the SDK for both build and
   runtime and the resulting binaries will be nonfree and
   unredistributable.

2. The NvOFFRUC.h header doesn't even compile in pure C without
   modification.

3. The library can only handle NV12 and "ARGB" (which really means any
   single plane, four channel, 8 bit format). This means it can't help
   with our inevitable future dominated by 10+ bit formats.

4. The pitch handling logic in the library is very inflexible, and it
   assumes that for NV12, the UV plane is contiguous with the Y plane.
   This actually ends up making it incompatible with nvdec output for
   certain frame sizes.
   To avoid constantly fighting edge cases, I took the brute force
   approach of copying the input and output frames to/from CUarrays
   (which the library can accept) to give me a way to ensure the
   correct layout is used.

5. The library is stateful in an unhelpful way. It is called with one
   input frame and one output buffer, and always interpolates between
   the passed input frame and the frame from the previous call. This
   both requires special handling for the first frame, and also
   prevents generating more than one intermediate frame. If you want to
   do 3x or 4x (etc.) interpolation, this approach doesn't work. So,
   again, I brute forced it by treating every interpolation as a new
   session - calling it twice, once with each input frame, even if the
   first frame happens to be the same as the last frame we called it
   with. This allows us to generate as many intermediate frames as we
   want, but it presumably consumes more GPU resources.

6. The library always creates a `NvOFFRUC` directory with an empty log
   file in it in $PWD. What a nuisance.

But with all those caveats and limitations, it does actually work. I
was able to upsample a 24fps file to 144fps (my monitor limit) with
respectable results. In some situations, it starts bogging down, and
I'm not entirely sure where those limits are - certainly I can see it
consuming a significant percentage of GPU resources for large scaling
factors.

The implementation here is heavily based on vf_framerate with the
blending function ripped out and replaced by NvOFFRUC. That means we
have all the nice properties in terms of being able to do non-integer
scaling, and downsampling via interpolation as well.

Is this mergeable? No - but it was an interesting exercise and maybe
folks in narrow circumstances may find some genuine use from it.
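
To make the frame-selection behaviour inherited from vf_framerate a bit
more concrete, here is a minimal standalone sketch of how an output
timestamp between two source frames might be classified. The names
`FrameChoice`, `choose_frame` and `output_pts` are hypothetical and exist
only for this illustration, and the truncating integer division is an
assumption made for brevity (the filter itself uses `av_rescale`, which
rounds differently).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these exist in FFmpeg. */
typedef enum FrameChoice {
    CHOOSE_F0,          /* pass the earlier source frame through */
    CHOOSE_F1,          /* pass the later source frame through */
    CHOOSE_INTERPOLATE  /* hand both frames to the interpolator */
} FrameChoice;

/*
 * Classify an output timestamp that lies in [pts0, pts1). All timestamps
 * are in the output time base. The position is scaled to 0..256 so it can
 * be compared against interp_start/interp_end (15 and 240 by default in
 * the patch). Truncating division is an assumption made here for brevity.
 */
static FrameChoice choose_frame(int64_t work_pts, int64_t pts0, int64_t pts1,
                                int interp_start, int interp_end)
{
    int64_t delta = pts1 - pts0;
    int64_t pos;

    assert(delta > 0 && work_pts >= pts0 && work_pts < pts1);
    pos = (work_pts - pts0) * 256 / delta;

    if (pos > interp_end)
        return CHOOSE_F1;
    if (pos < interp_start)
        return CHOOSE_F0;
    return CHOOSE_INTERPOLATE;
}

/* Timestamp of output frame n when each output frame lasts `step` ticks. */
static int64_t output_pts(int64_t start_pts, int64_t n, int64_t step)
{
    return start_pts + n * step;
}
```

As an illustration only: with a 1/120 time base, 24fps source frames are
5 ticks apart and 60fps output frames are 2 ticks apart, so the output
frames at ticks 2 and 4 fall between the source frames at ticks 0 and 5
and would be interpolated, while the output frame at tick 0 reuses the
source frame directly.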

Signed-off-by: Philip Langdale
---
 configure                 |   7 +-
 libavfilter/Makefile      |   1 +
 libavfilter/allfilters.c  |   1 +
 libavfilter/vf_nvoffruc.c | 644 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 650 insertions(+), 3 deletions(-)
 create mode 100644 libavfilter/vf_nvoffruc.c

diff --git a/configure b/configure
index 675dc84f56..6ea9f89f97 100755
--- a/configure
+++ b/configure
@@ -3691,6 +3691,7 @@ mptestsrc_filter_deps="gpl"
 negate_filter_deps="lut_filter"
 nlmeans_opencl_filter_deps="opencl"
 nnedi_filter_deps="gpl"
+nvoffruc_filter_deps="ffnvcodec nonfree"
 ocr_filter_deps="libtesseract"
 ocv_filter_deps="libopencv"
 openclsrc_filter_deps="opencl"
@@ -6450,9 +6451,9 @@ fi
 if ! disabled ffnvcodec; then
     ffnv_hdr_list="ffnvcodec/nvEncodeAPI.h ffnvcodec/dynlink_cuda.h ffnvcodec/dynlink_cuviddec.h ffnvcodec/dynlink_nvcuvid.h"
     check_pkg_config ffnvcodec "ffnvcodec >= 12.0.16.0" "$ffnv_hdr_list" "" || \
-      check_pkg_config ffnvcodec "ffnvcodec >= 11.1.5.2 ffnvcodec < 12.0" "$ffnv_hdr_list" "" || \
-      check_pkg_config ffnvcodec "ffnvcodec >= 11.0.10.2 ffnvcodec < 11.1" "$ffnv_hdr_list" "" || \
-      check_pkg_config ffnvcodec "ffnvcodec >= 8.1.24.14 ffnvcodec < 8.2" "$ffnv_hdr_list" ""
+      check_pkg_config ffnvcodec "ffnvcodec >= 11.1.5.3 ffnvcodec < 12.0" "$ffnv_hdr_list" "" || \
+      check_pkg_config ffnvcodec "ffnvcodec >= 11.0.10.3 ffnvcodec < 11.1" "$ffnv_hdr_list" "" || \
+      check_pkg_config ffnvcodec "ffnvcodec >= 8.1.24.15 ffnvcodec < 8.2" "$ffnv_hdr_list" ""
 fi
 
 if enabled_all libglslang libshaderc; then
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index cb41ccc622..292597f3a8 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -389,6 +389,7 @@ OBJS-$(CONFIG_NOFORMAT_FILTER)               += vf_format.o
 OBJS-$(CONFIG_NOISE_FILTER)                  += vf_noise.o
 OBJS-$(CONFIG_NORMALIZE_FILTER)              += vf_normalize.o
 OBJS-$(CONFIG_NULL_FILTER)                   += vf_null.o
+OBJS-$(CONFIG_NVOFFRUC_FILTER)               += vf_nvoffruc.o
 OBJS-$(CONFIG_OCR_FILTER)                    += vf_ocr.o
 OBJS-$(CONFIG_OCV_FILTER)                    += vf_libopencv.o
 OBJS-$(CONFIG_OSCILLOSCOPE_FILTER)           += vf_datascope.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 52741b60e4..84f102806e 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -368,6 +368,7 @@ extern const AVFilter ff_vf_noformat;
 extern const AVFilter ff_vf_noise;
 extern const AVFilter ff_vf_normalize;
 extern const AVFilter ff_vf_null;
+extern const AVFilter ff_vf_nvoffruc;
 extern const AVFilter ff_vf_ocr;
 extern const AVFilter ff_vf_ocv;
 extern const AVFilter ff_vf_oscilloscope;
diff --git a/libavfilter/vf_nvoffruc.c b/libavfilter/vf_nvoffruc.c
new file mode 100644
index 0000000000..e3a9f9e553
--- /dev/null
+++ b/libavfilter/vf_nvoffruc.c
@@ -0,0 +1,644 @@
+/*
+ * Copyright (C) 2022 Philip Langdale
+ * Based on vf_framerate - Copyright (C) 2012 Mark Himsley
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * filter upsamples the frame rate of a source using the nvidia Optical Flow
+ * FRUC library.
+ */
+
+#include <dlfcn.h>
+#include "libavutil/avassert.h"
+#include "libavutil/cuda_check.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_cuda_internal.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+
+#include "avfilter.h"
+#include "filters.h"
+#include "internal.h"
+/*
+ * This cannot be distributed with the filter due to licensing. If you want to
+ * compile this filter, you will need to obtain it from nvidia and then fix it
+ * to work in a pure C environment:
+ * * Remove the `using namespace std;`
+ * * Replace the `bool *` with `void *`
+ */
+#include "NvOFFRUC.h"
+
+typedef struct FRUCContext {
+    const AVClass *class;
+
+    AVCUDADeviceContext *hwctx;
+    AVBufferRef *device_ref;
+
+    CUcontext cu_ctx;
+    CUstream stream;
+    CUarray c0; ///< CUarray for f0
+    CUarray c1; ///< CUarray for f1
+    CUarray cw; ///< CUarray for work
+
+    AVRational dest_frame_rate;
+    int interp_start; ///< start of range to apply interpolation
+    int interp_end; ///< end of range to apply interpolation
+
+    AVRational srce_time_base; ///< timebase of source
+    AVRational dest_time_base; ///< timebase of destination
+
+    int blend_factor_max;
+    AVFrame *work;
+    enum AVPixelFormat format;
+
+    AVFrame *f0; ///< last frame
+    AVFrame *f1; ///< current frame
+    int64_t pts0; ///< last frame pts in dest_time_base
+    int64_t pts1; ///< current frame pts in dest_time_base
+    int64_t delta; ///< pts1 to pts0 delta
+    int flush; ///< 1 if the filter is being flushed
+    int64_t start_pts; ///< pts of the first output frame
+    int64_t n; ///< output frame counter
+
+    void *fruc_dl;
+    PtrToFuncNvOFFRUCCreate NvOFFRUCCreate;
+    PtrToFuncNvOFFRUCRegisterResource NvOFFRUCRegisterResource;
+    PtrToFuncNvOFFRUCUnregisterResource NvOFFRUCUnregisterResource;
+    PtrToFuncNvOFFRUCProcess NvOFFRUCProcess;
+    PtrToFuncNvOFFRUCDestroy NvOFFRUCDestroy;
+    NvOFFRUCHandle fruc;
+} FRUCContext;
+
+#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, s->hwctx->internal->cuda_dl, x)
+#define OFFSET(x) offsetof(FRUCContext, x)
+#define V AV_OPT_FLAG_VIDEO_PARAM
+#define F AV_OPT_FLAG_FILTERING_PARAM
+#define FRAMERATE_FLAG_SCD 01
+
+static const AVOption nvoffruc_options[] = {
+    {"fps", "required output frames per second rate", OFFSET(dest_frame_rate), AV_OPT_TYPE_VIDEO_RATE, {.str="50"}, 0, INT_MAX, V|F },
+
+    {"interp_start", "point to start linear interpolation", OFFSET(interp_start), AV_OPT_TYPE_INT, {.i64=15}, 0, 255, V|F },
+    {"interp_end", "point to end linear interpolation", OFFSET(interp_end), AV_OPT_TYPE_INT, {.i64=240}, 0, 255, V|F },
+
+    {NULL}
+};
+
+AVFILTER_DEFINE_CLASS(nvoffruc);
+
+static int blend_frames(AVFilterContext *ctx, int64_t work_pts)
+{
+    FRUCContext *s = ctx->priv;
+    AVFilterLink *outlink = ctx->outputs[0];
+
+    CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+    CUDA_MEMCPY2D cpy_params = {0,};
+    NvOFFRUC_PROCESS_IN_PARAMS in = {0,};
+    NvOFFRUC_PROCESS_OUT_PARAMS out = {0,};
+    NvOFFRUC_STATUS status;
+
+    int num_channels = s->format == AV_PIX_FMT_NV12 ? 1 : 4;
+    int ret;
+    uint64_t ignored;
+
+    // get work-space for output frame
+    s->work = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+    if (!s->work)
+        return AVERROR(ENOMEM);
+
+    av_frame_copy_props(s->work, s->f0);
+
+    cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
+    cpy_params.srcDevice = (CUdeviceptr)s->f0->data[0],
+    cpy_params.srcPitch = s->f0->linesize[0],
+    cpy_params.srcY = 0,
+    cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
+    cpy_params.dstArray = s->c0,
+    cpy_params.dstY = 0,
+    cpy_params.WidthInBytes = s->f0->width * num_channels,
+    cpy_params.Height = s->f0->height,
+    ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+    if (ret < 0)
+        return ret;
+
+    if (s->f0->data[1]) {
+        cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
+        cpy_params.srcDevice = (CUdeviceptr)s->f0->data[1],
+        cpy_params.srcPitch = s->f0->linesize[1],
+        cpy_params.srcY = 0,
+        cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
+        cpy_params.dstArray = s->c0,
+        cpy_params.dstY = s->f0->height,
+        cpy_params.WidthInBytes = s->f0->width * num_channels,
+        cpy_params.Height = s->f0->height * 0.5,
+        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+        if (ret < 0)
+            return ret;
+    }
+
+    cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
+    cpy_params.srcDevice = (CUdeviceptr)s->f1->data[0],
+    cpy_params.srcPitch = s->f1->linesize[0],
+    cpy_params.srcY = 0,
+    cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
+    cpy_params.dstArray = s->c1,
+    cpy_params.dstY = 0,
+    cpy_params.WidthInBytes = s->f1->width * num_channels,
+    cpy_params.Height = s->f1->height,
+    ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+    if (ret < 0)
+        return ret;
+
+    if (s->f1->data[1]) {
+        cpy_params.srcMemoryType = CU_MEMORYTYPE_DEVICE,
+        cpy_params.srcDevice = (CUdeviceptr)s->f1->data[1],
+        cpy_params.srcPitch = s->f1->linesize[1],
+        cpy_params.srcY = 0,
+        cpy_params.dstMemoryType = CU_MEMORYTYPE_ARRAY,
+        cpy_params.dstArray = s->c1,
+        cpy_params.dstY = s->f1->height,
+        cpy_params.WidthInBytes = s->f1->width * num_channels,
+        cpy_params.Height = s->f1->height * 0.5,
+        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+        if (ret < 0)
+            return ret;
+    }
+
+    in.stFrameDataInput.pFrame = s->c0;
+    in.stFrameDataInput.nTimeStamp = s->pts0;
+    out.stFrameDataOutput.pFrame = s->cw,
+    out.stFrameDataOutput.nTimeStamp = s->pts0;
+    out.stFrameDataOutput.bHasFrameRepetitionOccurred = &ignored;
+    status = s->NvOFFRUCProcess(s->fruc, &in, &out);
+    if (status) {
+        av_log(ctx, AV_LOG_ERROR, "FRUC: Process failure: %d\n", status);
+        return AVERROR(ENOSYS);
+    }
+
+    in.stFrameDataInput.pFrame = s->c1;
+    in.stFrameDataInput.nTimeStamp = s->pts1;
+    out.stFrameDataOutput.pFrame = s->cw,
+    out.stFrameDataOutput.nTimeStamp = work_pts;
+    out.stFrameDataOutput.bHasFrameRepetitionOccurred = &ignored;
+    status = s->NvOFFRUCProcess(s->fruc, &in, &out);
+    if (status) {
+        av_log(ctx, AV_LOG_ERROR, "FRUC: Process failure: %d\n", status);
+        return AVERROR(ENOSYS);
+    }
+
+    cpy_params.srcMemoryType = CU_MEMORYTYPE_ARRAY,
+    cpy_params.srcArray = s->cw,
+    cpy_params.srcY = 0,
+    cpy_params.dstMemoryType = CU_MEMORYTYPE_DEVICE,
+    cpy_params.dstDevice = (CUdeviceptr)s->work->data[0],
+    cpy_params.dstPitch = s->work->linesize[0],
+    cpy_params.dstY = 0,
+    cpy_params.WidthInBytes = s->work->width * num_channels,
+    cpy_params.Height = s->work->height,
+    ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+    if (ret < 0)
+        return ret;
+
+    if (s->work->data[1]) {
+        cpy_params.srcMemoryType = CU_MEMORYTYPE_ARRAY,
+        cpy_params.srcArray = s->cw,
+        cpy_params.srcY = s->work->height,
+        cpy_params.dstMemoryType = CU_MEMORYTYPE_DEVICE,
+        cpy_params.dstDevice = (CUdeviceptr)s->work->data[1],
+        cpy_params.dstPitch = s->work->linesize[1],
+        cpy_params.dstY = 0,
+        cpy_params.WidthInBytes = s->work->width * num_channels,
+        cpy_params.Height = s->work->height * 0.5,
+        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy_params, s->stream));
+        if (ret < 0)
+            return ret;
+    }
+
+    return 0;
+}
+
+static int process_work_frame(AVFilterContext *ctx)
+{
+    FRUCContext *s = ctx->priv;
+    int64_t work_pts;
+    int64_t interpolate, interpolate8;
+    int ret;
+
+    if (!s->f1)
+        return 0;
+    if (!s->f0 && !s->flush)
+        return 0;
+
+    work_pts = s->start_pts + av_rescale_q(s->n, av_inv_q(s->dest_frame_rate), s->dest_time_base);
+
+    if (work_pts >= s->pts1 && !s->flush)
+        return 0;
+
+    if (!s->f0) {
+        av_assert1(s->flush);
+        s->work = s->f1;
+        s->f1 = NULL;
+    } else {
+        if (work_pts >= s->pts1 + s->delta && s->flush)
+            return 0;
+
+        interpolate = av_rescale(work_pts - s->pts0, s->blend_factor_max, s->delta);
+        interpolate8 = av_rescale(work_pts - s->pts0, 256, s->delta);
+        ff_dlog(ctx, "process_work_frame() interpolate: %"PRId64"/256\n", interpolate8);
+        if (interpolate >= s->blend_factor_max || interpolate8 > s->interp_end) {
+            av_log(ctx, AV_LOG_DEBUG, "Matched f1: pts %"PRId64"\n", work_pts);
+            s->work = av_frame_clone(s->f1);
+        } else if (interpolate <= 0 || interpolate8 < s->interp_start) {
+            av_log(ctx, AV_LOG_DEBUG, "Matched f0: pts %"PRId64"\n", work_pts);
+            s->work = av_frame_clone(s->f0);
+        } else {
+            av_log(ctx, AV_LOG_DEBUG, "Unmatched pts: %"PRId64"\n", work_pts);
+            ret = blend_frames(ctx, work_pts);
+            if (ret < 0)
+                return ret;
+        }
+    }
+
+    if (!s->work)
+        return AVERROR(ENOMEM);
+
+    s->work->pts = work_pts;
+    s->n++;
+
+    return 1;
+}
+
+static av_cold int init(AVFilterContext *ctx)
+{
+    FRUCContext *s = ctx->priv;
+    s->start_pts = AV_NOPTS_VALUE;
+
+    // TODO: Need windows equivalent symbol loading
+    s->fruc_dl = dlopen("libNvOFFRUC.so", RTLD_LAZY);
+    if (!s->fruc_dl) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to load FRUC: %s\n", dlerror());
+        return AVERROR(EINVAL);
+    }
+
+    s->NvOFFRUCCreate = (PtrToFuncNvOFFRUCCreate)
+        dlsym(s->fruc_dl, "NvOFFRUCCreate");
+    s->NvOFFRUCRegisterResource = (PtrToFuncNvOFFRUCRegisterResource)
+        dlsym(s->fruc_dl, "NvOFFRUCRegisterResource");
+    s->NvOFFRUCUnregisterResource = (PtrToFuncNvOFFRUCUnregisterResource)
+        dlsym(s->fruc_dl, "NvOFFRUCUnregisterResource");
+    s->NvOFFRUCProcess = (PtrToFuncNvOFFRUCProcess)
+        dlsym(s->fruc_dl, "NvOFFRUCProcess");
+    s->NvOFFRUCDestroy = (PtrToFuncNvOFFRUCDestroy)
+        dlsym(s->fruc_dl, "NvOFFRUCDestroy");
+    return 0;
+}
+
+static av_cold void uninit(AVFilterContext *ctx)
+{
+    FRUCContext *s = ctx->priv;
+    CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+    CUcontext dummy;
+
+    CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+
+    if (s->fruc) {
+        NvOFFRUC_UNREGISTER_RESOURCE_PARAM in_param = {
+            .pArrResource = {s->c0, s->c1, s->cw},
+            .uiCount = 3,
+        };
+        NvOFFRUC_STATUS nv_status = s->NvOFFRUCUnregisterResource(s->fruc, &in_param);
+        if (nv_status) {
+            av_log(ctx, AV_LOG_WARNING, "Could not unregister: %d\n", nv_status);
+        }
+        s->NvOFFRUCDestroy(s->fruc);
+    }
+    if (s->c0)
+        CHECK_CU(cu->cuArrayDestroy(s->c0));
+    if (s->c1)
+        CHECK_CU(cu->cuArrayDestroy(s->c1));
+    if (s->cw)
+        CHECK_CU(cu->cuArrayDestroy(s->cw));
+
+    CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+
+    if (s->fruc_dl)
+        dlclose(s->fruc_dl);
+    av_frame_free(&s->f0);
+    av_frame_free(&s->f1);
+    av_buffer_unref(&s->device_ref);
+}
+
+static const enum AVPixelFormat supported_formats[] = {
+    AV_PIX_FMT_NV12,
+    // Actually any single plane, four channel, 8bit format will work.
+    AV_PIX_FMT_ARGB,
+    AV_PIX_FMT_ABGR,
+    AV_PIX_FMT_RGBA,
+    AV_PIX_FMT_BGRA,
+    AV_PIX_FMT_NONE
+};
+
+static int format_is_supported(enum AVPixelFormat fmt)
+{
+    int i;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++)
+        if (supported_formats[i] == fmt)
+            return 1;
+    return 0;
+}
+
+static int activate(AVFilterContext *ctx)
+{
+    int ret, status;
+    AVFilterLink *inlink = ctx->inputs[0];
+    AVFilterLink *outlink = ctx->outputs[0];
+    FRUCContext *s = ctx->priv;
+    AVFrame *inpicref;
+    int64_t pts;
+
+    CudaFunctions *cu = s->hwctx->internal->cuda_dl;
+    CUcontext dummy;
+
+    FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
+
+    CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+
+retry:
+    ret = process_work_frame(ctx);
+    if (ret < 0) {
+        goto exit;
+    } else if (ret == 1) {
+        ret = ff_filter_frame(outlink, s->work);
+        goto exit;
+    }
+
+    ret = ff_inlink_consume_frame(inlink, &inpicref);
+    if (ret < 0)
+        goto exit;
+
+    if (inpicref) {
+        if (inpicref->interlaced_frame)
+            av_log(ctx, AV_LOG_WARNING, "Interlaced frame found - the output will not be correct.\n");
+
+        if (inpicref->pts == AV_NOPTS_VALUE) {
+            av_log(ctx, AV_LOG_WARNING, "Ignoring frame without PTS.\n");
+            av_frame_free(&inpicref);
+        }
+    }
+
+    if (inpicref) {
+        pts = av_rescale_q(inpicref->pts, s->srce_time_base, s->dest_time_base);
+
+        if (s->f1 && pts == s->pts1) {
+            av_log(ctx, AV_LOG_WARNING, "Ignoring frame with same PTS.\n");
+            av_frame_free(&inpicref);
+        }
+    }
+
+    if (inpicref) {
+        av_frame_free(&s->f0);
+        s->f0 = s->f1;
+        s->pts0 = s->pts1;
+
+        s->f1 = inpicref;
+        s->pts1 = pts;
+        s->delta = s->pts1 - s->pts0;
+
+        if (s->delta < 0) {
+            av_log(ctx, AV_LOG_WARNING, "PTS discontinuity.\n");
+            s->start_pts = s->pts1;
+            s->n = 0;
+            av_frame_free(&s->f0);
+        }
+
+        if (s->start_pts == AV_NOPTS_VALUE)
+            s->start_pts = s->pts1;
+
+        goto retry;
+    }
+
+    if (ff_inlink_acknowledge_status(inlink, &status, &pts)) {
+        if (!s->flush) {
+            s->flush = 1;
+            goto retry;
+        }
+        ff_outlink_set_status(outlink, status, pts);
+        ret = 0;
+        goto exit;
+    }
+
+    FF_FILTER_FORWARD_WANTED(outlink, inlink);
+
+    return FFERROR_NOT_READY;
+
+exit:
+    CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+    return ret;
+}
+
+static int config_input(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx = inlink->dst;
+    FRUCContext *s = ctx->priv;
+
+    s->srce_time_base = inlink->time_base;
+    s->blend_factor_max = 1 << (8 -1);
+
+    return 0;
+}
+
+static int config_output(AVFilterLink *outlink)
+{
+    AVFilterContext *ctx = outlink->src;
+    AVFilterLink *inlink = outlink->src->inputs[0];
+    AVHWFramesContext *in_frames_ctx;
+    AVHWFramesContext *output_frames;
+    FRUCContext *s = ctx->priv;
+    CudaFunctions *cu;
+    CUcontext dummy;
+    CUDA_ARRAY_DESCRIPTOR desc = {0,};
+    NvOFFRUC_CREATE_PARAM create_param = {0,};
+    NvOFFRUC_REGISTER_RESOURCE_PARAM register_param = {0,};
+    NvOFFRUC_STATUS status;
+    int exact;
+    int ret;
+
+    ff_dlog(ctx, "config_output()\n");
+
+    ff_dlog(ctx,
+            "config_output() input time base:%u/%u (%f)\n",
+            ctx->inputs[0]->time_base.num,ctx->inputs[0]->time_base.den,
+            av_q2d(ctx->inputs[0]->time_base));
+
+    // make sure timebase is small enough to hold the framerate
+
+    exact = av_reduce(&s->dest_time_base.num, &s->dest_time_base.den,
+                      av_gcd((int64_t)s->srce_time_base.num * s->dest_frame_rate.num,
+                             (int64_t)s->srce_time_base.den * s->dest_frame_rate.den ),
+                      (int64_t)s->srce_time_base.den * s->dest_frame_rate.num, INT_MAX);
+
+    av_log(ctx, AV_LOG_INFO,
+           "time base:%u/%u -> %u/%u exact:%d\n",
+           s->srce_time_base.num, s->srce_time_base.den,
+           s->dest_time_base.num, s->dest_time_base.den, exact);
+    if (!exact) {
+        av_log(ctx, AV_LOG_WARNING, "Timebase conversion is not exact\n");
+    }
+
+    outlink->frame_rate = s->dest_frame_rate;
+    outlink->time_base = s->dest_time_base;
+
+    ff_dlog(ctx,
+            "config_output() output time base:%u/%u (%f) w:%d h:%d\n",
+            outlink->time_base.num, outlink->time_base.den,
+            av_q2d(outlink->time_base),
+            outlink->w, outlink->h);
+
+
+    av_log(ctx, AV_LOG_INFO, "fps -> fps:%u/%u\n",
+           s->dest_frame_rate.num, s->dest_frame_rate.den);
+
+    /* check that we have a hw context */
+    if (!inlink->hw_frames_ctx) {
+        av_log(ctx, AV_LOG_ERROR, "No hw context provided on input\n");
+        return AVERROR(EINVAL);
+    }
+    in_frames_ctx = (AVHWFramesContext*)inlink->hw_frames_ctx->data;
+    s->format = in_frames_ctx->sw_format;
+
+    if (!format_is_supported(s->format)) {
+        av_log(ctx, AV_LOG_ERROR, "Unsupported input format: %s\n",
+               av_get_pix_fmt_name(s->format));
+        return AVERROR(ENOSYS);
+    }
+
+    s->device_ref = av_buffer_ref(in_frames_ctx->device_ref);
+    if (!s->device_ref)
+        return AVERROR(ENOMEM);
+
+    s->hwctx = ((AVHWDeviceContext*)s->device_ref->data)->hwctx;
+    s->cu_ctx = s->hwctx->cuda_ctx;
+    s->stream = s->hwctx->stream;
+    cu = s->hwctx->internal->cuda_dl;
+    outlink->hw_frames_ctx = av_hwframe_ctx_alloc(s->device_ref);
+    if (!outlink->hw_frames_ctx)
+        return AVERROR(ENOMEM);
+
+    output_frames = (AVHWFramesContext*)outlink->hw_frames_ctx->data;
+
+    output_frames->format = AV_PIX_FMT_CUDA;
+    output_frames->sw_format = s->format;
+    output_frames->width = ctx->inputs[0]->w;
+    output_frames->height = ctx->inputs[0]->h;
+
+    output_frames->initial_pool_size = 3;
+
+    ret = ff_filter_init_hw_frames(ctx, outlink, 0);
+    if (ret < 0)
+        return ret;
+
+    ret = av_hwframe_ctx_init(outlink->hw_frames_ctx);
+    if (ret < 0) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to initialise CUDA frame "
+               "context for output: %d\n", ret);
+        return ret;
+    }
+
+    outlink->w = inlink->w;
+    outlink->h = inlink->h;
+
+    ret = CHECK_CU(cu->cuCtxPushCurrent(s->cu_ctx));
+    if (ret < 0)
+        return ret;
+
+    desc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
+    desc.Height = inlink->h * (s->format == AV_PIX_FMT_NV12 ? 1.5 : 1);
+    desc.Width = inlink->w;
+    desc.NumChannels = s->format == AV_PIX_FMT_NV12 ? 1 : 4;
+    ret = CHECK_CU(cu->cuArrayCreate(&s->c0, &desc));
+    if (ret < 0)
+        goto exit;
+    ret = CHECK_CU(cu->cuArrayCreate(&s->c1, &desc));
+    if (ret < 0)
+        goto exit;
+    ret = CHECK_CU(cu->cuArrayCreate(&s->cw, &desc));
+    if (ret < 0)
+        goto exit;
+
+    create_param.uiWidth = inlink->w;
+    create_param.uiHeight = inlink->h;
+    create_param.pDevice = NULL;
+    create_param.eResourceType = CudaResource;
+    create_param.eSurfaceFormat = s->format == AV_PIX_FMT_NV12 ? NV12Surface : ARGBSurface;
+    create_param.eCUDAResourceType = CudaResourceCuArray;
+    status = s->NvOFFRUCCreate(&create_param, &s->fruc);
+    if (status) {
+        av_log(ctx, AV_LOG_ERROR, "FRUC: Failed to create: %d\n", status);
+        ret = AVERROR(ENOSYS);
+        goto exit;
+    }
+
+    register_param.pArrResource[0] = s->c0;
+    register_param.pArrResource[1] = s->c1;
+    register_param.pArrResource[2] = s->cw;
+    register_param.uiCount = 3;
+    status = s->NvOFFRUCRegisterResource(s->fruc, &register_param);
+    if (status) {
+        av_log(ctx, AV_LOG_ERROR, "FRUC: Failed to register: %d\n", status);
+        ret = AVERROR(ENOSYS);
+        goto exit;
+    }
+
+    ret = 0;
+exit:
+    CHECK_CU(cu->cuCtxPopCurrent(&dummy));
+    return ret;
+}
+
+static const AVFilterPad framerate_inputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_input,
+    },
+};
+
+static const AVFilterPad framerate_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_output,
+    },
+};
+
+const AVFilter ff_vf_nvoffruc = {
+    .name = "nvoffruc",
+    .description = NULL_IF_CONFIG_SMALL("Upsamples progressive source to specified frame rates with nvidia FRUC"),
+    .priv_size = sizeof(FRUCContext),
+    .priv_class = &nvoffruc_class,
+    .init = init,
+    .uninit = uninit,
+    .activate = activate,
+    FILTER_INPUTS(framerate_inputs),
+    FILTER_OUTPUTS(framerate_outputs),
+    FILTER_SINGLE_PIXFMT(AV_PIX_FMT_CUDA),
+    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
+};
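
For readers following the CUarray staging copies in the patch, the layout they rely on can be sketched in plain C without any CUDA dependency. This is only an illustration under stated assumptions: `PlaneCopy`, `ArrayLayout` and `fruc_array_layout` are hypothetical names that do not appear in the patch, and even frame heights are assumed for NV12. It mirrors what the filter appears to do: a single array 1.5x the frame height for NV12, with the UV plane copied in directly below the Y plane, and a single four-channel plane for the packed RGB formats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper types for illustration; not part of the patch. */
typedef struct PlaneCopy {
    size_t dst_y;          /* first destination row inside the array */
    size_t width_in_bytes; /* bytes copied per row */
    size_t rows;           /* number of rows copied */
} PlaneCopy;

typedef struct ArrayLayout {
    int num_channels;      /* 1 for NV12, 4 for packed 8-bit RGB with alpha */
    size_t width;          /* array width in elements */
    size_t height;         /* array height in rows */
    int num_planes;        /* how many entries of plane[] are used */
    PlaneCopy plane[2];
} ArrayLayout;

/* Compute the staging array size and the per-plane copy parameters. */
static ArrayLayout fruc_array_layout(size_t w, size_t h, int is_nv12)
{
    ArrayLayout l = {0};

    assert(w > 0 && h > 0);
    l.num_channels = is_nv12 ? 1 : 4;
    l.width = w;
    l.height = h;
    l.num_planes = 1;

    /* Plane 0: Y for NV12, or the whole packed image otherwise. */
    l.plane[0].dst_y = 0;
    l.plane[0].width_in_bytes = w * l.num_channels;
    l.plane[0].rows = h;

    if (is_nv12) {
        /* Plane 1: interleaved UV, half height, stored right below Y. */
        assert(h % 2 == 0);
        l.plane[1].dst_y = h;
        l.plane[1].width_in_bytes = w;
        l.plane[1].rows = h / 2;
        l.num_planes = 2;
        l.height = h + h / 2;
    }
    return l;
}
```

For a 1920x1080 NV12 frame this gives a 1920x1620 single-channel array with the UV rows starting at row 1080; for a 1920x1080 ARGB frame it gives a 1920x1080 four-channel array with 7680 bytes copied per row.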