From patchwork Fri Jan 10 21:05:21 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 17281
Date: Fri, 10 Jan 2020 22:05:21 +0100 (CET)
From: Lynne
To: Ffmpeg Devel
Subject: [FFmpeg-devel] [PATCH] Vulkan hwcontext and filters
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Patches attached.
Also pushed to the master branch of https://github.com/cyanreg/FFmpeg/
because there are 9 of them and they add about 7000 lines.
Filtering won't work without a recent glslang version since they moved a
header and broke the API because they felt like it.

Git log:

commit aa9f0ea2cf210234ed26df349d3a1562b7de1110
Author: Philip Langdale
Date:   Tue Dec 31 09:41:57 2019 -0800

    lavu/hwcontext_cuda: refactor context initialisation

    There's enough going on here now that it should not be duplicated
    between cuda_device_create and cuda_device_derive.

commit bb734edb10b6f1853e1f2f5735b7ffd0a1d48468
Author: Lynne
Date:   Sun Oct 27 14:48:16 2019 +0000

    lavfi: add an chromaber_vulkan filter

    This commit adds a chromatic aberration filter for Vulkan that attempts to
    emulate a lens chromatic aberration effect.
    For a YUV frame it will instead shift the chroma channels, providing a
    simple approximation.

commit 1e3a50fbe4399f76c0ab0f62bf6d6c65b8565db4
Author: Lynne
Date:   Sun Oct 27 14:47:18 2019 +0000

    lavfi: add an avgblur_vulkan filter

    This commit adds a fast avgblur Vulkan filter.
    This will reset Intel GPUs on Windows due to a known, year-old driver bug.

commit f4c77d10e5e2c37ec1bf305773ec94898b99a5e5
Author: Lynne
Date:   Sun Oct 27 14:46:16 2019 +0000

    lavfi: add an overlay_vulkan filter

    This commit adds a basic, non-converting overlay filter for Vulkan.

commit 0badbf31effc16cf8f0be86f1de4fbdd029cebe4
Author: Lynne
Date:   Sun Oct 27 14:45:36 2019 +0000

    lavfi: add an scale_vulkan filter

    This commit adds a basic, non-converting Vulkan scaling filter.
commit 04c1836f89d89dcdc892cef66ee82afbcfda9f2d
Author: Lynne
Date:   Sun Oct 27 14:44:00 2019 +0000

    lavfi: add Vulkan filtering framework

    This commit adds a Vulkan filtering infrastructure for libavfilter.
    It attempts to abstract as much as possible of the Vulkan API from filters.

    The way the hwcontext and the framework are designed permits for parallel,
    non-CPU-blocking filtering throughout, with the exception of up/downloading
    and mapping.

commit e2d18e03e3a5fa8ef82159c68212b720198a9b91
Author: Philip Langdale
Date:   Wed Oct 23 18:11:37 2019 -0700

    lavfi/vf_hwupload: Add support for HW -> HW transfers

    As we find ourselves wanting a way to transfer frames between
    HW devices (or more realistically, between APIs on the same device),
    it's desirable to have a way to describe the relationship. While
    we could imagine introducing a `hwtransfer` filter, there is
    almost no difference from `hwupload`. The main new feature we need
    is a way to specify the target device. Having a single device
    for the filter chain is obviously insufficient if we're dealing
    with two devices.

    So let's add a way to specify the upload target device, and if none
    is specified, continue with the existing behaviour.

    We must also correctly preserve the sw_format on such a transfer.

commit d5f1bbc61fab452803443511b1241931169359b7
Author: Lynne
Date:   Wed Aug 28 21:58:10 2019 +0100

    lavu: add Vulkan hwcontext code

    This commit adds the necessary code to initialize and use a Vulkan device
    within the hwcontext libavutil framework.
    Currently direct mapping to VAAPI and DRM frames is functional, and
    transfers to CUDA and native frames are supported.

    Lets hope the future Vulkan video decode extension fits well within this
    framework.
commit 2fefb0b7ff760f2fb019751da8c37cfd0578ef00
Author: Philip Langdale
Date:   Wed Oct 23 18:01:52 2019 -0700

    lavu/hwcontext: Add support for HW -> HW transfers

    We are beginning to consider scenarios where a given HW Context
    may be able to transfer frames to another HW Context without
    passing via system memory - this would usually be when two
    contexts represent different APIs on the same device (eg: Vulkan
    and CUDA).

    This is modelled as a transfer, as we have today, but where both
    the src and the dst are hardware frames with hw contexts. We need
    to be careful to ensure the contexts are compatible - particularly,
    we cannot do transfers where one of the frames has been mapped via
    a derived frames context - we can only do transfers for frames that
    were directly allocated by the specified context.

    Additionally, as we have two hardware contexts, the transfer function
    could be implemented by either (or indeed both). To handle this
    uncertainty, we explicitly look for ENOSYS as an indicator to try
    the transfer in the other direction before giving up.

From aa9f0ea2cf210234ed26df349d3a1562b7de1110 Mon Sep 17 00:00:00 2001
From: Philip Langdale
Date: Tue, 31 Dec 2019 09:41:57 -0800
Subject: [PATCH 9/9] lavu/hwcontext_cuda: refactor context initialisation

There's enough going on here now that it should not be duplicated
between cuda_device_create and cuda_device_derive.
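Reviewer-style aside on the "lavu/hwcontext: Add support for HW -> HW transfers" entry in the git log above: the ENOSYS fallback it describes could look roughly like the sketch below. This is only an illustration under stated assumptions. `struct context`, `struct frame`, `transfer_unsupported`, `transfer_copy` and `transfer_hw_to_hw` are hypothetical names, not libavutil API; the order in which the two sides are tried is a guess; and plain `-ENOSYS` stands in for `AVERROR(ENOSYS)`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not the libavutil types. */
struct frame;
typedef int (*transfer_fn)(struct frame *dst, const struct frame *src);

struct context {
    transfer_fn transfer_to;   /* called on the context that owns src */
    transfer_fn transfer_from; /* called on the context that owns dst */
};

struct frame {
    const struct context *ctx; /* context that directly allocated the frame */
    int payload;               /* placeholder for the frame data */
};

/* A context with no HW -> HW path for this pairing reports -ENOSYS. */
int transfer_unsupported(struct frame *dst, const struct frame *src)
{
    (void)dst;
    (void)src;
    return -ENOSYS;
}

/* A context that is able to perform the copy itself. */
int transfer_copy(struct frame *dst, const struct frame *src)
{
    dst->payload = src->payload;
    return 0;
}

/*
 * Ask the source side first (assumed order). Only -ENOSYS triggers a retry
 * through the destination side; any other result, success or failure, is
 * returned unchanged.
 */
int transfer_hw_to_hw(struct frame *dst, const struct frame *src)
{
    int ret;

    assert(dst != NULL && src != NULL);

    ret = src->ctx->transfer_to(dst, src);
    if (ret != -ENOSYS)
        return ret;

    return dst->ctx->transfer_from(dst, src);
}
```

With a source context that implements nothing and a destination context whose `transfer_from` is `transfer_copy`, the call succeeds via the second attempt; if neither side implements the transfer, the caller sees `-ENOSYS`.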
---
 libavutil/hwcontext_cuda.c | 114 ++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 64 deletions(-)

diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index 18abb87bbd..53142edd0a 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -336,57 +336,44 @@ error:
     return ret;
 }
 
-static int cuda_device_create(AVHWDeviceContext *device_ctx,
-                              const char *device,
-                              AVDictionary *opts, int flags)
-{
+static int cuda_context_init(AVHWDeviceContext *device_ctx, int flags) {
     AVCUDADeviceContext *hwctx = device_ctx->hwctx;
     CudaFunctions *cu;
     CUcontext dummy;
-    int ret, dev_active = 0, device_idx = 0;
+    int ret, dev_active = 0;
     unsigned int dev_flags = 0;
 
     const unsigned int desired_flags = CU_CTX_SCHED_BLOCKING_SYNC;
 
-    if (device)
-        device_idx = strtol(device, NULL, 0);
-
-    if (cuda_device_init(device_ctx) < 0)
-        goto error;
-
     cu = hwctx->internal->cuda_dl;
 
-    ret = CHECK_CU(cu->cuInit(0));
-    if (ret < 0)
-        goto error;
-
-    ret = CHECK_CU(cu->cuDeviceGet(&hwctx->internal->cuda_device, device_idx));
-    if (ret < 0)
-        goto error;
-
     hwctx->internal->flags = flags;
 
     if (flags & AV_CUDA_USE_PRIMARY_CONTEXT) {
-        ret = CHECK_CU(cu->cuDevicePrimaryCtxGetState(hwctx->internal->cuda_device, &dev_flags, &dev_active));
+        ret = CHECK_CU(cu->cuDevicePrimaryCtxGetState(hwctx->internal->cuda_device,
+                       &dev_flags, &dev_active));
         if (ret < 0)
-            goto error;
+            return ret;
 
         if (dev_active && dev_flags != desired_flags) {
             av_log(device_ctx, AV_LOG_ERROR, "Primary context already active with incompatible flags.\n");
-            goto error;
+            return AVERROR(ENOTSUP);
         } else if (dev_flags != desired_flags) {
-            ret = CHECK_CU(cu->cuDevicePrimaryCtxSetFlags(hwctx->internal->cuda_device, desired_flags));
+            ret = CHECK_CU(cu->cuDevicePrimaryCtxSetFlags(hwctx->internal->cuda_device,
+                           desired_flags));
             if (ret < 0)
-                goto error;
+                return ret;
         }
 
-        ret = CHECK_CU(cu->cuDevicePrimaryCtxRetain(&hwctx->cuda_ctx, hwctx->internal->cuda_device));
+        ret = CHECK_CU(cu->cuDevicePrimaryCtxRetain(&hwctx->cuda_ctx,
+                                                    hwctx->internal->cuda_device));
         if (ret < 0)
-            goto error;
+            return ret;
     } else {
-        ret = CHECK_CU(cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device));
+        ret = CHECK_CU(cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags,
+                                       hwctx->internal->cuda_device));
         if (ret < 0)
-            goto error;
+            return ret;
 
         CHECK_CU(cu->cuCtxPopCurrent(&dummy));
     }
@@ -397,6 +384,37 @@ static int cuda_device_create(AVHWDeviceContext *device_ctx,
     hwctx->stream = NULL;
 
     return 0;
+}
+
+static int cuda_device_create(AVHWDeviceContext *device_ctx,
+                              const char *device,
+                              AVDictionary *opts, int flags)
+{
+    AVCUDADeviceContext *hwctx = device_ctx->hwctx;
+    CudaFunctions *cu;
+    int ret, device_idx = 0;
+
+    if (device)
+        device_idx = strtol(device, NULL, 0);
+
+    if (cuda_device_init(device_ctx) < 0)
+        goto error;
+
+    cu = hwctx->internal->cuda_dl;
+
+    ret = CHECK_CU(cu->cuInit(0));
+    if (ret < 0)
+        goto error;
+
+    ret = CHECK_CU(cu->cuDeviceGet(&hwctx->internal->cuda_device, device_idx));
+    if (ret < 0)
+        goto error;
+
+    ret = cuda_context_init(device_ctx, flags);
+    if (ret < 0)
+        goto error;
+
+    return 0;
 
 error:
     cuda_device_uninit(device_ctx);
@@ -409,11 +427,7 @@ static int cuda_device_derive(AVHWDeviceContext *device_ctx,
     AVCUDADeviceContext *hwctx = device_ctx->hwctx;
     CudaFunctions *cu;
     const char *src_uuid = NULL;
-    CUcontext dummy;
-    int ret, i, device_count, dev_active = 0;
-    unsigned int dev_flags = 0;
-
-    const unsigned int desired_flags = CU_CTX_SCHED_BLOCKING_SYNC;
+    int ret, i, device_count;
 
     switch (src_ctx->type) {
 #if CONFIG_VULKAN
@@ -470,37 +484,9 @@ static int cuda_device_derive(AVHWDeviceContext *device_ctx,
         goto error;
     }
 
-    hwctx->internal->flags = flags;
-
-    if (flags & AV_CUDA_USE_PRIMARY_CONTEXT) {
-        ret = CHECK_CU(cu->cuDevicePrimaryCtxGetState(hwctx->internal->cuda_device, &dev_flags, &dev_active));
-        if (ret < 0)
-            goto error;
-
-        if (dev_active && dev_flags != desired_flags) {
-            av_log(device_ctx, AV_LOG_ERROR, "Primary context already active with incompatible flags.\n");
-            goto error;
-        } else if (dev_flags != desired_flags) {
-            ret = CHECK_CU(cu->cuDevicePrimaryCtxSetFlags(hwctx->internal->cuda_device, desired_flags));
-            if (ret < 0)
-                goto error;
-        }
-
-        ret = CHECK_CU(cu->cuDevicePrimaryCtxRetain(&hwctx->cuda_ctx, hwctx->internal->cuda_device));
-        if (ret < 0)
-            goto error;
-    } else {
-        ret = CHECK_CU(cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device));
-        if (ret < 0)
-            goto error;
-
-        CHECK_CU(cu->cuCtxPopCurrent(&dummy));
-    }
-
-    hwctx->internal->is_allocated = 1;
-
-    // Setting stream to NULL will make functions automatically use the default CUstream
-    hwctx->stream = NULL;
+    ret = cuda_context_init(device_ctx, flags);
+    if (ret < 0)
+        goto error;
 
     return 0;
 
-- 
2.25.0.rc2
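
Reviewer-style note on the shape of this refactor: the shared helper returns error codes and leaves cleanup to its callers, which keep their own `error:` labels. A minimal, library-free sketch of that pattern follows. Every name in it (`struct device`, `device_init`, `device_uninit`, `context_init`, `device_create`, `device_derive`) is hypothetical and only mirrors the structure of the patch; it does not call CUDA or libavutil, and `-EINVAL` is used where the patch returns `AVERROR(ENOTSUP)`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device state; loosely stands in for AVCUDADeviceContext. */
struct device {
    int *resource; /* allocated by device_init(), freed by device_uninit() */
    int  context;  /* non-zero once context_init() has succeeded */
    int  flags;
};

int device_init(struct device *d)
{
    assert(d != NULL);
    d->resource = calloc(1, sizeof(*d->resource));
    return d->resource ? 0 : -ENOMEM;
}

void device_uninit(struct device *d)
{
    assert(d != NULL);
    free(d->resource);
    d->resource = NULL;
    d->context  = 0;
}

/* Shared helper: performs no cleanup, it only reports failure to the caller. */
int context_init(struct device *d, int flags)
{
    if (flags < 0)
        return -EINVAL; /* placeholder for "incompatible flags" */

    d->flags   = flags;
    d->context = 1;
    return 0;
}

/* Caller 1: owns the cleanup path, delegates context setup to the helper. */
int device_create(struct device *d, int flags)
{
    int ret = device_init(d);
    if (ret < 0)
        goto error;

    ret = context_init(d, flags);
    if (ret < 0)
        goto error;

    return 0;

error:
    device_uninit(d);
    return ret;
}

/* Caller 2: different lookup step, same helper, same cleanup pattern. */
int device_derive(struct device *d, const struct device *src, int flags)
{
    int ret = device_init(d);
    if (ret < 0)
        goto error;

    if (!src->context) { /* placeholder for "no matching source device" */
        ret = -EINVAL;
        goto error;
    }

    ret = context_init(d, flags);
    if (ret < 0)
        goto error;

    return 0;

error:
    device_uninit(d);
    return ret;
}
```

Keeping the `goto error` cleanup in the two entry points means the helper cannot release the device twice, at the cost of each caller having to remember to check its return value.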