From patchwork Fri Jan 10 21:05:21 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Lynne <dev@lynne.ee>
X-Patchwork-Id: 17281
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by ffaux.localdomain (Postfix) with ESMTP id DA82F44B7C7
	for <patchwork@ffaux-bg.ffmpeg.org>; Fri, 10 Jan 2020 23:05:29 +0200 (EET)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id B178368ADF1;
	Fri, 10 Jan 2020 23:05:29 +0200 (EET)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from w4.tutanota.de (w4.tutanota.de [81.3.6.165])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id C6AF768AC8C
 for <ffmpeg-devel@ffmpeg.org>; Fri, 10 Jan 2020 23:05:22 +0200 (EET)
Received: from w3.tutanota.de (unknown [192.168.1.164])
 by w4.tutanota.de (Postfix) with ESMTP id A97E71060312
 for <ffmpeg-devel@ffmpeg.org>; Fri, 10 Jan 2020 21:05:21 +0000 (UTC)
Authentication-Results: w4.tutanota.de; dkim=pass (2048-bit key;
 secure) header.d=lynne.ee header.i=@lynne.ee header.b="1Edy/emv";
 dkim-atps=neutral
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1578690321;
 s=s1; d=lynne.ee;
 h=Date:From:To:Message-ID:Subject:MIME-Version:Content-Type;
 bh=cCA3Tl2IFwRZQ8IhQgBDG7sE6QLjHkT+OSB1n5junDk=;
 b=1Edy/emv4utA8HsOY1ltMwenuRAa2M0b4r0gyMIPMF5vQ00Poh3OyWO1Sge5KdYH
 TLFTecwTuQYSFK/u7k7LdxkKxNVFhqWcQduJ0xq9TnYkFB34g/V8JhM3Xr9woR2Sank
 OyxCI+wcVfSIN1Kzas7j09+CmQYuD5zbnzJD5AkpF/5KOfsos6N/fk1Ba22lA1IHJGP
 w05dsYGIVc6UBp4MfTPJNlNFo4WTosyET8qdFVLYEFzRRkVXV1AKsllAL5T9URKeAOR
 jhyE0DZ5CuQ+tuxxIV4bmflKOaInGL68JZtoSjdXLLLDR65wAemJF92trmoWY7lkK6D
 U2MbOvopzQ==
Date: Fri, 10 Jan 2020 22:05:21 +0100 (CET)
From: Lynne <dev@lynne.ee>
To: Ffmpeg Devel <ffmpeg-devel@ffmpeg.org>
Message-ID: <LyGGdQ0--3-2@lynne.ee>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH] Vulkan hwcontext and filters
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Patches attached.
They are also pushed to the master branch of https://github.com/cyanreg/FFmpeg/, since there are 9 of them and together they add about 7000 lines.
Filtering won't work without a recent glslang version, since glslang moved a header and broke its API.

Git log:

commit aa9f0ea2cf210234ed26df349d3a1562b7de1110
Author: Philip Langdale <philipl@overt.org>
Date:   Tue Dec 31 09:41:57 2019 -0800

    lavu/hwcontext_cuda: refactor context initialisation
   
    There's enough going on here now that it should not be duplicated
    between cuda_device_create and cuda_device_derive.
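
The shape of this refactor can be shown with a small self-contained C model. All names here (`struct dev`, `context_init`, `dev_create`, `dev_derive`, `dev_uninit`) are hypothetical stand-ins, not the functions in hwcontext_cuda.c, and a negative `flags` value only simulates the "incompatible primary context flags" failure. The point is the split: the shared helper returns errors directly, while each caller keeps its own `goto error` cleanup.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device state; not the FFmpeg structures. */
struct dev {
    int device_idx;
    int ctx_ready;     /* set once the shared context setup succeeded */
    int uninit_calls;  /* counts cleanup runs, for illustration only */
};

static void dev_uninit(struct dev *d)
{
    d->ctx_ready = 0;
    d->uninit_calls++;
}

/* Shared helper: owns no cleanup, it only reports failure to the caller. */
static int context_init(struct dev *d, int flags)
{
    if (flags < 0)          /* simulated "incompatible flags" failure */
        return -ENOTSUP;
    d->ctx_ready = 1;
    return 0;
}

/* Caller 1: picks the device from a user string, then shares the setup. */
static int dev_create(struct dev *d, const char *device, int flags)
{
    int ret;

    d->device_idx = device ? (int)strtol(device, NULL, 0) : 0;

    ret = context_init(d, flags);
    if (ret < 0)
        goto error;
    return 0;

error:
    dev_uninit(d);
    return ret;
}

/* Caller 2: the device was found some other way (e.g. by matching a UUID). */
static int dev_derive(struct dev *d, int matched_idx, int flags)
{
    int ret;

    assert(matched_idx >= 0);
    d->device_idx = matched_idx;

    ret = context_init(d, flags);
    if (ret < 0)
        goto error;
    return 0;

error:
    dev_uninit(d);
    return ret;
}
```

Keeping cleanup in the callers means the helper never needs to know how much state its caller had already set up before calling it.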

commit bb734edb10b6f1853e1f2f5735b7ffd0a1d48468
Author: Lynne <dev@lynne.ee>
Date:   Sun Oct 27 14:48:16 2019 +0000

    lavfi: add a chromaber_vulkan filter
   
    This commit adds a chromatic aberration filter for Vulkan that attempts to
    emulate a lens chromatic aberration effect.
    For a YUV frame it will instead shift the chroma channels, providing a
    simple approximation.
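
For RGB input, one common way to model lateral chromatic aberration is to magnify the colour channels by slightly different amounts about the image centre. The C sketch below is a CPU model of that assumption for a single 8-bit plane; it is not the filter's shader, and `radial_shift_plane`, `round_clamp` and the `dist` parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round a coordinate to the nearest pixel, clamped to [0, hi]. */
static int round_clamp(float v, int hi)
{
    int i;

    if (v <= 0.0f)
        return 0;
    i = (int)(v + 0.5f);
    return i > hi ? hi : i;
}

/*
 * Radially displace one 8-bit plane about the image centre: the output pixel
 * at p reads the input at centre + (p - centre) * (1 + dist), using
 * nearest-neighbour sampling with edge clamping. dist > 0 samples further
 * out, so the plane appears shrunk towards the centre; -1 < dist < 0
 * magnifies it; dist == 0 copies the plane unchanged.
 */
static void radial_shift_plane(uint8_t *dst, const uint8_t *src,
                               int w, int h, int stride, float dist)
{
    const float cx = (w - 1) * 0.5f, cy = (h - 1) * 0.5f;

    assert(w > 0 && h > 0 && stride >= w && dist > -1.0f);
    for (int y = 0; y < h; y++) {
        int sy = round_clamp(cy + (y - cy) * (1.0f + dist), h - 1);
        for (int x = 0; x < w; x++) {
            int sx = round_clamp(cx + (x - cx) * (1.0f + dist), w - 1);
            dst[y * stride + x] = src[sy * stride + sx];
        }
    }
}
```

Applying it to the red plane with a small positive `dist` and to the blue plane with a small negative one separates the two channels progressively towards the frame edges, while the centre stays aligned.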

commit 1e3a50fbe4399f76c0ab0f62bf6d6c65b8565db4
Author: Lynne <dev@lynne.ee>
Date:   Sun Oct 27 14:47:18 2019 +0000

    lavfi: add an avgblur_vulkan filter
   
    This commit adds a fast avgblur Vulkan filter.
    This will reset Intel GPUs on Windows due to a known, year-old driver bug.
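
An average (box) blur replaces each pixel with the mean of a square neighbourhood, and it is separable: a horizontal pass followed by a vertical pass costs O(r) per pixel per pass instead of O(r²). The C sketch below is a plain CPU model of that idea for one 8-bit plane with clamped edges; it is not the filter's GLSL, and `box_blur_plane`, `clampi` and the `r` (radius) parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/*
 * Separable box blur of one 8-bit plane with a (2r+1) x (2r+1) window.
 * Edge pixels are repeated (clamped). Horizontal sums are kept unnormalised
 * in an int buffer, so the result equals a direct 2D box average.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated.
 */
static int box_blur_plane(uint8_t *dst, const uint8_t *src,
                          int w, int h, int stride, int r)
{
    const int area = (2 * r + 1) * (2 * r + 1);
    int *tmp;

    assert(w > 0 && h > 0 && stride >= w && r >= 0);
    tmp = malloc((size_t)w * h * sizeof(*tmp));
    if (!tmp)
        return -1;

    /* Horizontal pass: tmp[y][x] = sum of src[y][x-r .. x+r]. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int k = -r; k <= r; k++)
                sum += src[y * stride + clampi(x + k, 0, w - 1)];
            tmp[y * w + x] = sum;
        }

    /* Vertical pass over the row sums, then round to nearest. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int k = -r; k <= r; k++)
                sum += tmp[clampi(y + k, 0, h - 1) * w + x];
            dst[y * stride + x] = (uint8_t)((sum + area / 2) / area);
        }

    free(tmp);
    return 0;
}
```

For a planar YUV frame the same function would simply be run once per plane.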

commit f4c77d10e5e2c37ec1bf305773ec94898b99a5e5
Author: Lynne <dev@lynne.ee>
Date:   Sun Oct 27 14:46:16 2019 +0000

    lavfi: add an overlay_vulkan filter
   
    This commit adds a basic, non-converting overlay filter for Vulkan.
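
"Non-converting" means the overlay input must already share the main input's pixel format, so compositing reduces to copying the overlay's samples into the main frame at an offset, with no format or colourspace conversion. A CPU sketch of that for one 8-bit plane, clipping the overlay to the main frame, follows; `overlay_plane` and its parameters are hypothetical, and the real filter does this work on the GPU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy an ow x oh overlay plane into an mw x mh main plane with its top-left
 * corner at (ox, oy). Parts of the overlay outside the main plane are
 * clipped. Both planes must use the same sample format: nothing is converted.
 */
static void overlay_plane(uint8_t *main_p, int mw, int mh, int mstride,
                          const uint8_t *ovl, int ow, int oh, int ostride,
                          int ox, int oy)
{
    /* Visible rectangle [x0, x1) x [y0, y1) in main-plane coordinates. */
    int x0 = ox < 0 ? 0 : ox, y0 = oy < 0 ? 0 : oy;
    int x1 = ox + ow > mw ? mw : ox + ow;
    int y1 = oy + oh > mh ? mh : oy + oh;

    assert(mstride >= mw && ostride >= ow);
    if (x0 >= x1 || y0 >= y1)
        return; /* overlay lies entirely outside the main plane */

    for (int y = y0; y < y1; y++)
        memcpy(main_p + y * mstride + x0,
               ovl + (y - oy) * ostride + (x0 - ox),
               (size_t)(x1 - x0));
}
```

For subsampled chroma planes, the offsets and sizes passed in would be divided by the subsampling factors.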

commit 0badbf31effc16cf8f0be86f1de4fbdd029cebe4
Author: Lynne <dev@lynne.ee>
Date:   Sun Oct 27 14:45:36 2019 +0000

    lavfi: add a scale_vulkan filter
   
    This commit adds a basic, non-converting Vulkan scaling filter.
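
Here too, "non-converting" means the output keeps the input's pixel format and only the dimensions change. A minimal CPU model of resampling one 8-bit plane with bilinear interpolation (pixel centres aligned, edges clamped) is sketched below; the choice of bilinear filtering and the name `scale_plane_bilinear` are assumptions for illustration, not taken from the filter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bilinear resize of one 8-bit plane from sw x sh to dw x dh.
 * Destination pixel centres are mapped back to source coordinates with
 * s = (d + 0.5) * src_size / dst_size - 0.5; samples past the edges are
 * clamped to the nearest valid pixel.
 */
static void scale_plane_bilinear(uint8_t *dst, int dw, int dh, int dstride,
                                 const uint8_t *src, int sw, int sh,
                                 int sstride)
{
    assert(dw > 0 && dh > 0 && sw > 0 && sh > 0);
    for (int y = 0; y < dh; y++) {
        float fy = (y + 0.5f) * sh / dh - 0.5f;
        if (fy < 0.0f)
            fy = 0.0f;
        int y0 = (int)fy;                      /* floor, since fy >= 0 */
        int y1 = y0 + 1 < sh ? y0 + 1 : sh - 1;
        float wy = fy - y0;                    /* weight of row y1 */

        for (int x = 0; x < dw; x++) {
            float fx = (x + 0.5f) * sw / dw - 0.5f;
            if (fx < 0.0f)
                fx = 0.0f;
            int x0 = (int)fx;
            int x1 = x0 + 1 < sw ? x0 + 1 : sw - 1;
            float wx = fx - x0;                /* weight of column x1 */

            const uint8_t *r0 = src + y0 * sstride, *r1 = src + y1 * sstride;
            float top = r0[x0] * (1.0f - wx) + r0[x1] * wx;
            float bot = r1[x0] * (1.0f - wx) + r1[x1] * wx;
            float v = top * (1.0f - wy) + bot * wy;
            dst[y * dstride + x] = (uint8_t)(int)(v + 0.5f);
        }
    }
}
```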

commit 04c1836f89d89dcdc892cef66ee82afbcfda9f2d
Author: Lynne <dev@lynne.ee>
Date:   Sun Oct 27 14:44:00 2019 +0000

    lavfi: add Vulkan filtering framework
   
    This commit adds a Vulkan filtering infrastructure for libavfilter.
    It attempts to hide as much of the Vulkan API from filters as possible.
   
    The way the hwcontext and the framework are designed allows for parallel,
    non-CPU-blocking filtering throughout, with the exception of up/downloading
    and mapping.

commit e2d18e03e3a5fa8ef82159c68212b720198a9b91
Author: Philip Langdale <philipl@overt.org>
Date:   Wed Oct 23 18:11:37 2019 -0700

    lavfi/vf_hwupload: Add support for HW -> HW transfers
   
    As we find ourselves wanting a way to transfer frames between
    HW devices (or more realistically, between APIs on the same device),
    it's desirable to have a way to describe the relationship. While
    we could imagine introducing a `hwtransfer` filter, there is
    almost no difference from `hwupload`. The main new feature we need
    is a way to specify the target device. Having a single device
    for the filter chain is obviously insufficient if we're dealing
    with two devices.
   
    So let's add a way to specify the upload target device, and if none
    is specified, continue with the existing behaviour.
   
    We must also correctly preserve the sw_format on such a transfer.
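
The two rules in this message (fall back to the filter chain's device when no target is given, and keep the input's sw_format on a HW -> HW transfer) can be written down as a small decision function. This C sketch is hypothetical: the enum, both structs and `plan_upload` are illustrative names, not vf_hwupload's code or its options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified pixel formats. */
enum pix_fmt { FMT_NV12, FMT_YUV420P, FMT_CUDA };

struct link_desc {
    enum pix_fmt format;     /* format of frames on the input link */
    int is_hw;               /* 1 if frames live in a hardware frames context */
    enum pix_fmt sw_format;  /* only meaningful when is_hw is set */
};

struct upload_plan {
    const char *device;      /* device the output frames are allocated on */
    enum pix_fmt sw_format;  /* sw_format of the output frames context */
};

static struct upload_plan plan_upload(const struct link_desc *in,
                                      const char *chain_device,
                                      const char *target_device)
{
    struct upload_plan p;

    assert(in && chain_device);
    /* No explicit target: keep the old behaviour, use the chain's device. */
    p.device = target_device ? target_device : chain_device;

    /* HW -> HW: carry the underlying software layout across unchanged.
     * SW -> HW: the input format itself becomes the sw_format. */
    p.sw_format = in->is_hw ? in->sw_format : in->format;
    return p;
}
```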

commit d5f1bbc61fab452803443511b1241931169359b7
Author: Lynne <dev@lynne.ee>
Date:   Wed Aug 28 21:58:10 2019 +0100

    lavu: add Vulkan hwcontext code
   
    This commit adds the necessary code to initialize and use a Vulkan device
    within the hwcontext libavutil framework.
    Currently direct mapping to VAAPI and DRM frames is functional, and
    transfers to CUDA and native frames are supported.
   
    Let's hope the future Vulkan video decode extension fits well within this
    framework.

commit 2fefb0b7ff760f2fb019751da8c37cfd0578ef00
Author: Philip Langdale <philipl@overt.org>
Date:   Wed Oct 23 18:01:52 2019 -0700

    lavu/hwcontext: Add support for HW -> HW transfers
   
    We are beginning to consider scenarios where a given HW Context
    may be able to transfer frames to another HW Context without
    passing via system memory - this would usually be when two
    contexts represent different APIs on the same device (e.g. Vulkan
    and CUDA).
   
    This is modelled as a transfer, as we have today, but where both
    the src and the dst are hardware frames with hw contexts. We need
    to be careful to ensure the contexts are compatible - particularly,
    we cannot do transfers where one of the frames has been mapped via
    a derived frames context - we can only do transfers for frames that
    were directly allocated by the specified context.
   
    Additionally, as we have two hardware contexts, the transfer function
    could be implemented by either (or indeed both). To handle this
    uncertainty, we explicitly look for ENOSYS as an indicator to try
    the transfer in the other direction before giving up.
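
The fallback rule can be modelled in plain C. This is a hypothetical sketch: `struct backend` with its two hooks, `struct hw_frame` with its `derived` flag, and `hw_to_hw_transfer` are illustrative stand-ins rather than FFmpeg's internal types, and asking the source's backend first is an assumption about the order. A missing hook is treated like one that returns `-ENOSYS`, and any other error is returned without trying the second backend.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct hw_frame;

/* Hypothetical per-API hooks; either may be NULL (not implemented). */
struct backend {
    const char *name;
    int (*transfer_from)(struct hw_frame *dst, const struct hw_frame *src);
    int (*transfer_to)(struct hw_frame *dst, const struct hw_frame *src);
};

struct hw_frame {
    const struct backend *owner;  /* context the frame was allocated by */
    int derived;                  /* 1 if mapped via a derived frames context */
    int payload;                  /* stands in for the pixel data */
};

/* Example hook: copies the payload, as a real transfer would copy pixels. */
static int copy_payload(struct hw_frame *dst, const struct hw_frame *src)
{
    dst->payload = src->payload;
    return 0;
}

static int call_hook(int (*fn)(struct hw_frame *, const struct hw_frame *),
                     struct hw_frame *dst, const struct hw_frame *src)
{
    return fn ? fn(dst, src) : -ENOSYS;  /* missing hook == not supported */
}

/*
 * HW -> HW transfer: frames mapped through a derived context are rejected
 * (-EINVAL is an arbitrary choice here). The source's backend is asked
 * first; only -ENOSYS triggers a retry in the other direction through the
 * destination's backend, and any other result is returned as-is.
 */
static int hw_to_hw_transfer(struct hw_frame *dst, const struct hw_frame *src)
{
    int ret;

    assert(dst && src && dst->owner && src->owner);
    if (src->derived || dst->derived)
        return -EINVAL;

    ret = call_hook(src->owner->transfer_from, dst, src);
    if (ret == -ENOSYS)
        ret = call_hook(dst->owner->transfer_to, dst, src);
    return ret;
}
```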

From aa9f0ea2cf210234ed26df349d3a1562b7de1110 Mon Sep 17 00:00:00 2001
From: Philip Langdale <philipl@overt.org>
Date: Tue, 31 Dec 2019 09:41:57 -0800
Subject: [PATCH 9/9] lavu/hwcontext_cuda: refactor context initialisation

There's enough going on here now that it should not be duplicated
between cuda_device_create and cuda_device_derive.
---
 libavutil/hwcontext_cuda.c | 114 ++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 64 deletions(-)

diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
index 18abb87bbd..53142edd0a 100644
--- a/libavutil/hwcontext_cuda.c
+++ b/libavutil/hwcontext_cuda.c
@@ -336,57 +336,44 @@ error:
     return ret;
 }
 
-static int cuda_device_create(AVHWDeviceContext *device_ctx,
-                              const char *device,
-                              AVDictionary *opts, int flags)
-{
+static int cuda_context_init(AVHWDeviceContext *device_ctx, int flags) {
     AVCUDADeviceContext *hwctx = device_ctx->hwctx;
     CudaFunctions *cu;
     CUcontext dummy;
-    int ret, dev_active = 0, device_idx = 0;
+    int ret, dev_active = 0;
     unsigned int dev_flags = 0;
 
     const unsigned int desired_flags = CU_CTX_SCHED_BLOCKING_SYNC;
 
-    if (device)
-        device_idx = strtol(device, NULL, 0);
-
-    if (cuda_device_init(device_ctx) < 0)
-        goto error;
-
     cu = hwctx->internal->cuda_dl;
 
-    ret = CHECK_CU(cu->cuInit(0));
-    if (ret < 0)
-        goto error;
-
-    ret = CHECK_CU(cu->cuDeviceGet(&hwctx->internal->cuda_device, device_idx));
-    if (ret < 0)
-        goto error;
-
     hwctx->internal->flags = flags;
 
     if (flags & AV_CUDA_USE_PRIMARY_CONTEXT) {
-        ret = CHECK_CU(cu->cuDevicePrimaryCtxGetState(hwctx->internal->cuda_device, &dev_flags, &dev_active));
+        ret = CHECK_CU(cu->cuDevicePrimaryCtxGetState(hwctx->internal->cuda_device,
+                       &dev_flags, &dev_active));
         if (ret < 0)
-            goto error;
+            return ret;
 
         if (dev_active && dev_flags != desired_flags) {
             av_log(device_ctx, AV_LOG_ERROR, "Primary context already active with incompatible flags.\n");
-            goto error;
+            return AVERROR(ENOTSUP);
         } else if (dev_flags != desired_flags) {
-            ret = CHECK_CU(cu->cuDevicePrimaryCtxSetFlags(hwctx->internal->cuda_device, desired_flags));
+            ret = CHECK_CU(cu->cuDevicePrimaryCtxSetFlags(hwctx->internal->cuda_device,
+                           desired_flags));
             if (ret < 0)
-                goto error;
+                return ret;
         }
 
-        ret = CHECK_CU(cu->cuDevicePrimaryCtxRetain(&hwctx->cuda_ctx, hwctx->internal->cuda_device));
+        ret = CHECK_CU(cu->cuDevicePrimaryCtxRetain(&hwctx->cuda_ctx,
+                                                    hwctx->internal->cuda_device));
         if (ret < 0)
-            goto error;
+            return ret;
     } else {
-        ret = CHECK_CU(cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device));
+        ret = CHECK_CU(cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags,
+                                       hwctx->internal->cuda_device));
         if (ret < 0)
-            goto error;
+            return ret;
 
         CHECK_CU(cu->cuCtxPopCurrent(&dummy));
     }
@@ -397,6 +384,37 @@ static int cuda_device_create(AVHWDeviceContext *device_ctx,
     hwctx->stream = NULL;
 
     return 0;
+}
+
+static int cuda_device_create(AVHWDeviceContext *device_ctx,
+                              const char *device,
+                              AVDictionary *opts, int flags)
+{
+    AVCUDADeviceContext *hwctx = device_ctx->hwctx;
+    CudaFunctions *cu;
+    int ret, device_idx = 0;
+
+    if (device)
+        device_idx = strtol(device, NULL, 0);
+
+    if (cuda_device_init(device_ctx) < 0)
+        goto error;
+
+    cu = hwctx->internal->cuda_dl;
+
+    ret = CHECK_CU(cu->cuInit(0));
+    if (ret < 0)
+        goto error;
+
+    ret = CHECK_CU(cu->cuDeviceGet(&hwctx->internal->cuda_device, device_idx));
+    if (ret < 0)
+        goto error;
+
+    ret = cuda_context_init(device_ctx, flags);
+    if (ret < 0)
+        goto error;
+
+    return 0;
 
 error:
     cuda_device_uninit(device_ctx);
@@ -409,11 +427,7 @@ static int cuda_device_derive(AVHWDeviceContext *device_ctx,
     AVCUDADeviceContext *hwctx = device_ctx->hwctx;
     CudaFunctions *cu;
     const char *src_uuid = NULL;
-    CUcontext dummy;
-    int ret, i, device_count, dev_active = 0;
-    unsigned int dev_flags = 0;
-
-    const unsigned int desired_flags = CU_CTX_SCHED_BLOCKING_SYNC;
+    int ret, i, device_count;
 
     switch (src_ctx->type) {
 #if CONFIG_VULKAN
@@ -470,37 +484,9 @@ static int cuda_device_derive(AVHWDeviceContext *device_ctx,
         goto error;
     }
 
-    hwctx->internal->flags = flags;
-
-    if (flags & AV_CUDA_USE_PRIMARY_CONTEXT) {
-        ret = CHECK_CU(cu->cuDevicePrimaryCtxGetState(hwctx->internal->cuda_device, &dev_flags, &dev_active));
-        if (ret < 0)
-            goto error;
-
-        if (dev_active && dev_flags != desired_flags) {
-            av_log(device_ctx, AV_LOG_ERROR, "Primary context already active with incompatible flags.\n");
-            goto error;
-        } else if (dev_flags != desired_flags) {
-            ret = CHECK_CU(cu->cuDevicePrimaryCtxSetFlags(hwctx->internal->cuda_device, desired_flags));
-            if (ret < 0)
-                goto error;
-        }
-
-        ret = CHECK_CU(cu->cuDevicePrimaryCtxRetain(&hwctx->cuda_ctx, hwctx->internal->cuda_device));
-        if (ret < 0)
-            goto error;
-    } else {
-        ret = CHECK_CU(cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device));
-        if (ret < 0)
-            goto error;
-
-        CHECK_CU(cu->cuCtxPopCurrent(&dummy));
-    }
-
-    hwctx->internal->is_allocated = 1;
-
-    // Setting stream to NULL will make functions automatically use the default CUstream
-    hwctx->stream = NULL;
+    ret = cuda_context_init(device_ctx, flags);
+    if (ret < 0)
+        goto error;
 
     return 0;
 
-- 
2.25.0.rc2