From patchwork Sat Sep 10 08:16:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Arzumanyan X-Patchwork-Id: 37829 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:139a:b0:8f:1db5:eae2 with SMTP id w26csp1369080pzh; Sat, 10 Sep 2022 01:16:59 -0700 (PDT) X-Google-Smtp-Source: AA6agR4WEwEaIB2ZUzjmlhkrS/lcVEAOUWyat94i5Je3kDhc1mlyAXqtX5XyG71z5Ug//qTuC3jJ X-Received: by 2002:a05:6402:501d:b0:443:1c7:ccb9 with SMTP id p29-20020a056402501d00b0044301c7ccb9mr14422746eda.101.1662797819591; Sat, 10 Sep 2022 01:16:59 -0700 (PDT) Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id z1-20020a50cd01000000b004477568d7a4si2390449edi.215.2022.09.10.01.16.57; Sat, 10 Sep 2022 01:16:59 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@Nvidia.com header.s=selector2 header.b=D4hDH1Bp; arc=fail (body hash mismatch); spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 0F7FA68BB14; Sat, 10 Sep 2022 11:16:53 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from NAM12-DM6-obe.outbound.protection.outlook.com (mail-dm6nam12on2044.outbound.protection.outlook.com [40.107.243.44]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id BEA5668B44B for ; Sat, 10 Sep 2022 11:16:45 +0300 (EEST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=CZBfDaL0ewfrymMEgFET+VPyc9MqzeBYYWcFIJOc4+qv1T6imS/dbown+kzJ9B/BIRfi4AUjGUjo/NXWDg3rXPXOicZNvPiNcAvoqW/JbJLrcgd0a8p3UApAX9FYESWsyzXZepdWRvKrW5eV8QyS0ehSj2T9sk60Cjuzdoklj/sffjRbkzPQFZpP1Mzki7O4tOGhHO5AXEIDQmU9MjHuZDkp4uoIEuCJAw3ht7djsINR5mz/6ParEhO/bjEDptbypnwQnY903o/lWgU9wFu9krfPZ7TJitQiIlLQLPM7nfYSVjpTZNXjwNdwfOBENCHVEzLVjCjOO2H1txB3KwZ6qg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=9mKEMihOdYpCmICCx0Gr+BtDIR+9WughQao0c8kl9C0=; b=baj4S8xJDRY5/K84D8svr4B5ztfP5Nk/k2S/DZ4yUxpjZ+rDiHIE9BG+2bOY6VmMIRwBd4xQzXCAMAt5XPqY/2MDq/M0V3nd0ejiQnKMCtJ+5b5wPEXVG6bSVmOMPF5LrTgO6II9KY8rRz9q/61RHO04Z8sHtBsNonAHoNnDzAotw1XnhsQt5Ea0DxRMbXhPjqrxnXAMjmnwTn92bxtYyk7KNOLly6L8tD7gMsK12x0a/UD6VZCs11Q7KNbD3TNLZo/J72rjRLw9ieK7w5Q50N+UnLjzyS3PftohCywmxsZsIrbnkbeMuPoHYAeOBysD+YiYi8C5VqmH+dlFlsNxsw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com; dkim=pass header.d=nvidia.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=9mKEMihOdYpCmICCx0Gr+BtDIR+9WughQao0c8kl9C0=; b=D4hDH1Bp1QVTP1OZOM+E4XQ9HPDmoEbFSodIdiITz2fbho8yhUQLJ2unmENUhBxQace/ocF7ZEIixuMfspGE0EFaki/gf4nhtLyqXw/UruGCOhMyAzuUprDpMhsIRiQlgxSpi7uhVmCSL1aLfsgfnmnRo4tD1TnsSOjujpS3qmpWGNstgz2sd+X5TDc0mWmHAzowIEHsABM77ll9p9c1QRjbg36537LKWvQG9nnAUyLREhGlwGKaYFBElSmc9mMIG5dVy8mkxQYQIIDf6sJxW45/cq4wnJcknHQ7Bd8bYFL8YnGC3JETDsN+z0t4N5n19b6f3dEcdigDpuUnwdrNPg== Received: from PH7PR12MB5831.namprd12.prod.outlook.com (2603:10b6:510:1d6::13) by MN0PR12MB6368.namprd12.prod.outlook.com (2603:10b6:208:3d2::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5612.16; Sat, 10 Sep 2022 08:16:40 +0000 Received: from PH7PR12MB5831.namprd12.prod.outlook.com ([fe80::dc19:2d25:8ca7:60ca]) by PH7PR12MB5831.namprd12.prod.outlook.com ([fe80::dc19:2d25:8ca7:60ca%4]) with mapi id 15.20.5612.020; Sat, 10 Sep 2022 08:16:40 +0000 From: Roman Arzumanyan To: "ffmpeg-devel@ffmpeg.org" Thread-Topic: [PATCH] libavfilter/vf_colorrange_cuda: CUDA-accelerated video filter for MPEG and JPEG color range conversions Thread-Index: AQHYxOyhXyavxE/bjU2gpL6pjmNvcQ== Date: Sat, 10 Sep 2022 08:16:39 +0000 Message-ID: Accept-Language: ru-RU, en-US Content-Language: ru-RU X-MS-Has-Attach: yes X-MS-TNEF-Correlator: msip_labels: authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=nvidia.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: PH7PR12MB5831:EE_|MN0PR12MB6368:EE_ x-ms-office365-filtering-correlation-id: 6c770394-ad2a-4f91-20c7-08da9304c9a0 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: M80ikAWQ5r0B9K87lMNG0ZbhV/7F5Yk39OnNPSH+DNYNYkz111wW/nGLdyMUCq1SGbyoSuIBKBipMg+RBp4lDxB2uOAxL9Zj0XIwgToI9wK92PdvHYJM9safxArJ+cbPU0AhG0Ayhczi+Cx0O1pT/ftnYR/OI2i8C4dsddL6YYbT1ac4nk4S0y10bDZkxYhdzuqdkvW/EG1K9jlIFZm9QdlaFszR7GDfqUDHtK/IC8h7ydPJJ8vJEr0QO/KIsPG9rJ33DlCx2xC0OSvG0BJPoVgiihphgwYd5WMwMnUQ5JHr4FmcsW4aclYPnL2xyLBVsurDf+HUV4tmKkgxAV4cIxOgbwJyHIldJTGpevulQ3lXshOudLUc2LASrMMeDqraik4Gjf+Qx2sPhcZ3aOTOcTU32o/sE9eqmSquUHBsS4YZuhGOyRKILUiaTm/rTkXAEhvcHV+pogJq0RC0HsC6wH60c+9/PFS8k1k2HlnE5zeg3kuFy79XQwftCdOrZiNcAPgPQ6wKsgYJYaQnyDJZ9bBQCA0lv8ij5A9xcVZG4BVXW127cApD2fiNmRPRA241hgNjHSxE5xfXRmHc2DyggZYl7ypt6QoPwOMfcECqDBzrCJhUf6iFRGbPcmTWmZbQEB+Tdbf5JILLCk7fIp4mJWwGR+4D5s6YIZGdmdshXxTbK6Pc/+VCezl5+T4XJKFuaVqmlL2mqAsqHnr9a9GGIVP9kImi5hVLdDicBQBjqB/uzMADZwsr+5/SuLkpBs0Ws8o/IOE8PCgj6qW2rdUXhQ== x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:PH7PR12MB5831.namprd12.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230016)(4636009)(396003)(39860400002)(136003)(376002)(346002)(366004)(7696005)(26005)(38070700005)(6506007)(86362001)(107886003)(122000001)(478600001)(41300700001)(9686003)(52536014)(186003)(99936003)(38100700002)(71200400001)(33656002)(8936002)(5660300002)(4326008)(55016003)(2906002)(558084003)(76116006)(66946007)(66446008)(91956017)(66476007)(66556008)(8676002)(316002)(64756008)(19627405001)(6916009)(54906003); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?koi8-r?b?WWUraWVJd1prU0MrZ0x2bE9M?= =?koi8-r?b?OW43V3luZUw4SmlrdkFHNUVuSWFtVTJmL3NYcDdPN0FnYVBHZnlRQjROUVV2SW5x?= =?koi8-r?b?bC9SMFJmM3BwYlVhKzBhM0tNcFpWUDZhWnBwQjd5VE85SngxcGJBS2ZkODRmWTc2?= =?koi8-r?b?d0l1TjVESU1RaHFiaHdQcU9hc0U4QmQ4SUxEek1XVTd3WkRaTFRHdldCa0ZQWXpR?= =?koi8-r?b?ZGp2clFreGdBZEZXc0R4OW5GYkhiRkhRRzF1ODJUcE5zbDVybXZtaXh3S0UwbWNS?= =?koi8-r?b?d1NQQW0rb1Q2UUQ3SUg5dzJ2MUhqY1krQngwRVYyR2twdXk5V2F1UXlnZyt6RjBT?= =?koi8-r?b?a0JCamx0NU5yV0hoRGt6eWtGZEdwRldDMThvQzFqUkM3VGtPM3kwVnNGbDVRS2tR?= =?koi8-r?b?RnZJYXF1b0JNUTRORS9ERTY0b0xzeXlxVk1TdGQ2ZWltTlEyYXlDdlVlMG5wdlQw?= =?koi8-r?b?b29XM0JEbzhyU3UzZWY4VWJoZEI3dytVQWpaR0xkV0ZobjhwNFQzbUNISnU4ajdJ?= =?koi8-r?b?VTQ0VllqREVpcTN0NndlY3hoMUpySW1iZjY0WUJGaVUrZjhBQ1NVUEsranJyYU9E?= =?koi8-r?b?MHZGWkxGZ0xyYmZvSmxNT0pLWlNzbnRRa2tVK2JyQiswdHU4eGlUQWtzSkFSS2Rr?= =?koi8-r?b?S3grWlNDRGpvR0Z2YUh3aS81Y21HSFh0TDRsOXZzTE5YUHFXY2hCUnhVckJzTkdp?= =?koi8-r?b?Q3ZmMU5XaGM1bWVpT3NhVEU5R3NZTGRqcEFYTzl6dnUxZFRIN1VEZit2eDE1TEgz?= =?koi8-r?b?OWhTd3dweGprRklxQk9jdThkQmVkdkxQTENCaXdxTHduQUlROXZvOFVyeSswZUVT?= =?koi8-r?b?WjRNNWpGOXNxRGN5V0c4dVZwc2piNHJEeEcwYWVBVlo1cFJvTWRqNHJjR21iRHpL?= =?koi8-r?b?QlBuZ3Z0cFBzRENDeFBUdElPMFdrUWxieHBvTXRwNXd4aXFJWnNGUGQ5NzNCU21a?= =?koi8-r?b?SXZuSnJFU3BLbzM0dmNvYnZHYTJoUWo5VDF5YktqbW1XbHBBcWViRjJiRnN4Z1JD?= =?koi8-r?b?UFJjb1Mzays5eUw3U0dZTXJtOGtjM2ZjUDdRdCt4OXJhWDVKa3Iyb2dDbEQ4cTBa?= =?koi8-r?b?eVkwbGtGbVRudUdyT0R1OFpBbTdZSmZKOHpCTTlydjk0NDk3ZTBlczd2KytuRmZm?= =?koi8-r?b?dVZMUnZqS2ZvYlR2eG5sZFpoWUlUWGpvdmVqVXR0dm5EL3o0RkZnb0VHb3dOTUZJ?= =?koi8-r?b?Q2ZQZTlDNlJYNFBaSStQejlZVnI5Sk1yZWptNWZpQko3VHdvVS8va1RENWx0YkFk?= =?koi8-r?b?RmdqcWU4OXBpY1JqTFF2NkpKVjAxK2lVSGYrMW5ES21QLzJxSUd6SVNNYmFyK0Jv?= =?koi8-r?b?Y3dubzRJN2diY3dEMlhacWFYMFliZ3VPUUNCd1Nma3VOWHBRYUx2OTF3ZDk0MHF6?= =?koi8-r?b?YWZGc2JpQjJxd0JnWk5NbTA3V2FZN3gxVW0rNVNXcVZtcjRWSVBQRElURWZsQlMv?= =?koi8-r?b?L29VL0ZISnM4ZmlKVjk0aU5jWGQvYWZnanVSM0c2M2dvMGY4NENBZ3FZaUZJYmM1?= =?koi8-r?b?Vk1Idm1aeHkvVUc5WHgrQkVtNUhMUlIyUVdmdGtMTmY4ZmtkeSt6VXdwZk1XUW5t?= =?koi8-r?b?amZaTGI3bHhFdEpPVjZneU9xSkdWK0VZRENzQ2JyaUZHRnYvWm1GQnR2V0xHTkp6?= =?koi8-r?b?WXM0b21zM1NaT295OGFZTU5acDE4M0c3VjZxVVZSTzZjWUFJUnpyc3RITStrcnpK?= =?koi8-r?b?MFpzRjk3TUpyczR3WTE1d2lUVENkVkRWTVVFWFhkWmswSElnZk8reFl4YXpleFdk?= =?koi8-r?b?M1RRSmd1SWQ0Nk1DenhDM0p4aDhvK21VanlxaHNPSGIxYjRrTFVySUIrYWVSbDRI?= =?koi8-r?b?MnhyT2tUMkVIMVprUVZhdEF2NHJFSjU0MTh0ZVlSVWtVNVZLUFFHeFRmZG9xWmdM?= =?koi8-r?b?T0J1U3RxbFZjSkJDbjMvbEo0bFZNQWVFa25ZL2dTYjFETE9BQTJyK2FwM1NOd29h?= =?koi8-r?b?ckU2RlhlRVNiOEdvUnJXWWlvUmlaQjNBUTRyb3dqUGkzaUxmdE15S1NrMDNNRzZl?= =?koi8-r?b?QnNuYUIxNHdsMXJ2OGxSSjNzV3pDVlFKcnJHMWlSMXhQaSs1dithT0xLbUJjWg==?= MIME-Version: 1.0 X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: PH7PR12MB5831.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 6c770394-ad2a-4f91-20c7-08da9304c9a0 X-MS-Exchange-CrossTenant-originalarrivaltime: 10 Sep 2022 08:16:39.7745 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: IODEYJh3sBcTmCdohw+WOWOj8qzC0Md2m65tZCZuxrVLTlpVMKEjWSertR1P4bU2sJVCJrgWdmW6VBdORV4WcQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN0PR12MB6368 X-Content-Filtered-By: Mailman/MimeDel 2.1.29 Subject: [FFmpeg-devel] [PATCH] libavfilter/vf_colorrange_cuda: CUDA-accelerated video filter for MPEG and JPEG color range conversions X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Yogender Gupta , timo , Sven Middelberg , Hermann Held Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: ucOpPycLAHz8 Hello, This patch adds video filter which does color range conversion similar to swscale scaling filter. How to use it: ./ffmpeg \ -hwaccel cuda -hwaccel_output_format cuda \ -i /path/to/intput/file.mp4 \ -vf colorrange_cuda=range=mpeg \ -c:v h264_nvenc \ -y /path/to/output/file.mp4 From 2b15d8a609a12d97b1ba7500c7f8771b336e2fdf Mon Sep 17 00:00:00 2001 From: Roman Arzumanyan Date: Sat, 10 Sep 2022 11:05:56 +0300 Subject: [PATCH] libavfilter/vf_colorrange_cuda CUDA-accelerated color range conversion filter --- configure | 2 + libavfilter/Makefile | 3 + libavfilter/allfilters.c | 1 + libavfilter/vf_colorrange_cuda.c | 432 ++++++++++++++++++++++++++++++ libavfilter/vf_colorrange_cuda.cu | 93 +++++++ 5 files changed, 531 insertions(+) create mode 100644 libavfilter/vf_colorrange_cuda.c create mode 100644 libavfilter/vf_colorrange_cuda.cu diff --git a/configure b/configure index 9d6457d81b..e5f9738ad1 100755 --- a/configure +++ b/configure @@ -3155,6 +3155,8 @@ transpose_npp_filter_deps="ffnvcodec libnpp" overlay_cuda_filter_deps="ffnvcodec" overlay_cuda_filter_deps_any="cuda_nvcc cuda_llvm" sharpen_npp_filter_deps="ffnvcodec libnpp" +colorrange_cuda_filter_deps="ffnvcodec" +colorrange_cuda_filter_deps_any="cuda_nvcc cuda_llvm" amf_deps_any="libdl LoadLibrary" nvenc_deps="ffnvcodec" diff --git a/libavfilter/Makefile b/libavfilter/Makefile index 30cc329fb6..784e154d81 100644 --- a/libavfilter/Makefile +++ b/libavfilter/Makefile @@ -230,6 +230,9 @@ OBJS-$(CONFIG_COLORMAP_FILTER) += vf_colormap.o OBJS-$(CONFIG_COLORMATRIX_FILTER) += vf_colormatrix.o OBJS-$(CONFIG_COLORSPACE_FILTER) += vf_colorspace.o colorspacedsp.o OBJS-$(CONFIG_COLORTEMPERATURE_FILTER) += vf_colortemperature.o +OBJS-$(CONFIG_COLORRANGE_CUDA_FILTER) += vf_colorrange_cuda.o \ + vf_colorrange_cuda.ptx.o \ + cuda/load_helper.o OBJS-$(CONFIG_CONVOLUTION_FILTER) += vf_convolution.o OBJS-$(CONFIG_CONVOLUTION_OPENCL_FILTER) += vf_convolution_opencl.o opencl.o \ opencl/convolution.o diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c index 5ebacfde27..5e9cbe57ec 100644 --- a/libavfilter/allfilters.c +++ b/libavfilter/allfilters.c @@ -213,6 +213,7 @@ extern const AVFilter ff_vf_colormap; extern const AVFilter ff_vf_colormatrix; extern const AVFilter ff_vf_colorspace; extern const AVFilter ff_vf_colortemperature; +extern const AVFilter ff_vf_colorrange_cuda; extern const AVFilter ff_vf_convolution; extern const AVFilter ff_vf_convolution_opencl; extern const AVFilter ff_vf_convolve; diff --git a/libavfilter/vf_colorrange_cuda.c b/libavfilter/vf_colorrange_cuda.c new file mode 100644 index 0000000000..949e7d3bbf --- /dev/null +++ b/libavfilter/vf_colorrange_cuda.c @@ -0,0 +1,432 @@ +/* + * Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#include + +#include "libavutil/avstring.h" +#include "libavutil/common.h" +#include "libavutil/cuda_check.h" +#include "libavutil/hwcontext.h" +#include "libavutil/hwcontext_cuda_internal.h" +#include "libavutil/internal.h" +#include "libavutil/opt.h" +#include "libavutil/pixdesc.h" + +#include "avfilter.h" +#include "formats.h" +#include "internal.h" +#include "scale_eval.h" +#include "video.h" + +#include "cuda/load_helper.h" + +static const enum AVPixelFormat supported_formats[] = { + AV_PIX_FMT_NV12, + AV_PIX_FMT_YUV420P, + AV_PIX_FMT_YUV444P, +}; + +#define DIV_UP(a, b) (((a) + (b)-1) / (b)) +#define BLOCKX 32 +#define BLOCKY 16 + +#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, s->hwctx->internal->cuda_dl, x) + +typedef struct CUDAConvContext { + const AVClass* class; + + AVCUDADeviceContext* hwctx; + AVBufferRef* frames_ctx; + AVFrame* own_frame; + AVFrame* tmp_frame; + + CUcontext cu_ctx; + CUstream cu_stream; + CUmodule cu_module; + CUfunction cu_convert[AVCOL_RANGE_NB]; + + enum AVPixelFormat pix_fmt; + enum AVColorRange range; + + int num_planes; +} CUDAConvContext; + +static av_cold int cudaconv_init(AVFilterContext* ctx) +{ + CUDAConvContext* s = ctx->priv; + + s->own_frame = av_frame_alloc(); + if (!s->own_frame) + return AVERROR(ENOMEM); + + s->tmp_frame = av_frame_alloc(); + if (!s->tmp_frame) + return AVERROR(ENOMEM); + + return 0; +} + +static av_cold void cudaconv_uninit(AVFilterContext* ctx) +{ + CUDAConvContext* s = ctx->priv; + + if (s->hwctx && s->cu_module) { + CudaFunctions* cu = s->hwctx->internal->cuda_dl; + CUcontext dummy; + + CHECK_CU(cu->cuCtxPushCurrent(s->hwctx->cuda_ctx)); + CHECK_CU(cu->cuModuleUnload(s->cu_module)); + s->cu_module = NULL; + CHECK_CU(cu->cuCtxPopCurrent(&dummy)); + } + + av_frame_free(&s->own_frame); + av_buffer_unref(&s->frames_ctx); + av_frame_free(&s->tmp_frame); +} + +static av_cold int init_hwframe_ctx(CUDAConvContext* s, AVBufferRef* device_ctx, + int width, int height) +{ + AVBufferRef* out_ref = NULL; + AVHWFramesContext* out_ctx; + int ret; + + out_ref = av_hwframe_ctx_alloc(device_ctx); + if (!out_ref) + return AVERROR(ENOMEM); + + out_ctx = (AVHWFramesContext*)out_ref->data; + + out_ctx->format = AV_PIX_FMT_CUDA; + out_ctx->sw_format = s->pix_fmt; + out_ctx->width = FFALIGN(width, 32); + out_ctx->height = FFALIGN(height, 32); + + ret = av_hwframe_ctx_init(out_ref); + if (ret < 0) + goto fail; + + av_frame_unref(s->own_frame); + ret = av_hwframe_get_buffer(out_ref, s->own_frame, 0); + if (ret < 0) + goto fail; + + s->own_frame->width = width; + s->own_frame->height = height; + + av_buffer_unref(&s->frames_ctx); + s->frames_ctx = out_ref; + + return 0; +fail: + av_buffer_unref(&out_ref); + return ret; +} + +static int format_is_supported(enum AVPixelFormat fmt) +{ + for (int i = 0; i < FF_ARRAY_ELEMS(supported_formats); i++) + if (fmt == supported_formats[i]) + return 1; + + return 0; +} + +static av_cold int init_processing_chain(AVFilterContext* ctx, int width, + int height) +{ + CUDAConvContext* s = ctx->priv; + AVHWFramesContext* in_frames_ctx; + + int ret; + + if (!ctx->inputs[0]->hw_frames_ctx) { + av_log(ctx, AV_LOG_ERROR, "No hw context provided on input\n"); + return AVERROR(EINVAL); + } + + in_frames_ctx = (AVHWFramesContext*)ctx->inputs[0]->hw_frames_ctx->data; + s->pix_fmt = in_frames_ctx->sw_format; + + if (!format_is_supported(s->pix_fmt)) { + av_log(ctx, AV_LOG_ERROR, "Unsupported pixel format: %s\n", + av_get_pix_fmt_name(s->pix_fmt)); + return AVERROR(ENOSYS); + } + + s->num_planes = av_pix_fmt_count_planes(s->pix_fmt); + + ret = init_hwframe_ctx(s, in_frames_ctx->device_ref, width, height); + if (ret < 0) + return ret; + + ctx->outputs[0]->hw_frames_ctx = av_buffer_ref(s->frames_ctx); + if (!ctx->outputs[0]->hw_frames_ctx) + return AVERROR(ENOMEM); + + return 0; +} + +static av_cold int cudaconv_load_functions(AVFilterContext* ctx) +{ + CUDAConvContext* s = ctx->priv; + CUcontext dummy, cuda_ctx = s->hwctx->cuda_ctx; + CudaFunctions* cu = s->hwctx->internal->cuda_dl; + int ret; + + extern const unsigned char ff_vf_colorrange_cuda_ptx_data[]; + extern const unsigned int ff_vf_colorrange_cuda_ptx_len; + + ret = CHECK_CU(cu->cuCtxPushCurrent(cuda_ctx)); + if (ret < 0) + return ret; + + ret = ff_cuda_load_module(ctx, s->hwctx, &s->cu_module, + ff_vf_colorrange_cuda_ptx_data, + ff_vf_colorrange_cuda_ptx_len); + if (ret < 0) + goto fail; + + ret = CHECK_CU(cu->cuModuleGetFunction( + &s->cu_convert[AVCOL_RANGE_MPEG], s->cu_module, + "to_mpeg_cuda")); + + if (ret < 0) + goto fail; + + ret = CHECK_CU(cu->cuModuleGetFunction( + &s->cu_convert[AVCOL_RANGE_JPEG], s->cu_module, + "to_jpeg_cuda")); + + if (ret < 0) + goto fail; + +fail: + CHECK_CU(cu->cuCtxPopCurrent(&dummy)); + return ret; +} + +static av_cold int cudaconv_config_props(AVFilterLink* outlink) +{ + AVFilterContext* ctx = outlink->src; + AVFilterLink* inlink = outlink->src->inputs[0]; + CUDAConvContext* s = ctx->priv; + AVHWFramesContext* frames_ctx = + (AVHWFramesContext*)inlink->hw_frames_ctx->data; + AVCUDADeviceContext* device_hwctx = frames_ctx->device_ctx->hwctx; + int ret; + + s->hwctx = device_hwctx; + s->cu_stream = s->hwctx->stream; + + outlink->w = inlink->w; + outlink->h = inlink->h; + + ret = init_processing_chain(ctx, inlink->w, inlink->h); + if (ret < 0) + return ret; + + if (inlink->sample_aspect_ratio.num) { + outlink->sample_aspect_ratio = av_mul_q( + (AVRational){outlink->h * inlink->w, outlink->w * inlink->h}, + inlink->sample_aspect_ratio); + } else { + outlink->sample_aspect_ratio = inlink->sample_aspect_ratio; + } + + ret = cudaconv_load_functions(ctx); + if (ret < 0) + return ret; + + return ret; +} + +static int conv_cuda_convert(AVFilterContext* ctx, AVFrame* out, AVFrame* in) +{ + CUDAConvContext* s = ctx->priv; + CudaFunctions* cu = s->hwctx->internal->cuda_dl; + CUcontext dummy, cuda_ctx = s->hwctx->cuda_ctx; + int ret; + + ret = CHECK_CU(cu->cuCtxPushCurrent(cuda_ctx)); + if (ret < 0) + return ret; + + out->color_range = s->range; + + for (int i = 0; i < s->num_planes; i++) { + int width = in->width, height = in->height, comp_id = (i > 0); + + switch (s->pix_fmt) { + case AV_PIX_FMT_YUV444P: + break; + case AV_PIX_FMT_YUV420P: + width = comp_id ? in->width / 2 : in->width; + case AV_PIX_FMT_NV12: + height = comp_id ? in->height / 2 : in->height; + break; + default: + return AVERROR(ENOSYS); + } + + if (in->color_range != out->color_range) { + void* args[] = {&in->data[i], &out->data[i], &in->linesize[i], + &comp_id}; + ret = CHECK_CU(cu->cuLaunchKernel( + s->cu_convert[out->color_range], DIV_UP(width, BLOCKX), + DIV_UP(height, BLOCKY), 1, BLOCKX, BLOCKY, 1, 0, s->cu_stream, + args, NULL)); + } else { + av_hwframe_transfer_data(out, in, 0); + } + } + + CHECK_CU(cu->cuCtxPopCurrent(&dummy)); + return ret; +} + +static int cudaconv_conv(AVFilterContext* ctx, AVFrame* out, AVFrame* in) +{ + CUDAConvContext* s = ctx->priv; + AVFilterLink* outlink = ctx->outputs[0]; + AVFrame* src = in; + int ret; + + ret = conv_cuda_convert(ctx, s->own_frame, src); + if (ret < 0) + return ret; + + src = s->own_frame; + ret = av_hwframe_get_buffer(src->hw_frames_ctx, s->tmp_frame, 0); + if (ret < 0) + return ret; + + av_frame_move_ref(out, s->own_frame); + av_frame_move_ref(s->own_frame, s->tmp_frame); + + s->own_frame->width = outlink->w; + s->own_frame->height = outlink->h; + + ret = av_frame_copy_props(out, in); + if (ret < 0) + return ret; + + return 0; +} + +static int cudaconv_filter_frame(AVFilterLink* link, AVFrame* in) +{ + AVFilterContext* ctx = link->dst; + CUDAConvContext* s = ctx->priv; + AVFilterLink* outlink = ctx->outputs[0]; + CudaFunctions* cu = s->hwctx->internal->cuda_dl; + + AVFrame* out = NULL; + CUcontext dummy; + int ret = 0; + + out = av_frame_alloc(); + if (!out) { + ret = AVERROR(ENOMEM); + goto fail; + } + + ret = CHECK_CU(cu->cuCtxPushCurrent(s->hwctx->cuda_ctx)); + if (ret < 0) + goto fail; + + ret = cudaconv_conv(ctx, out, in); + + CHECK_CU(cu->cuCtxPopCurrent(&dummy)); + if (ret < 0) + goto fail; + + av_reduce(&out->sample_aspect_ratio.num, &out->sample_aspect_ratio.den, + (int64_t)in->sample_aspect_ratio.num * outlink->h * link->w, + (int64_t)in->sample_aspect_ratio.den * outlink->w * link->h, + INT_MAX); + + av_frame_free(&in); + return ff_filter_frame(outlink, out); +fail: + av_frame_free(&in); + av_frame_free(&out); + return ret; +} + +static AVFrame* cudaconv_get_video_buffer(AVFilterLink* inlink, int w, int h) +{ + return ff_default_get_video_buffer(inlink, w, h); +} + +#define OFFSET(x) offsetof(CUDAConvContext, x) +#define FLAGS (AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM) +static const AVOption options[] = { + {"range", "Output video range", OFFSET(range), AV_OPT_TYPE_INT, {.i64 = AVCOL_RANGE_UNSPECIFIED}, AVCOL_RANGE_UNSPECIFIED, AVCOL_RANGE_NB - 1, FLAGS, "range"}, + {"mpeg", "limited range", 0, AV_OPT_TYPE_CONST, {.i64 = AVCOL_RANGE_MPEG}, 0, 0, FLAGS, "range"}, + {"jpeg", "full range", 0, AV_OPT_TYPE_CONST, {.i64 = AVCOL_RANGE_JPEG}, 0, 0, FLAGS, "range"}, + {NULL}, +}; + +static const AVClass cudaconv_class = { + .class_name = "cudaconv", + .item_name = av_default_item_name, + .option = options, + .version = LIBAVUTIL_VERSION_INT, +}; + +static const AVFilterPad cudaconv_inputs[] = { + { + .name = "default", + .type = AVMEDIA_TYPE_VIDEO, + .filter_frame = cudaconv_filter_frame, + .get_buffer.video = cudaconv_get_video_buffer, + }, +}; + +static const AVFilterPad cudaconv_outputs[] = { + { + .name = "default", + .type = AVMEDIA_TYPE_VIDEO, + .config_props = cudaconv_config_props, + }, +}; + +const AVFilter ff_vf_colorrange_cuda = { + .name = "colorrange_cuda", + .description = + NULL_IF_CONFIG_SMALL("CUDA accelerated video color range converter"), + + .init = cudaconv_init, + .uninit = cudaconv_uninit, + + .priv_size = sizeof(CUDAConvContext), + .priv_class = &cudaconv_class, + + FILTER_INPUTS(cudaconv_inputs), + FILTER_OUTPUTS(cudaconv_outputs), + + FILTER_SINGLE_PIXFMT(AV_PIX_FMT_CUDA), + + .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE, +}; diff --git a/libavfilter/vf_colorrange_cuda.cu b/libavfilter/vf_colorrange_cuda.cu new file mode 100644 index 0000000000..6f617493f8 --- /dev/null +++ b/libavfilter/vf_colorrange_cuda.cu @@ -0,0 +1,93 @@ +/* + * Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +extern "C" { +#define MPEG_LUMA_MIN (16) +#define MPEG_CHROMA_MIN (16) +#define MPEG_LUMA_MAX (235) +#define MPEG_CHROMA_MAX (240) + +#define JPEG_LUMA_MIN (0) +#define JPEG_CHROMA_MIN (1) +#define JPEG_LUMA_MAX (255) +#define JPEG_CHROMA_MAX (255) + +__device__ int mpeg_min[] = {MPEG_LUMA_MIN, MPEG_CHROMA_MIN}; +__device__ int mpeg_max[] = {MPEG_LUMA_MAX, MPEG_CHROMA_MAX}; + +__device__ int jpeg_min[] = {JPEG_LUMA_MIN, JPEG_CHROMA_MIN}; +__device__ int jpeg_max[] = {JPEG_LUMA_MAX, JPEG_CHROMA_MAX}; + +__device__ int clamp(int val, int min, int max) +{ + if (val < min) + return min; + else if (val > max) + return max; + else + return val; +} + +__global__ void to_jpeg_cuda(const unsigned char* src, unsigned char* dst, + int pitch, int comp_id) +{ + int x = blockIdx.x * blockDim.x + threadIdx.x; + int y = blockIdx.y * blockDim.y + threadIdx.y; + int src_, dst_; + + // 8 bit -> 15 bit for better precision; + src_ = static_cast(src[x + y * pitch]) << 7; + + // Conversion; + dst_ = comp_id ? (min(src_, 30775) * 4663 - 9289992) >> 12 // chroma + : (min(src_, 30189) * 19077 - 39057361) >> 14; // luma + + // Dither replacement; + dst_ = dst_ + 64; + + // Back to 8 bit; + dst_ = clamp(dst_ >> 7, jpeg_min[comp_id], jpeg_max[comp_id]); + dst[x + y * pitch] = static_cast(dst_); +} + +__global__ void to_mpeg_cuda(const unsigned char* src, unsigned char* dst, + int pitch, int comp_id) +{ + int x = blockIdx.x * blockDim.x + threadIdx.x; + int y = blockIdx.y * blockDim.y + threadIdx.y; + int src_, dst_; + + // 8 bit -> 15 bit for better precision; + src_ = static_cast(src[x + y * pitch]) << 7; + + // Conversion; + dst_ = comp_id ? (src_ * 1799 + 4081085) >> 11 // chroma + : (src_ * 14071 + 33561947) >> 14; // luma + + // Dither replacement; + dst_ = dst_ + 64; + + // Back to 8 bit; + dst_ = clamp(dst_ >> 7, mpeg_min[comp_id], mpeg_max[comp_id]); + dst[x + y * pitch] = static_cast(dst_); +} +} \ No newline at end of file -- 2.25.1