From patchwork Fri Jun 16 05:15:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philip Langdale X-Patchwork-Id: 42128 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a20:c526:b0:117:ac03:c9de with SMTP id gm38csp1268528pzb; Thu, 15 Jun 2023 22:15:40 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ62tQqEKqOJBms2cFIRmK6QtpJxIxlIXhxUiC7vpS6hBIK7MYEJlYk3XFM/vjis1x4pkBw2 X-Received: by 2002:a2e:98c3:0:b0:2a8:ac69:bfe with SMTP id s3-20020a2e98c3000000b002a8ac690bfemr801280ljj.42.1686892540285; Thu, 15 Jun 2023 22:15:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686892540; cv=none; d=google.com; s=arc-20160816; b=BWfeuJYD1x9qLCIs739bkW7PTLQPd+49hckyoSb8f7s5SP5DpQzfRev74RQS8NhRvY r4NZ0yO1r/Z40hx9aE336E4Ayqtt2mtkcEqm03pzfY0YkDGy/dPLi0XPQUwPV4Y3VpRd 1jKtN15z599WqKfze/nvlIPbnvMaVyGi76lgCqyjE/m1Rmwz+JYuFpXgO63PAXSiMVty 33WESxG0bRAVOeSFUkmk9/Rzk7wTV02a8Fu+BPEPa4/VfYpUjlJi0GYtZ1U2GnnZqkVt RFLPVQdlJHYBUiN8VtkuMMGi6DGfU6PD1gsz1Gd/A9TPeqK1gfU8S0DZXtkCr1p3J2iv yXCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:message-id:date:to:from :dkim-signature:delivered-to; bh=yFaKosAeVjP+O8Ju9o0Ctqp0FMRqeob7vyJBnogyc5I=; b=y0SaFlch6DEC6FC3/ab6Lkpq5+TUVstG0IoOAd3ObELNyYYP8T9m9XYQNYQgCZ/po1 whGG4qUKOKrjdJiWziBZ/9emU8rHIwUW72cGqmz2cPAe2CYt14zlSO5kmLKP6cq5clIS Ezm2z4bBPvdkhIJpmAO9I/ms2YjI5D6dBf8DjVkPSO9d88nCGp8Mxlo7SwT/OHDYnjb3 1sjORrH0bNAbWozEhGc5jeegrVaZ51IdJ4h5IWDbW4WB/1jMhaekHN96zT78VpF9YreR eiGZlG6tpj3XxtNGXZ72k2cA+O8cPCoDtAiQyT5yTIvn37bcE9WwPnumt29VJsHkOqeA NGEA== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@overt.org header.s=mail header.b=t8xcjUxs; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=overt.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id c16-20020aa7d610000000b00516a46dd145si1562367edr.292.2023.06.15.22.15.39; Thu, 15 Jun 2023 22:15:40 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@overt.org header.s=mail header.b=t8xcjUxs; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=overt.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 09A0368C500; Fri, 16 Jun 2023 08:15:36 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail.overt.org (mail.overt.org [72.14.183.176]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id AA9CC68C500 for ; Fri, 16 Jun 2023 08:15:28 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=overt.org; s=mail; t=1686892526; bh=t9g1AnZO/nirR9vNBcedxV1kl+TR1SMAaBczvFCHD7w=; h=From:To:Cc:Subject:Date:From; b=t8xcjUxsge/N5ZT1vosc1ubhJZIEdqekqN344RAMiAQIPchq35GQb4FryRDOFMz7F EoilX437ZS1vIS2UgX7q2DwHQbEU+6mETnzAxd/vSGRqbf5BxwYrCYVNDXRYRBcfEA cXml7jjYWSHSsDnIUZ0+JlLFULV6to6lPV3WZr6MmCTvBRDrXENjLAFIybWUG9hYkN fbGEPBTEljcUrkiToLlE/oQYPmQigIjh2SYHS1Y1nII0Fa90k1fLXArdIddwnkHl85 GGw3lXwQvSh6vILIBpMOqHQiy0O4OJU564saYUmdnBrms5uGzUsULGf/ev89Ows7sZ h09VaKxEUrXUA== Received: from authenticated-user (mail.overt.org [72.14.183.176]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.overt.org (Postfix) with ESMTPSA id 01F3D60844; Fri, 16 Jun 2023 00:15:25 -0500 (CDT) From: Philip Langdale To: ffmpeg-devel@ffmpeg.org Date: Thu, 15 Jun 2023 22:15:18 -0700 Message-Id: <20230616051518.949854-1-philipl@overt.org> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH] avfilter/scale_cuda: add support for rgb32/bgr32 conversions X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Philip Langdale Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: NuR71zpgapjE As we are introducing two new formats and supporting conversions between them, and also with the existing 0rgb32/0bgr32 formats, we get a combinatorial explosion of kernels. I introduced a few new macros to keep the things mostly managable. The conversions are all simple, following existing patterns, with four specific exceptions. When converting from 0rgb32/0bgr32 to rgb32/bgr32, we need to ensure the alpha value is set to 1. In all other cases, it can just be passed through, either to be used or ignored. --- libavfilter/vf_scale_cuda.c | 2 + libavfilter/vf_scale_cuda.cu | 175 ++++++++++++++++++++++++++++------- 2 files changed, 146 insertions(+), 31 deletions(-) diff --git a/libavfilter/vf_scale_cuda.c b/libavfilter/vf_scale_cuda.c index 1c99befec8..370cb1d9cd 100644 --- a/libavfilter/vf_scale_cuda.c +++ b/libavfilter/vf_scale_cuda.c @@ -51,6 +51,8 @@ static const enum AVPixelFormat supported_formats[] = { AV_PIX_FMT_YUV444P16, AV_PIX_FMT_0RGB32, AV_PIX_FMT_0BGR32, + AV_PIX_FMT_RGB32, + AV_PIX_FMT_BGR32, }; #define DIV_UP(a, b) ( ((a) + (b) - 1) / (b) ) diff --git a/libavfilter/vf_scale_cuda.cu b/libavfilter/vf_scale_cuda.cu index c9c6cafdb6..c82649e84f 100644 --- a/libavfilter/vf_scale_cuda.cu +++ b/libavfilter/vf_scale_cuda.cu @@ -853,9 +853,67 @@ struct Convert_yuv444p16le_yuv444p16le } }; -// bgr0->X - -struct Convert_bgr0_bgr0 +#define DEF_CONVERT_IDENTITY(fmt1, fmt2)\ + \ +struct Convert_##fmt1##_##fmt2 \ +{ \ + static const int in_bit_depth = 8; \ + typedef uchar4 in_T; \ + typedef uchar in_T_uv; \ + typedef uchar4 out_T; \ + typedef uchar out_T_uv; \ + \ + DEF_F(Convert, out_T) \ + { \ + DEFAULT_DST(0) = SUB_F(y, 0); \ + } \ + \ + DEF_F(Convert_uv, out_T_uv) \ + { \ + } \ +}; \ + +#define DEF_CONVERT_REORDER(fmt1, fmt2) \ + \ +struct Convert_##fmt1##_##fmt2 \ +{ \ + static const int in_bit_depth = 8; \ + typedef uchar4 in_T; \ + typedef uchar in_T_uv; \ + typedef uchar4 out_T; \ + typedef uchar out_T_uv; \ + \ + DEF_F(Convert, out_T) \ + { \ + uchar4 res = SUB_F(y, 0); \ + DEFAULT_DST(0) = make_uchar4( \ + res.z, \ + res.y, \ + res.x, \ + res.w \ + ); \ + } \ + \ + DEF_F(Convert_uv, out_T_uv) \ + { \ + } \ +}; \ + +#define DEF_CONVERT_RGB(fmt1, fmt2) \ + \ +DEF_CONVERT_IDENTITY(fmt1, fmt1) \ +DEF_CONVERT_REORDER (fmt1, fmt2) \ +DEF_CONVERT_REORDER (fmt2, fmt1) \ +DEF_CONVERT_IDENTITY(fmt2, fmt2) + +DEF_CONVERT_RGB(rgb0, bgr0) +DEF_CONVERT_RGB(rgba, bgra) +DEF_CONVERT_IDENTITY(rgba, rgb0) +DEF_CONVERT_IDENTITY(bgra, bgr0) +DEF_CONVERT_REORDER(rgba, bgr0) +DEF_CONVERT_REORDER(bgra, rgb0) + +struct Convert_bgr0_bgra { static const int in_bit_depth = 8; typedef uchar4 in_T; @@ -865,7 +923,13 @@ struct Convert_bgr0_bgr0 DEF_F(Convert, out_T) { - DEFAULT_DST(0) = SUB_F(y, 0); + uchar4 res = SUB_F(y, 0); + DEFAULT_DST(0) = make_uchar4( + res.x, + res.y, + res.z, + 1 + ); } DEF_F(Convert_uv, out_T_uv) @@ -873,7 +937,7 @@ struct Convert_bgr0_bgr0 } }; -struct Convert_bgr0_rgb0 +struct Convert_bgr0_rgba { static const int in_bit_depth = 8; typedef uchar4 in_T; @@ -888,7 +952,7 @@ struct Convert_bgr0_rgb0 res.z, res.y, res.x, - res.w + 1 ); } @@ -897,9 +961,7 @@ struct Convert_bgr0_rgb0 } }; -// rgb0->X - -struct Convert_rgb0_bgr0 +struct Convert_rgb0_bgra { static const int in_bit_depth = 8; typedef uchar4 in_T; @@ -914,7 +976,7 @@ struct Convert_rgb0_bgr0 res.z, res.y, res.x, - res.w + 1 ); } @@ -923,7 +985,7 @@ struct Convert_rgb0_bgr0 } }; -struct Convert_rgb0_rgb0 +struct Convert_rgb0_rgba { static const int in_bit_depth = 8; typedef uchar4 in_T; @@ -933,7 +995,13 @@ struct Convert_rgb0_rgb0 DEF_F(Convert, out_T) { - DEFAULT_DST(0) = SUB_F(y, 0); + uchar4 res = SUB_F(y, 0); + DEFAULT_DST(0) = make_uchar4( + res.x, + res.y, + res.z, + 1 + ); } DEF_F(Convert_uv, out_T_uv) @@ -1117,6 +1185,12 @@ extern "C" { NEAREST_KERNEL_RAW(p016le_ ## C) \ NEAREST_KERNEL_RAW(yuv444p16le_ ## C) +#define NEAREST_KERNELS_RGB(fmt1, fmt2) \ + NEAREST_KERNEL_RAW(fmt1##_##fmt1) \ + NEAREST_KERNEL_RAW(fmt1##_##fmt2) \ + NEAREST_KERNEL_RAW(fmt2##_##fmt1) \ + NEAREST_KERNEL_RAW(fmt2##_##fmt2) + NEAREST_KERNELS(yuv420p) NEAREST_KERNELS(nv12) NEAREST_KERNELS(yuv444p) @@ -1124,11 +1198,16 @@ NEAREST_KERNELS(p010le) NEAREST_KERNELS(p016le) NEAREST_KERNELS(yuv444p16le) -NEAREST_KERNEL_RAW(bgr0_bgr0) -NEAREST_KERNEL_RAW(rgb0_rgb0) -NEAREST_KERNEL_RAW(bgr0_rgb0) -NEAREST_KERNEL_RAW(rgb0_bgr0) - +NEAREST_KERNELS_RGB(rgb0, bgr0) +NEAREST_KERNELS_RGB(rgba, bgra) +NEAREST_KERNEL_RAW(rgb0_rgba) +NEAREST_KERNEL_RAW(rgb0_bgra) +NEAREST_KERNEL_RAW(bgr0_rgba) +NEAREST_KERNEL_RAW(bgr0_bgra) +NEAREST_KERNEL_RAW(rgba_rgb0) +NEAREST_KERNEL_RAW(rgba_bgr0) +NEAREST_KERNEL_RAW(bgra_rgb0) +NEAREST_KERNEL_RAW(bgra_bgr0) #define BILINEAR_KERNEL(C, S) \ __global__ void Subsample_Bilinear_##C##S( \ @@ -1152,6 +1231,12 @@ NEAREST_KERNEL_RAW(rgb0_bgr0) BILINEAR_KERNEL_RAW(p016le_ ## C) \ BILINEAR_KERNEL_RAW(yuv444p16le_ ## C) +#define BILINEAR_KERNELS_RGB(fmt1, fmt2)\ + BILINEAR_KERNEL_RAW(fmt1##_##fmt1) \ + BILINEAR_KERNEL_RAW(fmt1##_##fmt2) \ + BILINEAR_KERNEL_RAW(fmt2##_##fmt1) \ + BILINEAR_KERNEL_RAW(fmt2##_##fmt2) + BILINEAR_KERNELS(yuv420p) BILINEAR_KERNELS(nv12) BILINEAR_KERNELS(yuv444p) @@ -1159,10 +1244,16 @@ BILINEAR_KERNELS(p010le) BILINEAR_KERNELS(p016le) BILINEAR_KERNELS(yuv444p16le) -BILINEAR_KERNEL_RAW(bgr0_bgr0) -BILINEAR_KERNEL_RAW(rgb0_rgb0) -BILINEAR_KERNEL_RAW(bgr0_rgb0) -BILINEAR_KERNEL_RAW(rgb0_bgr0) +BILINEAR_KERNELS_RGB(rgb0, bgr0) +BILINEAR_KERNELS_RGB(rgba, bgra) +BILINEAR_KERNEL_RAW(rgb0_rgba) +BILINEAR_KERNEL_RAW(rgb0_bgra) +BILINEAR_KERNEL_RAW(bgr0_rgba) +BILINEAR_KERNEL_RAW(bgr0_bgra) +BILINEAR_KERNEL_RAW(rgba_rgb0) +BILINEAR_KERNEL_RAW(rgba_bgr0) +BILINEAR_KERNEL_RAW(bgra_rgb0) +BILINEAR_KERNEL_RAW(bgra_bgr0) #define BICUBIC_KERNEL(C, S) \ __global__ void Subsample_Bicubic_##C##S( \ @@ -1186,6 +1277,12 @@ BILINEAR_KERNEL_RAW(rgb0_bgr0) BICUBIC_KERNEL_RAW(p016le_ ## C) \ BICUBIC_KERNEL_RAW(yuv444p16le_ ## C) +#define BICUBIC_KERNELS_RGB(fmt1, fmt2) \ + BICUBIC_KERNEL_RAW(fmt1##_##fmt1) \ + BICUBIC_KERNEL_RAW(fmt1##_##fmt2) \ + BICUBIC_KERNEL_RAW(fmt2##_##fmt1) \ + BICUBIC_KERNEL_RAW(fmt2##_##fmt2) + BICUBIC_KERNELS(yuv420p) BICUBIC_KERNELS(nv12) BICUBIC_KERNELS(yuv444p) @@ -1193,11 +1290,16 @@ BICUBIC_KERNELS(p010le) BICUBIC_KERNELS(p016le) BICUBIC_KERNELS(yuv444p16le) -BICUBIC_KERNEL_RAW(bgr0_bgr0) -BICUBIC_KERNEL_RAW(rgb0_rgb0) -BICUBIC_KERNEL_RAW(bgr0_rgb0) -BICUBIC_KERNEL_RAW(rgb0_bgr0) - +BICUBIC_KERNELS_RGB(rgb0, bgr0) +BICUBIC_KERNELS_RGB(rgba, bgra) +BICUBIC_KERNEL_RAW(rgb0_rgba) +BICUBIC_KERNEL_RAW(rgb0_bgra) +BICUBIC_KERNEL_RAW(bgr0_rgba) +BICUBIC_KERNEL_RAW(bgr0_bgra) +BICUBIC_KERNEL_RAW(rgba_rgb0) +BICUBIC_KERNEL_RAW(rgba_bgr0) +BICUBIC_KERNEL_RAW(bgra_rgb0) +BICUBIC_KERNEL_RAW(bgra_bgr0) #define LANCZOS_KERNEL(C, S) \ __global__ void Subsample_Lanczos_##C##S( \ @@ -1221,6 +1323,12 @@ BICUBIC_KERNEL_RAW(rgb0_bgr0) LANCZOS_KERNEL_RAW(p016le_ ## C) \ LANCZOS_KERNEL_RAW(yuv444p16le_ ## C) +#define LANCZOS_KERNELS_RGB(fmt1, fmt2) \ + LANCZOS_KERNEL_RAW(fmt1##_##fmt1) \ + LANCZOS_KERNEL_RAW(fmt1##_##fmt2) \ + LANCZOS_KERNEL_RAW(fmt2##_##fmt1) \ + LANCZOS_KERNEL_RAW(fmt2##_##fmt2) + LANCZOS_KERNELS(yuv420p) LANCZOS_KERNELS(nv12) LANCZOS_KERNELS(yuv444p) @@ -1228,9 +1336,14 @@ LANCZOS_KERNELS(p010le) LANCZOS_KERNELS(p016le) LANCZOS_KERNELS(yuv444p16le) -LANCZOS_KERNEL_RAW(bgr0_bgr0) -LANCZOS_KERNEL_RAW(rgb0_rgb0) -LANCZOS_KERNEL_RAW(bgr0_rgb0) -LANCZOS_KERNEL_RAW(rgb0_bgr0) - +LANCZOS_KERNELS_RGB(rgb0, bgr0) +LANCZOS_KERNELS_RGB(rgba, bgra) +LANCZOS_KERNEL_RAW(rgb0_rgba) +LANCZOS_KERNEL_RAW(rgb0_bgra) +LANCZOS_KERNEL_RAW(bgr0_rgba) +LANCZOS_KERNEL_RAW(bgr0_bgra) +LANCZOS_KERNEL_RAW(rgba_rgb0) +LANCZOS_KERNEL_RAW(rgba_bgr0) +LANCZOS_KERNEL_RAW(bgra_rgb0) +LANCZOS_KERNEL_RAW(bgra_bgr0) }