From patchwork Fri Oct 18 23:05:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Koushik Dutta X-Patchwork-Id: 52396 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:612c:143:b0:48e:c0f8:d0de with SMTP id h3csp1441454vqi; Sat, 19 Oct 2024 02:44:21 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCVy2fb7lOzM8U9y60+9YpooxSpjG55S8kodd775yVuX44cHEzFP5F7PF8NCtZMa+4MiIgLPZDdBVV17mV2kENx1@gmail.com X-Google-Smtp-Source: AGHT+IGrImboWSiQEx+d45knbyYXyuGTZ50AC/hxBlRg1/Bsc6Ek7tYfHdyufyVrfoAVU8Xwftc5 X-Received: by 2002:a2e:701:0:b0:2fa:c0df:3d91 with SMTP id 38308e7fff4ca-2fb82eaedbemr23024771fa.19.1729331061706; Sat, 19 Oct 2024 02:44:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1729331061; cv=none; d=google.com; s=arc-20240605; b=AcAPk3ilww6efVMk5weFGrc4RVgbQuGuj+D5Eu+fgLPxjFdLggrcrdBUgYpNSNL+dV 4QYUPNBVPkc6ryp5bNEya5A+1JyuHBw6Q7b9afztb6QVVk81FO4lFNXi5Dclk2qYlR+U eDG12mXpUlYAB1aVZswNXrnhLQunFidelGnXTPOLwPwqaEDk6G/XGt9Mb32w8qncooPl /b0Hr+R96EOpxOYlrNifc7sIjzrKv/4C+M8FQRFuxvHJ6j0hELfdMZYcvI6fLE6G64MT kOM1Wvcf8kc5+akOrx5BJErIZOB4Mc0yLt/iS+2g1F0X/fC8WHbW0FZ5ILEF5wkGoDPt 4V5A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=bFdK+mAnJa2OkDFp7lIO4CDrjH46W2GjzExr4HnI3YY=; fh=vf1rgm8KpEMxcRKiIfYPC5TR7MMWdHSIxg51vjgwMpE=; b=RPLFHUPJaCGvXck0zMcD3QDcOKSqB3RVASzzILjhIn1VLiocJ8a6HnZbYqGpjAMo5N zJRkhWYEswBtlPgKPIqPCaVghKrIUkbYJcQN3c16pATN+54g/uKaQW3GtNAbhn9Z10sV AonPAkvs22yWaz5DP98rXJRYbB3/A+8yvtszfoznPKQk0yMMllq6IQUfUpwTAKIyXS+1 2rsOddXKL+2814QjhkCz31aeIq6lZHaCKpG4cWpUmT9ZwPmNylFTkW0LYUnbZcR0uooT B3D55CPRe5PgJn7F7Csx6vDjb8gRzawuVOhyIsx7jgtP7pmTZDuYtwkM76zkYWzBcam9 f6Tg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=GWYvDzS8; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com; dara=fail header.i=@gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id 38308e7fff4ca-2fb809c5e80si11082941fa.182.2024.10.19.02.44.21; Sat, 19 Oct 2024 02:44:21 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=GWYvDzS8; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com; dara=fail header.i=@gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 0ACE468DD37; Sat, 19 Oct 2024 02:06:28 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E7AA568DA68 for ; Sat, 19 Oct 2024 02:06:16 +0300 (EEST) Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-20ca388d242so20020745ad.2 for ; Fri, 18 Oct 2024 16:06:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729292775; x=1729897575; darn=ffmpeg.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=dWs4DtCs1gmBZztB6d/McYLwkYGfWDlgejSjjPCinng=; b=GWYvDzS84e/HF5FNbtNJN6gVbFZCF31dlCnHN2tg2026wBDskTzeluCjXPnhUiT4IQ 1y5ziUTUJ1QvuwYVSumxDfmqKngreu06NGnCF/yr55JiN/K0v94RIzgHUjD/G83XWfbW fVayCRbEDv+rcnaQzv1mchXzPZlmzZYap1gKGSxtBLMiMaCY6eIRu+ASAnvRm4uuwiaF ThJ7PQVN/D9+irJy9RLNhzxHTakmDCozTTuxtMB9Zfs3ZIBokZv3r2XxKvPS00UTwPyx veYzeGBVCuyGw+CQsJ2A+6hwzxmFx9lyP1j3i0LhRlUwH03epW9/J0HMAdOCMKvJ73rS Dc6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729292775; x=1729897575; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dWs4DtCs1gmBZztB6d/McYLwkYGfWDlgejSjjPCinng=; b=hzr5YpFRYiQDAWN8pBIIGZ8xWxN2lrEKTIiHCyxSJuP1kPaoQu9rNa+dejfOv8YLTn 87Uq6JvcnMJs4kUAmmoB7wSqjam5+p1xZsfqCWJOoWXf9CkRBTNGhD7sBNFEZg6Ko2r8 7TKn0wo7JBe/ffXEamH1YMwqmwGrADJF0wpLQglg9v6fBtZptEmPmzC74CTCPOGlXNUn F5FmRRiVjvWJzD5o9JdoFeM7R97BzFHz659mN9k3e1mR1FUS5bYnnTmWtK4YZbR6Qx23 Fmjwq4qmZlB0YaQPtGRSL2dnGFuKnQoqjGxjByCLgzQtN8OfPLyT5S48ACjfB/G9fT3z zbsA== X-Gm-Message-State: AOJu0YyYCrWkWSjKjNDJeAv5mjaDEs2vbu9sRAJCQopd/+ePxhQTEzx7 vp5uJJMmVzB7zViEHZi8+d24juT7zYYh4MI1vDGOd1nGCYcCtBdTlIi9dA== X-Received: by 2002:a17:902:f68a:b0:20c:aa41:9968 with SMTP id d9443c01a7336-20e5a9587c3mr44442395ad.53.1729292774947; Fri, 18 Oct 2024 16:06:14 -0700 (PDT) Received: from Koushik-MacStudio.tail05204.ts.net ([2001:559:76c:0:493a:a469:ed1d:5e06]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-20e5a912dafsm17809655ad.276.2024.10.18.16.06.14 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 18 Oct 2024 16:06:14 -0700 (PDT) From: Koushik Dutta To: ffmpeg-devel@ffmpeg.org Date: Fri, 18 Oct 2024 16:05:50 -0700 Message-Id: <20241018230552.80169-2-koushd@gmail.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20241018230552.80169-1-koushd@gmail.com> References: <20241018230552.80169-1-koushd@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 2/4] scale_cuda frame crop support X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Koushik Dutta Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: Wx/yE3JKltmG The crop filter has no effect on scale_cuda: -vf crop=100:100,scale_cuda=300x300 Hardware frames (AV_PIX_FMT_FLAG_HWACCEL) are expected to use the crop_* properties, as seen in the implementation vf_crop.c. The current workaround is to hwdownload the full frame and perform the crop on CPU. --- libavfilter/vf_scale_cuda.c | 17 +++++++++++------ libavfilter/vf_scale_cuda.cu | 22 ++++++++++++++-------- 2 files changed, 25 insertions(+), 14 deletions(-) diff --git a/libavfilter/vf_scale_cuda.c b/libavfilter/vf_scale_cuda.c index 54a340949d..8615da308a 100644 --- a/libavfilter/vf_scale_cuda.c +++ b/libavfilter/vf_scale_cuda.c @@ -407,7 +407,7 @@ fail: } static int call_resize_kernel(AVFilterContext *ctx, CUfunction func, - CUtexObject src_tex[4], int src_width, int src_height, + CUtexObject src_tex[4], int src_left, int src_top, int src_width, int src_height, AVFrame *out_frame, int dst_width, int dst_height, int dst_pitch) { CUDAScaleContext *s = ctx->priv; @@ -422,7 +422,7 @@ static int call_resize_kernel(AVFilterContext *ctx, CUfunction func, &src_tex[0], &src_tex[1], &src_tex[2], &src_tex[3], &dst_devptr[0], &dst_devptr[1], &dst_devptr[2], &dst_devptr[3], &dst_width, &dst_height, &dst_pitch, - &src_width, &src_height, &s->param + &src_left, &src_top, &src_width, &src_height, &s->param }; return CHECK_CU(cu->cuLaunchKernel(func, @@ -440,6 +440,9 @@ static int scalecuda_resize(AVFilterContext *ctx, CUtexObject tex[4] = { 0, 0, 0, 0 }; + int crop_width = (in->width - in->crop_right) - in->crop_left; + int crop_height = (in->height - in->crop_bottom) - in->crop_top; + ret = CHECK_CU(cu->cuCtxPushCurrent(cuda_ctx)); if (ret < 0) return ret; @@ -477,7 +480,7 @@ static int scalecuda_resize(AVFilterContext *ctx, // scale primary plane(s). Usually Y (and A), or single plane of RGB frames. ret = call_resize_kernel(ctx, s->cu_func, - tex, in->width, in->height, + tex, in->crop_left, in->crop_top, crop_width, crop_height, out, out->width, out->height, out->linesize[0]); if (ret < 0) goto exit; @@ -485,8 +488,10 @@ static int scalecuda_resize(AVFilterContext *ctx, if (s->out_planes > 1) { // scale UV plane. Scale function sets both U and V plane, or singular interleaved plane. ret = call_resize_kernel(ctx, s->cu_func_uv, tex, - AV_CEIL_RSHIFT(in->width, s->in_desc->log2_chroma_w), - AV_CEIL_RSHIFT(in->height, s->in_desc->log2_chroma_h), + AV_CEIL_RSHIFT(in->crop_left, s->in_desc->log2_chroma_w), + AV_CEIL_RSHIFT(in->crop_top, s->in_desc->log2_chroma_h), + AV_CEIL_RSHIFT(crop_width, s->in_desc->log2_chroma_w), + AV_CEIL_RSHIFT(crop_height, s->in_desc->log2_chroma_h), out, AV_CEIL_RSHIFT(out->width, s->out_desc->log2_chroma_w), AV_CEIL_RSHIFT(out->height, s->out_desc->log2_chroma_h), @@ -545,7 +550,7 @@ static int cudascale_filter_frame(AVFilterLink *link, AVFrame *in) CUcontext dummy; int ret = 0; - if (s->passthrough) + if (s->passthrough && !in->crop_left && !in->crop_top && !in->crop_right && !in->crop_bottom) return ff_filter_frame(outlink, in); out = av_frame_alloc(); diff --git a/libavfilter/vf_scale_cuda.cu b/libavfilter/vf_scale_cuda.cu index de06ba9433..271b55cd5d 100644 --- a/libavfilter/vf_scale_cuda.cu +++ b/libavfilter/vf_scale_cuda.cu @@ -26,6 +26,7 @@ template using subsample_function_t = T (*)(cudaTextureObject_t tex, int xo, int yo, int dst_width, int dst_height, + int src_left, int src_top, int src_width, int src_height, int bit_depth, float param); @@ -64,11 +65,12 @@ static inline __device__ ushort conv_16to10(ushort in) subsample_function_t subsample_func_uv> \ __device__ static inline void N(cudaTextureObject_t src_tex[4], T *dst[4], int xo, int yo, \ int dst_width, int dst_height, int dst_pitch, \ - int src_width, int src_height, float param) + int src_left, int src_top, int src_width, int src_height, float param) #define SUB_F(m, plane) \ subsample_func_##m(src_tex[plane], xo, yo, \ dst_width, dst_height, \ + src_left, src_top, \ src_width, src_height, \ in_bit_depth, param) @@ -1063,13 +1065,14 @@ template __device__ static inline T Subsample_Nearest(cudaTextureObject_t tex, int xo, int yo, int dst_width, int dst_height, + int src_left, int src_top, int src_width, int src_height, int bit_depth, float param) { float hscale = (float)src_width / (float)dst_width; float vscale = (float)src_height / (float)dst_height; - float xi = (xo + 0.5f) * hscale; - float yi = (yo + 0.5f) * vscale; + float xi = (xo + 0.5f) * hscale + src_left; + float yi = (yo + 0.5f) * vscale + src_top; return tex2D(tex, xi, yi); } @@ -1078,13 +1081,14 @@ template __device__ static inline T Subsample_Bilinear(cudaTextureObject_t tex, int xo, int yo, int dst_width, int dst_height, + int src_left, int src_top, int src_width, int src_height, int bit_depth, float param) { float hscale = (float)src_width / (float)dst_width; float vscale = (float)src_height / (float)dst_height; - float xi = (xo + 0.5f) * hscale; - float yi = (yo + 0.5f) * vscale; + float xi = (xo + 0.5f) * hscale + src_left; + float yi = (yo + 0.5f) * vscale + src_top; // 3-tap filter weights are {wh,1.0,wh} and {wv,1.0,wv} float wh = min(max(0.5f * (hscale - 1.0f), 0.0f), 1.0f); float wv = min(max(0.5f * (vscale - 1.0f), 0.0f), 1.0f); @@ -1109,13 +1113,14 @@ template __device__ static inline T Subsample_Bicubic(cudaTextureObject_t tex, int xo, int yo, int dst_width, int dst_height, + int src_left, int src_top, int src_width, int src_height, int bit_depth, float param) { float hscale = (float)src_width / (float)dst_width; float vscale = (float)src_height / (float)dst_height; - float xi = (xo + 0.5f) * hscale - 0.5f; - float yi = (yo + 0.5f) * vscale - 0.5f; + float xi = (xo + 0.5f) * hscale - 0.5f + src_left; + float yi = (yo + 0.5f) * vscale - 0.5f + src_top; float px = floor(xi); float py = floor(yi); float fx = xi - px; @@ -1147,7 +1152,7 @@ __device__ static inline T Subsample_Bicubic(cudaTextureObject_t tex, cudaTextureObject_t src_tex_2, cudaTextureObject_t src_tex_3, \ T *dst_0, T *dst_1, T *dst_2, T *dst_3, \ int dst_width, int dst_height, int dst_pitch, \ - int src_width, int src_height, float param + int src_left, int src_top, int src_width, int src_height, float param #define SUBSAMPLE(Convert, T) \ cudaTextureObject_t src_tex[4] = \ @@ -1159,6 +1164,7 @@ __device__ static inline T Subsample_Bicubic(cudaTextureObject_t tex, Convert( \ src_tex, dst, xo, yo, \ dst_width, dst_height, dst_pitch, \ + src_left, src_top, \ src_width, src_height, param); extern "C" {