From patchwork Tue Sep 10 18:10:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Koushik Dutta X-Patchwork-Id: 51499 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:612c:41b1:b0:48e:c0f8:d0de with SMTP id le49csp584329vqb; Tue, 10 Sep 2024 11:11:29 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXqaH+rnUWwrusab0EzCop/ct4HTDiILcvmearlAJ+ircjLSIYJiqrUuTj3KFRDTXajtywsUYMtPaDo3kxYziOJ@gmail.com X-Google-Smtp-Source: AGHT+IEgDvKTkAI6xs60PD+XxkwPaWZCBxGf74hKj0ooLHihHXv8/wyC4XjglkrUnAr8itr11Q9e X-Received: by 2002:a17:907:da9:b0:a8d:3705:4115 with SMTP id a640c23a62f3a-a8ffab88e1amr131836966b.32.1725991889276; Tue, 10 Sep 2024 11:11:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1725991889; cv=none; d=google.com; s=arc-20240605; b=BFnPAdjHuYucV5TkuTAHI7WqlTuoukLTI/a/sMvg68Z9RE7NVVxrRLJo6BAkMmIxYu hPNmgSojs8/YwvfyrUvKM6xZ6ViNqOLomueTC7823Za7BJcGp+kqYtuwCm3yLasReKWI MM3vsqaUyu2Zf0JrPj2LL2Bqqq2C1lLox9j9jfddQcHViei6+d2fKgxV+YsVwywiaRJC 6U5yAWk/1JS6yy0PkoxguOgi95d2HhSsQWzVTzlykTdxnbSgRZNJ0TAtTbmyqCRYxhQD v0Z1aUhJZZNYwIfRBMk+DPYJF6voOmMY9ilHeDPsBS1X90qg1Xv2RA0UBlXmrp8P2e5o JcvA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:delivered-to; bh=1vH/lZ5BAyZJO3B/pHlRF2B7TzfA+z7GL54tFcsMsWw=; fh=vf1rgm8KpEMxcRKiIfYPC5TR7MMWdHSIxg51vjgwMpE=; b=carUk/BdtJap0xe3nbPA2viZ5AKIcyreu3nZmnOuscGgTMWF29IpXXtVSDcZOlpig2 ea+oYvMHfKrUoqIwRFpL97FCiE8mFCoZeo8Aw6r/vmm5keU6uyS3BmqwZ05VIE9d3TdF zXi8QpRQ0npvB/QdcGRBCAxbJbT6nsIzdBKeu/qcfLQAMKALA9CZryNu3IHvs7jasZfF I4wLqbvzwtz2XwI29c3muaIxAKFtdkmDAWGvF/TQQ4C/emVxb9HXGNRM2XUuz0m/TBXj ZI18VrU3dxS1GaTegFX2j8vEUyW0ZGFFOtm9u9IoBbizrNACG0AwubPCI9EohmwxyDiw e6HQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=nsYWAo64; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com; dara=fail header.i=@gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a8d25856a1bsi600285666b.134.2024.09.10.11.11.27; Tue, 10 Sep 2024 11:11:29 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b=nsYWAo64; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com; dara=fail header.i=@gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id D678A68E103; Tue, 10 Sep 2024 21:11:15 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id E895C68DE63 for ; Tue, 10 Sep 2024 21:11:07 +0300 (EEST) Received: by mail-pl1-f179.google.com with SMTP id d9443c01a7336-206aee40676so48816925ad.0 for ; Tue, 10 Sep 2024 11:11:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725991866; x=1726596666; darn=ffmpeg.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=/a+xezM4SQBRsfa3hqwtSLDM6yfR+1CgNY0KmfUqP1I=; b=nsYWAo64vqpwWKp4LQy0bNtjvJdnQkfgNWVgXnrQnveaB/kjordZS/P+lPXh9FbQRU pAMgk5OvG6rv/exl5y5E1S3TRWVy7exom2Z9cM1t7LLbCIt+k3DIQfpH7+RPJfV01T/X ih7pk21B96tGfuF88v22clx0WWw2X/T204ruaBCV/bPnziOAG60pZqDlRSSOOm4/xcau UgeGITBUiCidixviFARAEv3Kvk/Os11m1DKQ104LyWF9lbbkesrT+jAzigL4BWfANASe VVd9jDsD6r4BwHo9oPIcnxxfaAW6n8nRLPlA78UFz9gbZRXTXm2ziGT4lZ1wTEo8Zc1H Znyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725991866; x=1726596666; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/a+xezM4SQBRsfa3hqwtSLDM6yfR+1CgNY0KmfUqP1I=; b=PPSWiW2HZreDgCJn36G/6WY19ss/VRGjeK+x5fT5raBBovHHagnu9MA8Rx3GBHFI97 rE517tSnoxVemlGDoeM8OAJoEtKHi9jTbrHwj/N15AvWn5wIHRS0iRdY+6CIpqaVgH+R KMILEbD9WvfR23TaRg/ZHDvt0auGM/JE5MQlxrEHoegxIEL4wagawYKyAIRMll5yoWPB w00D/V4LgU+oFWWU/AWbi19B1EjXe83wOWyMjUcoJ+U4+JvrDw/vvbKYdMdbD8EuEJCw 1bCC9s/8tKjx2JCi7WZEiNLXm2x7Ztt5Al2R/9+0dDEEN0mhT1U/u4CBy9vaSkEKqBXe KGpQ== X-Gm-Message-State: AOJu0YyrjuQR5xTF9pGy3uKYxLxAwGo88Il01Tn76hW7mdzmwKAD9ViO jHOE+Jpnvho/yFNK7u89E/M2jMbcKoYDXWfaxcg5MRzZOjEBNDkPEXkp/swe X-Received: by 2002:a17:902:e846:b0:206:c8dc:e334 with SMTP id d9443c01a7336-2074c69b72emr21799775ad.39.1725991865858; Tue, 10 Sep 2024 11:11:05 -0700 (PDT) Received: from Koushik-MacStudio.tail05204.ts.net ([2001:559:76c:0:ed2e:ef7a:2bf7:c169]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-20710e1d5c3sm51265025ad.59.2024.09.10.11.11.05 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Tue, 10 Sep 2024 11:11:05 -0700 (PDT) From: Koushik Dutta To: ffmpeg-devel@ffmpeg.org Date: Tue, 10 Sep 2024 11:10:56 -0700 Message-Id: <20240910181057.43453-2-koushd@gmail.com> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: <20240910181057.43453-1-koushd@gmail.com> References: <20240910181057.43453-1-koushd@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 2/3] scale_cuda frame crop support X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Koushik Dutta Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: azowR2aNFA+H The crop filter has no effect on scale_cuda: -vf crop=100:100,scale_cuda=300x300 Hardware frames (AV_PIX_FMT_FLAG_HWACCEL) are expected to use the crop_* properties, as seen in the implementation vf_crop.c. The current workaround is to hwdownload the full frame and perform the crop on CPU. --- libavfilter/vf_scale_cuda.c | 15 ++++++++++----- libavfilter/vf_scale_cuda.cu | 22 ++++++++++++++-------- 2 files changed, 24 insertions(+), 13 deletions(-) diff --git a/libavfilter/vf_scale_cuda.c b/libavfilter/vf_scale_cuda.c index 54a340949d..eb8beee771 100644 --- a/libavfilter/vf_scale_cuda.c +++ b/libavfilter/vf_scale_cuda.c @@ -407,7 +407,7 @@ fail: } static int call_resize_kernel(AVFilterContext *ctx, CUfunction func, - CUtexObject src_tex[4], int src_width, int src_height, + CUtexObject src_tex[4], int src_left, int src_top, int src_width, int src_height, AVFrame *out_frame, int dst_width, int dst_height, int dst_pitch) { CUDAScaleContext *s = ctx->priv; @@ -422,7 +422,7 @@ static int call_resize_kernel(AVFilterContext *ctx, CUfunction func, &src_tex[0], &src_tex[1], &src_tex[2], &src_tex[3], &dst_devptr[0], &dst_devptr[1], &dst_devptr[2], &dst_devptr[3], &dst_width, &dst_height, &dst_pitch, - &src_width, &src_height, &s->param + &src_left, &src_top, &src_width, &src_height, &s->param }; return CHECK_CU(cu->cuLaunchKernel(func, @@ -440,6 +440,9 @@ static int scalecuda_resize(AVFilterContext *ctx, CUtexObject tex[4] = { 0, 0, 0, 0 }; + int crop_width = (in->width - in->crop_right) - in->crop_left; + int crop_height = (in->height - in->crop_bottom) - in->crop_top; + ret = CHECK_CU(cu->cuCtxPushCurrent(cuda_ctx)); if (ret < 0) return ret; @@ -477,7 +480,7 @@ static int scalecuda_resize(AVFilterContext *ctx, // scale primary plane(s). Usually Y (and A), or single plane of RGB frames. ret = call_resize_kernel(ctx, s->cu_func, - tex, in->width, in->height, + tex, in->crop_left, in->crop_top, crop_width, crop_height, out, out->width, out->height, out->linesize[0]); if (ret < 0) goto exit; @@ -485,8 +488,10 @@ static int scalecuda_resize(AVFilterContext *ctx, if (s->out_planes > 1) { // scale UV plane. Scale function sets both U and V plane, or singular interleaved plane. ret = call_resize_kernel(ctx, s->cu_func_uv, tex, - AV_CEIL_RSHIFT(in->width, s->in_desc->log2_chroma_w), - AV_CEIL_RSHIFT(in->height, s->in_desc->log2_chroma_h), + AV_CEIL_RSHIFT(in->crop_left, s->in_desc->log2_chroma_w), + AV_CEIL_RSHIFT(in->crop_top, s->in_desc->log2_chroma_h), + AV_CEIL_RSHIFT(crop_width, s->in_desc->log2_chroma_w), + AV_CEIL_RSHIFT(crop_height, s->in_desc->log2_chroma_h), out, AV_CEIL_RSHIFT(out->width, s->out_desc->log2_chroma_w), AV_CEIL_RSHIFT(out->height, s->out_desc->log2_chroma_h), diff --git a/libavfilter/vf_scale_cuda.cu b/libavfilter/vf_scale_cuda.cu index de06ba9433..271b55cd5d 100644 --- a/libavfilter/vf_scale_cuda.cu +++ b/libavfilter/vf_scale_cuda.cu @@ -26,6 +26,7 @@ template using subsample_function_t = T (*)(cudaTextureObject_t tex, int xo, int yo, int dst_width, int dst_height, + int src_left, int src_top, int src_width, int src_height, int bit_depth, float param); @@ -64,11 +65,12 @@ static inline __device__ ushort conv_16to10(ushort in) subsample_function_t subsample_func_uv> \ __device__ static inline void N(cudaTextureObject_t src_tex[4], T *dst[4], int xo, int yo, \ int dst_width, int dst_height, int dst_pitch, \ - int src_width, int src_height, float param) + int src_left, int src_top, int src_width, int src_height, float param) #define SUB_F(m, plane) \ subsample_func_##m(src_tex[plane], xo, yo, \ dst_width, dst_height, \ + src_left, src_top, \ src_width, src_height, \ in_bit_depth, param) @@ -1063,13 +1065,14 @@ template __device__ static inline T Subsample_Nearest(cudaTextureObject_t tex, int xo, int yo, int dst_width, int dst_height, + int src_left, int src_top, int src_width, int src_height, int bit_depth, float param) { float hscale = (float)src_width / (float)dst_width; float vscale = (float)src_height / (float)dst_height; - float xi = (xo + 0.5f) * hscale; - float yi = (yo + 0.5f) * vscale; + float xi = (xo + 0.5f) * hscale + src_left; + float yi = (yo + 0.5f) * vscale + src_top; return tex2D(tex, xi, yi); } @@ -1078,13 +1081,14 @@ template __device__ static inline T Subsample_Bilinear(cudaTextureObject_t tex, int xo, int yo, int dst_width, int dst_height, + int src_left, int src_top, int src_width, int src_height, int bit_depth, float param) { float hscale = (float)src_width / (float)dst_width; float vscale = (float)src_height / (float)dst_height; - float xi = (xo + 0.5f) * hscale; - float yi = (yo + 0.5f) * vscale; + float xi = (xo + 0.5f) * hscale + src_left; + float yi = (yo + 0.5f) * vscale + src_top; // 3-tap filter weights are {wh,1.0,wh} and {wv,1.0,wv} float wh = min(max(0.5f * (hscale - 1.0f), 0.0f), 1.0f); float wv = min(max(0.5f * (vscale - 1.0f), 0.0f), 1.0f); @@ -1109,13 +1113,14 @@ template __device__ static inline T Subsample_Bicubic(cudaTextureObject_t tex, int xo, int yo, int dst_width, int dst_height, + int src_left, int src_top, int src_width, int src_height, int bit_depth, float param) { float hscale = (float)src_width / (float)dst_width; float vscale = (float)src_height / (float)dst_height; - float xi = (xo + 0.5f) * hscale - 0.5f; - float yi = (yo + 0.5f) * vscale - 0.5f; + float xi = (xo + 0.5f) * hscale - 0.5f + src_left; + float yi = (yo + 0.5f) * vscale - 0.5f + src_top; float px = floor(xi); float py = floor(yi); float fx = xi - px; @@ -1147,7 +1152,7 @@ __device__ static inline T Subsample_Bicubic(cudaTextureObject_t tex, cudaTextureObject_t src_tex_2, cudaTextureObject_t src_tex_3, \ T *dst_0, T *dst_1, T *dst_2, T *dst_3, \ int dst_width, int dst_height, int dst_pitch, \ - int src_width, int src_height, float param + int src_left, int src_top, int src_width, int src_height, float param #define SUBSAMPLE(Convert, T) \ cudaTextureObject_t src_tex[4] = \ @@ -1159,6 +1164,7 @@ __device__ static inline T Subsample_Bicubic(cudaTextureObject_t tex, Convert( \ src_tex, dst, xo, yo, \ dst_width, dst_height, dst_pitch, \ + src_left, src_top, \ src_width, src_height, param); extern "C" {