From patchwork Mon Mar 19 23:05:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Thompson X-Patchwork-Id: 8047 Delivered-To: ffmpegpatchwork@gmail.com Received: by 10.2.1.70 with SMTP id c67csp3202499jad; Mon, 19 Mar 2018 16:05:47 -0700 (PDT) X-Google-Smtp-Source: AG47ELsVtEwaJ/Xg/Ln7DbLTqVrYcOEwBKfz7g0SvMcVOqnH2IpIe8pYIMmHRlirpwvZG/WLqJPl X-Received: by 10.223.179.13 with SMTP id j13mr3550827wrd.165.1521500747122; Mon, 19 Mar 2018 16:05:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1521500747; cv=none; d=google.com; s=arc-20160816; b=PeczNEqOzJkeLwGBp3aMetRfQx1JLZSps7xeRlFNpthTrYSiRf8n3okdFmC8jxI5LJ ZatP9+zDMiIHOkrMdxaTlmorRi67DSJJZSNojnDsC9m+BSo+oSizGDyvaPZsvdHOq92H BsEUfEar9MtHZ5z+U+x9yDGrUwD2JncFUCZvfAei840/jThRJ3HcAupUGlRYpM5LK3hy RbxhUBvNiJAZd0DuWMRyn4WRVulWWJIg8Rit1EnFIYWMI+RojfTT5hU2IkZ8O+1Otkmz QUJkD6fAB8ek0clJaayy3joYD581USK/xASijwXByZF8T30zdZwGe7tSkMZYToBCfhMV KYVw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:mime-version:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:message-id:date:to:from:dkim-signature :delivered-to:arc-authentication-results; bh=3A5rla7pDYSM0dGYf/F4rDYLKhBhqHyJk0jjMb36DgQ=; b=Zjw/NzNefu1tBKl/NBm24+pSulj0yJo/RA4lvb7Wd4rcukKi/ETwOVUJBYv+H1tHH+ fQxpntfivUWhfBYWdwdEULRPv9Qbd7K3eOWTVKpE7NzfVMMwgFE9o1nzjopbqRqpLwoB kpxbFD53P5AXv47WeuwOWu3YYp+QRzS9u6XPgBozRUSgNjKKWKTrdsIWRLzf7FWHmqy3 5IaxbUgSVU+1b3Nux+QuQoqNhS2WG8ftNAgUykvDND8pwOospBnr7r/WAk65DJSOLHs4 ygBLSRSWHT25f5rNoOO0iEVTEU4MlO+RTvTnqKaeimOIaaQuO+9Zmf02nlKlEcqGX/Z3 6wog== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@jkqxz-net.20150623.gappssmtp.com header.s=20150623 header.b=HCPvBZJ+; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id 64si240183wro.229.2018.03.19.16.05.45; Mon, 19 Mar 2018 16:05:47 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@jkqxz-net.20150623.gappssmtp.com header.s=20150623 header.b=HCPvBZJ+; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 15044689FCF; Tue, 20 Mar 2018 01:05:30 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-wm0-f67.google.com (mail-wm0-f67.google.com [74.125.82.67]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 11CD1689908 for ; Tue, 20 Mar 2018 01:05:24 +0200 (EET) Received: by mail-wm0-f67.google.com with SMTP id r82so54977wme.0 for ; Mon, 19 Mar 2018 16:05:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jkqxz-net.20150623.gappssmtp.com; s=20150623; h=from:to:subject:date:message-id; bh=aKLEzCoh2XJGhqH/drDo7Tsg1VDUaSPhY2RBDGvSlAc=; b=HCPvBZJ+Y8xXX/AVeU2dyVhGePxCdMM2rB/ZDzwvE/0GLo4QH5K9AGjcT/TJRYHAn6 82yeVdLKiQLQz2LRh9dmLFlQnhciOCSRoyh0qc2dBLyRG7bHjFOBUpyPHWJE9uRfBaHV V+aa4WDtg4Mg0MqYtNz6PYTbN1AwvXFNAGgEb6HPQZk8sR16V9Xttxf5cuYOFWX4++nl oqgSJQQcqQqWnt/9IKClSKm9p/seqjW3GNmRUdj7gjR0umwwz0FF3h7v/DuQspUzEGDV +18pAoaM94pDsVT1gwOswLMocJMINim96jIbKLXWkTvSOVILBgGSQ/Qh2kd7EPj6wzp6 2k0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id; bh=aKLEzCoh2XJGhqH/drDo7Tsg1VDUaSPhY2RBDGvSlAc=; b=LpBAEdyFpkG757H/MLtjHiHXB0fiPJZuAsYiRZiWn1vVb0l/l6BN7A1jhUHNgTRidf Vmj/+92FVtLFP9R7X8CZF1C6D3ndKrzU7g9+HNVHx1oOxz8W0Z6yjn2JMFqZtSS2KuB0 sqLoD6nSICf1eoGXsQXVjiNc0OXUqcWGdO5QhVdM9T/34htGWGRpSoVYkGbUyWUSSaon WZGWn3TOk0TItWmuKDKpE36ZlAKTfGxbSa6grCvP19K+ckIdEuSplqL+i8ec753bT/MD lbnDBTUohUn1hzndwwAWtX0BcfGLOQGlioMBhf6VpgktLgCzB/ip2nckzkB0oNp1XqkJ MiDw== X-Gm-Message-State: AElRT7GoWfQweujHsg9U6H+ZHB6FMf4A2qJd7vaeaxwqlojXNb73jxVI WOcgyZeWgS9VB/+GG5gL3vlIIFkn X-Received: by 10.28.71.83 with SMTP id u80mr373804wma.24.1521500737431; Mon, 19 Mar 2018 16:05:37 -0700 (PDT) Received: from rywe.jkqxz.net (cpc91242-cmbg18-2-0-cust650.5-4.cable.virginm.net. [82.8.130.139]) by smtp.gmail.com with ESMTPSA id x78sm416476wmd.2.2018.03.19.16.05.36 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 19 Mar 2018 16:05:36 -0700 (PDT) From: Mark Thompson To: ffmpeg-devel@ffmpeg.org Date: Mon, 19 Mar 2018 23:05:29 +0000 Message-Id: <20180319230531.7079-1-sw@jkqxz.net> X-Mailer: git-send-email 2.16.1 Subject: [FFmpeg-devel] [PATCH 1/3] lavfi/opencl: Derive global work size from plane image sizes X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches MIME-Version: 1.0 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Add a new function to find the global work size given the output image and the required block alignment, then use it in the overlay, program and unsharp filters. Fixes the overlay and unsharp filters applying the kernel to locations outside the frame when subsampled planes are present. --- libavfilter/opencl.c | 64 +++++++++++++++++++++++++++++++++++++++++ libavfilter/opencl.h | 8 ++++++ libavfilter/vf_overlay_opencl.c | 6 ++-- libavfilter/vf_program_opencl.c | 8 +++--- libavfilter/vf_unsharp_opencl.c | 16 +++++------ 5 files changed, 87 insertions(+), 15 deletions(-) diff --git a/libavfilter/opencl.c b/libavfilter/opencl.c index 37afc41f8b..ae61667380 100644 --- a/libavfilter/opencl.c +++ b/libavfilter/opencl.c @@ -22,6 +22,7 @@ #include "libavutil/hwcontext.h" #include "libavutil/hwcontext_opencl.h" #include "libavutil/mem.h" +#include "libavutil/pixdesc.h" #include "avfilter.h" #include "formats.h" @@ -276,3 +277,66 @@ fail: av_freep(&src); return err; } + +int ff_opencl_filter_work_size_from_image(AVFilterContext *avctx, + size_t *work_size, + AVFrame *frame, int plane, + int block_alignment) +{ + cl_mem image; + cl_mem_object_type type; + size_t width, height; + cl_int cle; + + if (frame->format != AV_PIX_FMT_OPENCL) { + av_log(avctx, AV_LOG_ERROR, "Invalid frame format %s, " + "opencl required.\n", av_get_pix_fmt_name(frame->format)); + return AVERROR(EINVAL); + } + + image = (cl_mem)frame->data[plane]; + if (!image) { + av_log(avctx, AV_LOG_ERROR, "Plane %d required but not set.\n", + plane); + return AVERROR(EINVAL); + } + + cle = clGetMemObjectInfo(image, CL_MEM_TYPE, sizeof(type), + &type, NULL); + if (cle != CL_SUCCESS) { + av_log(avctx, AV_LOG_ERROR, "Failed to query object type of " + "plane %d: %d.\n", plane, cle); + return AVERROR_UNKNOWN; + } + if (type != CL_MEM_OBJECT_IMAGE2D) { + av_log(avctx, AV_LOG_ERROR, "Plane %d is not a 2D image.\n", + plane); + return AVERROR(EINVAL); + } + + cle = clGetImageInfo(image, CL_IMAGE_WIDTH, sizeof(size_t), + &width, NULL); + if (cle != CL_SUCCESS) { + av_log(avctx, AV_LOG_ERROR, "Failed to query plane %d width: %d.\n", + plane, cle); + return AVERROR_UNKNOWN; + } + + cle = clGetImageInfo(image, CL_IMAGE_HEIGHT, sizeof(size_t), + &height, NULL); + if (cle != CL_SUCCESS) { + av_log(avctx, AV_LOG_ERROR, "Failed to query plane %d height: %d.\n", + plane, cle); + return AVERROR_UNKNOWN; + } + + if (block_alignment) { + width = FFALIGN(width, block_alignment); + height = FFALIGN(height, block_alignment); + } + + work_size[0] = width; + work_size[1] = height; + + return 0; +} diff --git a/libavfilter/opencl.h b/libavfilter/opencl.h index 4d740c18ab..45fe2a2e27 100644 --- a/libavfilter/opencl.h +++ b/libavfilter/opencl.h @@ -84,4 +84,12 @@ int ff_opencl_filter_load_program(AVFilterContext *avctx, int ff_opencl_filter_load_program_from_file(AVFilterContext *avctx, const char *filename); +/** + * Find the work size needed needed for a given plane of an image. + */ +int ff_opencl_filter_work_size_from_image(AVFilterContext *avctx, + size_t *work_size, + AVFrame *frame, int plane, + int block_alignment); + #endif /* AVFILTER_OPENCL_H */ diff --git a/libavfilter/vf_overlay_opencl.c b/libavfilter/vf_overlay_opencl.c index ee8381dfee..16e10f4371 100644 --- a/libavfilter/vf_overlay_opencl.c +++ b/libavfilter/vf_overlay_opencl.c @@ -216,8 +216,10 @@ static int overlay_opencl_blend(FFFrameSync *fs) goto fail_kernel_arg; } - global_work[0] = output->width; - global_work[1] = output->height; + err = ff_opencl_filter_work_size_from_image(avctx, global_work, + output, plane, 0); + if (err < 0) + goto fail; cle = clEnqueueNDRangeKernel(ctx->command_queue, ctx->kernel, 2, NULL, global_work, NULL, 0, NULL, NULL); diff --git a/libavfilter/vf_program_opencl.c b/libavfilter/vf_program_opencl.c index 4ee9668236..0bcf188ac7 100644 --- a/libavfilter/vf_program_opencl.c +++ b/libavfilter/vf_program_opencl.c @@ -142,10 +142,10 @@ static int program_opencl_run(AVFilterContext *avctx) } } - cle = clGetImageInfo(dst, CL_IMAGE_WIDTH, sizeof(size_t), - &global_work[0], NULL); - cle = clGetImageInfo(dst, CL_IMAGE_HEIGHT, sizeof(size_t), - &global_work[1], NULL); + err = ff_opencl_filter_work_size_from_image(avctx, global_work, + output, plane, 0); + if (err < 0) + goto fail; av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d " "(%zux%zu).\n", plane, global_work[0], global_work[1]); diff --git a/libavfilter/vf_unsharp_opencl.c b/libavfilter/vf_unsharp_opencl.c index 6a453c014b..19c91857cb 100644 --- a/libavfilter/vf_unsharp_opencl.c +++ b/libavfilter/vf_unsharp_opencl.c @@ -320,15 +320,13 @@ static int unsharp_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input) } } - if (ctx->global) { - global_work[0] = output->width; - global_work[1] = output->height; - } else { - global_work[0] = FFALIGN(output->width, 16); - global_work[1] = FFALIGN(output->height, 16); - local_work[0] = 16; - local_work[1] = 16; - } + err = ff_opencl_filter_work_size_from_image(avctx, global_work, output, p, + ctx->global ? 0 : 16); + if (err < 0) + goto fail; + + local_work[0] = 16; + local_work[1] = 16; av_log(avctx, AV_LOG_DEBUG, "Run kernel on plane %d " "(%"SIZE_SPECIFIER"x%"SIZE_SPECIFIER").\n",