From patchwork Tue Nov 7 07:32:19 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 44544
Date: Tue, 7 Nov 2023 08:32:19 +0100 (CET)
From: Lynne
To: Ffmpeg Devel
Subject: [FFmpeg-devel] [PATCH] nlmeans_vulkan: fix offsets calculation and various stride issues
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

We calculated offsets as pairs, but addressed them in the shader as
single float values, while reading them as ivec2s.

Also removes unused code (it was provisionally added in case cooperative
matrices could be used, but that turned out to be impossible).

Patch attached.

From 0c3923af2150d036e50708040758c210f6bd6ade Mon Sep 17 00:00:00 2001
From: Lynne
Date: Tue, 7 Nov 2023 07:27:30 +0000
Subject: [PATCH] nlmeans_vulkan: fix offsets calculation and various stride issues

We calculated offsets as pairs, but addressed them in the shader as
single float values, while reading them as ivec2s.

Also removes unused code (it was provisionally added in case cooperative
matrices could be used, but that turned out to be impossible).
---
 libavfilter/vf_nlmeans_vulkan.c | 78 +++++++++++++--------------------
 1 file changed, 31 insertions(+), 47 deletions(-)

diff --git a/libavfilter/vf_nlmeans_vulkan.c b/libavfilter/vf_nlmeans_vulkan.c
index 2b8f97d7d9..fac38d16f4 100644
--- a/libavfilter/vf_nlmeans_vulkan.c
+++ b/libavfilter/vf_nlmeans_vulkan.c
@@ -94,7 +94,7 @@ static void insert_horizontal_pass(FFVkSPIRVShader *shd, int nb_rows, int first,
     GLSLC(2, #pragma unroll(1) );
     GLSLF(2, for (r = 0; r < %i; r++) { ,nb_rows);
     GLSLC(3, prefix_sum = DTYPE(0); );
-    GLSLC(3, offset = uint64_t(int_stride)*(pos.y + r)*T_ALIGN; );
+    GLSLC(3, offset = int_stride * uint64_t(pos.y + r); );
     GLSLC(3, dst = DataBuffer(uint64_t(integral_data) + offset); );
     GLSLC(0, );
     GLSLF(3, for (pos.x = 0; pos.x < width[%i]; pos.x++) { ,plane);
@@ -122,7 +122,7 @@ static void insert_vertical_pass(FFVkSPIRVShader *shd, int nb_rows, int first, i
     GLSLC(0, );
     GLSLF(1, if (pos.x < width[%i]) { ,plane);
     GLSLF(2, for (pos.y = 0; pos.y < height[%i]; pos.y++) { ,plane);
-    GLSLC(3, offset = uint64_t(int_stride)*pos.y*T_ALIGN; );
+    GLSLC(3, offset = int_stride * uint64_t(pos.y); );
     GLSLC(3, dst = DataBuffer(uint64_t(integral_data) + offset); );
     GLSLC(0, );
     GLSLC(3, #pragma unroll(1) );
@@ -167,40 +167,26 @@ static void insert_weights_pass(FFVkSPIRVShader *shd, int nb_rows, int vert,
     GLSLC(0, );
     GLSLC(3, lt = ((pos.x - p) < 0) || ((pos.y - p) < 0); );
     GLSLC(0, );
-    if (TYPE_ELEMS == 4) {
-        GLSLF(3, src[0] = texture(input_img[%i], pos + offs[0])[%i]; ,plane, comp);
-        GLSLF(3, src[1] = texture(input_img[%i], pos + offs[1])[%i]; ,plane, comp);
-        GLSLF(3, src[2] = texture(input_img[%i], pos + offs[2])[%i]; ,plane, comp);
-        GLSLF(3, src[3] = texture(input_img[%i], pos + offs[3])[%i]; ,plane, comp);
-    } else {
-        for (int i = 0; i < 16; i++)
-            GLSLF(3, src[%i][%i] = texture(input_img[%i], pos + offs[%i])[%i];
-                  ,i / 4, i % 4, plane, i, comp);
-
-    }
+    GLSLF(3, src[0] = texture(input_img[%i], pos + offs[0])[%i]; ,plane, comp);
+    GLSLF(3, src[1] = texture(input_img[%i], pos + offs[1])[%i]; ,plane, comp);
+    GLSLF(3, src[2] = texture(input_img[%i], pos + offs[2])[%i]; ,plane, comp);
+    GLSLF(3, src[3] = texture(input_img[%i], pos + offs[3])[%i]; ,plane, comp);
     GLSLC(0, );
     GLSLC(3, if (lt == false) { );
-    GLSLC(4, a = integral_data.v[(pos.y - p)*int_stride + pos.x - p]; );
-    GLSLC(4, c = integral_data.v[(pos.y - p)*int_stride + pos.x + p]; );
-    GLSLC(4, b = integral_data.v[(pos.y + p)*int_stride + pos.x - p]; );
-    GLSLC(4, d = integral_data.v[(pos.y + p)*int_stride + pos.x + p]; );
+    GLSLC(3, offset = int_stride * uint64_t(pos.y - p); );
+    GLSLC(3, dst = DataBuffer(uint64_t(integral_data) + offset); );
+    GLSLC(4, a = dst.v[pos.x - p]; );
+    GLSLC(4, c = dst.v[pos.x + p]; );
+    GLSLC(3, offset = int_stride * uint64_t(pos.y + p); );
+    GLSLC(3, dst = DataBuffer(uint64_t(integral_data) + offset); );
+    GLSLC(4, b = dst.v[pos.x - p]; );
+    GLSLC(4, d = dst.v[pos.x + p]; );
     GLSLC(3, } );
     GLSLC(0, );
     GLSLC(3, patch_diff = d + a - b - c; );
-    if (TYPE_ELEMS == 4) {
-        GLSLF(3, w = exp(patch_diff * strength[%i]); ,dst_comp);
-        GLSLC(3, w_sum = w[0] + w[1] + w[2] + w[3]; );
-        GLSLC(3, sum = dot(w, src*255); );
-    } else {
-        for (int i = 0; i < 4; i++)
-            GLSLF(3, w[%i] = exp(patch_diff[%i] * strength[%i]); ,i,i,dst_comp);
-        for (int i = 0; i < 4; i++)
-            GLSLF(3, w_sum %s w[%i][0] + w[%i][1] + w[%i][2] + w[%i][3];
-                  ,!i ? "=" : "+=", i, i, i, i);
-        for (int i = 0; i < 4; i++)
-            GLSLF(3, sum %s dot(w[%i], src[%i]*255);
-                  ,!i ? "=" : "+=", i, i);
-    }
+    GLSLF(3, w = exp(patch_diff * strength[%i]); ,dst_comp);
+    GLSLC(3, w_sum = w[0] + w[1] + w[2] + w[3]; );
+    GLSLC(3, sum = dot(w, src*255); );
     GLSLC(0, );
     if (t > 1) {
         GLSLF(3, atomicAdd(weights_%i[pos.y*ws_stride[%i] + pos.x], w_sum); ,dst_comp, dst_comp);
@@ -220,8 +206,8 @@ typedef struct HorizontalPushData {
     int32_t patch_size[4];
     float strength[4];
     VkDeviceAddress integral_base;
-    uint32_t integral_size;
-    uint32_t int_stride;
+    uint64_t integral_size;
+    uint64_t int_stride;
     uint32_t xyoffs_start;
 } HorizontalPushData;
 
@@ -275,8 +261,8 @@ static av_cold int init_weights_pipeline(FFVulkanContext *vkctx, FFVkExecPool *e
     GLSLC(1, ivec4 patch_size; );
     GLSLC(1, vec4 strength; );
     GLSLC(1, DataBuffer integral_base; );
-    GLSLC(1, uint integral_size; );
-    GLSLC(1, uint int_stride; );
+    GLSLC(1, uint64_t integral_size; );
+    GLSLC(1, uint64_t int_stride; );
     GLSLC(1, uint xyoffs_start; );
     GLSLC(0, }; );
     GLSLC(0, );
@@ -371,13 +357,11 @@ static av_cold int init_weights_pipeline(FFVulkanContext *vkctx, FFVkExecPool *e
     GLSLF(1, ivec2 offs[%i]; ,TYPE_ELEMS);
     GLSLC(0, );
     GLSLC(1, int invoc_idx = int(gl_WorkGroupID.z); );
-
-    GLSLC(1, offset = uint64_t(integral_size)*invoc_idx; );
-    GLSLC(1, dst = DataBuffer(uint64_t(integral_data) + offset); );
-
+    GLSLC(0, );
+    GLSLC(1, offset = integral_size * invoc_idx; );
     GLSLC(1, integral_data = DataBuffer(uint64_t(integral_base) + offset); );
-    for (int i = 0; i < TYPE_ELEMS*2; i += 2)
-        GLSLF(1, offs[%i] = xyoffsets[xyoffs_start + 2*%i*invoc_idx + %i]; ,i/2,TYPE_ELEMS,i);
+    for (int i = 0; i < TYPE_ELEMS; i++)
+        GLSLF(1, offs[%i] = xyoffsets[xyoffs_start + %i*invoc_idx + %i]; ,i,TYPE_ELEMS,i);
     GLSLC(0, );
     GLSLC(1, DTYPE a; );
     GLSLC(1, DTYPE b; );
@@ -759,7 +743,7 @@ static int nlmeans_vulkan_filter_frame(AVFilterLink *link, AVFrame *in)
     /* Integral */
     AVBufferRef *integral_buf = NULL;
     FFVkBuffer *integral_vk;
-    uint32_t int_stride;
+    size_t int_stride;
     size_t int_size;
 
     /* Weights/sums */
@@ -787,8 +771,8 @@ static int nlmeans_vulkan_filter_frame(AVFilterLink *link, AVFrame *in)
         return AVERROR(EINVAL);
 
     /* Integral image */
-    int_stride = s->pl_weights.wg_size[0]*s->pl_weights_rows;
-    int_size = int_stride * int_stride * TYPE_SIZE;
+    int_stride = s->pl_weights.wg_size[0]*s->pl_weights_rows*TYPE_SIZE;
+    int_size = s->pl_weights.wg_size[0]*s->pl_weights_rows*int_stride;
 
     /* Plane dimensions */
     for (int i = 0; i < desc->nb_components; i++) {
@@ -982,9 +966,9 @@ static int nlmeans_vulkan_filter_frame(AVFilterLink *link, AVFrame *in)
             { s->patch[0], s->patch[1], s->patch[2], s->patch[3] },
             { s->strength[0], s->strength[1], s->strength[2], s->strength[2], },
             integral_vk->address,
-            int_size,
-            int_stride,
-            offsets_dispatched * 2,
+            (uint64_t)int_size,
+            (uint64_t)int_stride,
+            offsets_dispatched,
         };
 
         if (offsets_dispatched) {
-- 
2.40.1
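
Reviewer note: the index and stride arithmetic changed by this patch can be sketched on the host side in plain C. This is only an illustration, not code from the filter. The function names are hypothetical, TYPE_SIZE is assumed to be 16 bytes (four floats per integral element), and xyoffsets is assumed to be declared in the shader as an array of ivec2, as the commit message implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TYPE_ELEMS 4                /* offsets read per invocation, as in the patch */
#define TYPE_SIZE  (TYPE_ELEMS * 4) /* ASSUMPTION: bytes per integral element */

/* Index into an array of (x, y) pairs as the patched shader computes it.
 * dispatched is the number of offsets consumed by earlier dispatches,
 * i.e. the value now passed as xyoffs_start. */
uint32_t pair_index_new(uint32_t dispatched, uint32_t invoc_idx, uint32_t i)
{
    assert(i < TYPE_ELEMS);
    return dispatched + TYPE_ELEMS * invoc_idx + i;
}

/* Index as computed before the patch: every term is scaled by two, as if
 * the array held scalar components, although each access fetches a pair. */
uint32_t pair_index_old(uint32_t dispatched, uint32_t invoc_idx, uint32_t i)
{
    assert(i < TYPE_ELEMS);
    return dispatched * 2 + 2 * TYPE_ELEMS * invoc_idx + 2 * i;
}

/* Row stride of the integral buffer in bytes, mirroring the new int_stride. */
size_t integral_stride_bytes(size_t wg_size_x, size_t rows)
{
    return wg_size_x * rows * TYPE_SIZE;
}

/* Byte offset of row y, mirroring "offset = int_stride * uint64_t(pos.y)". */
uint64_t integral_row_offset(size_t stride_bytes, uint32_t y)
{
    return (uint64_t)stride_bytes * (uint64_t)y;
}
```

For example, with 8 offsets already dispatched, invocation 1 and element 2, the patched formula yields pair index 14, while the old one yields 28, which is twice as far into the array. With a hypothetical workgroup width of 32 and 4 rows per invocation, the stride is 2048 bytes and row 3 starts at byte 6144. Keeping the stride in bytes means the shader no longer needs a separate alignment factor when computing the row address.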