From patchwork Tue Mar 14 06:33:43 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 40675
Date: Tue, 14 Mar 2023 07:33:43 +0100 (CET)
From: Lynne
To: Ffmpeg Devel
Subject: [FFmpeg-devel] [PATCH 31/92] Vulkan patchset part 1 - common code changes
List-Id: FFmpeg development discussions and patches

The attached patchset contains all the common code changes that my Vulkan patchset needs.

In total lines of code, this part has 425 additions and 131 deletions. Most of that is additions to HEVC parsing. Excluding them, the patchset adds 200 lines of code, which is manageable.

Apart from the parser changes, the following other changes have been made to the API:

AVHWAccel.free_frame_priv exists due to Vulkan's way of using VkImageView objects to wrap VkImage objects, which we need to free once they're no longer in use. Every other API uses the direct objects in decoding, but with Vulkan, they have to be represented by other objects. We also use it to free the slice offsets buffer.

AVHWAccel.flush exists due to Vulkan keeping decoder state, despite being stateless in theory. The decoder has to be notified of flushes in order to reset decoding slots and other data it needs, such as motion vectors and reference lists for AV1. Otherwise, inferring whether a flush has happened can be codec-dependent and hacky.

hwaccel_params_buf exists due to Vulkan's way of compiling SPS/PPS data into objects, making updating expensive. The change allows the hardware to upload new parameters only if they have changed.
It's insignificant for H264 and AV1, but for a specially crafted input, HEVC's structures can reach 114 megabytes of data that has to be uploaded, which is enough to DDoS an ingest. The data is set and managed by the hwaccel, but does need to be synchronized between different decoding threads, which this patch performs.

Finally, the HWACCEL_CAP_THREAD_SAFE flag is added due to Vulkan being actually threadsafe, and requiring no serialization. It works and makes a real difference: on average, it can increase performance by 20% for a typical HEVC stream that uses B-frames, depending on the number of threads and the number of decode queues. While hardware decoders are fast in general, certain vendors such as AMD can choke up while playing 8k video, and this patch can significantly help increase throughput.

In context, the changes can be viewed here:
https://github.com/cyanreg/FFmpeg/tree/vulkan

The rest of the whole patchset is either rewrites, filter code, or the actual hardware accel code. The patchset will not be pushed standalone, but as part of the greater Vulkan patchset.

31 patches attached.
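
To illustrate the hwaccel_params_buf idea described above (upload parameters only when they have changed), here is a minimal sketch in C. It is not code from the patchset: ParamSet, UploadState and upload_if_changed are hypothetical names, and the sequence counter is just one assumed way of detecting a change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration only; none of these names come
 * from the patchset or from FFmpeg. */
typedef struct ParamSet {
    uint64_t seq;          /* assumed to be bumped whenever SPS/PPS contents change */
} ParamSet;

typedef struct UploadState {
    uint64_t uploaded_seq; /* seq of the parameters last sent to the device */
    bool     valid;        /* false until the first upload has happened */
    unsigned uploads;      /* number of uploads performed so far */
} UploadState;

/* Returns true if an upload was performed, false if it was skipped
 * because the parameters are unchanged since the last upload. */
static bool upload_if_changed(UploadState *st, const ParamSet *ps)
{
    assert(st && ps);
    if (st->valid && st->uploaded_seq == ps->seq)
        return false;      /* unchanged: skip the expensive upload */

    /* A real hwaccel would rebuild and upload its parameter objects here. */
    st->uploaded_seq = ps->seq;
    st->valid        = true;
    st->uploads++;
    return true;
}
```

With frame threading, each decoding thread would need to see the same UploadState (or a synchronized copy of it), which is the synchronization the cover letter refers to.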
From 26136684812c3e94a35dea25cb5edfe32601a518 Mon Sep 17 00:00:00 2001
From: Lynne
Date: Sat, 25 Feb 2023 09:36:58 +0100
Subject: [PATCH 31/92] lsws: add in/out support for the new 12-bit 2-plane 422
 and 444 pixfmts

---
 libswscale/input.c               |  8 ++++++++
 libswscale/utils.c               |  4 ++++
 tests/ref/fate/sws-pixdesc-query | 26 ++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/libswscale/input.c b/libswscale/input.c
index d5676062a2..41795c636e 100644
--- a/libswscale/input.c
+++ b/libswscale/input.c
@@ -1452,9 +1452,13 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
         c->chrToYV12 = p010BEToUV_c;
         break;
     case AV_PIX_FMT_P012LE:
+    case AV_PIX_FMT_P212LE:
+    case AV_PIX_FMT_P412LE:
         c->chrToYV12 = p012LEToUV_c;
         break;
     case AV_PIX_FMT_P012BE:
+    case AV_PIX_FMT_P212BE:
+    case AV_PIX_FMT_P412BE:
         c->chrToYV12 = p012BEToUV_c;
         break;
     case AV_PIX_FMT_P016LE:
@@ -1944,9 +1948,13 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c)
         c->lumToYV12 = p010BEToY_c;
         break;
     case AV_PIX_FMT_P012LE:
+    case AV_PIX_FMT_P212LE:
+    case AV_PIX_FMT_P412LE:
         c->lumToYV12 = p012LEToY_c;
         break;
     case AV_PIX_FMT_P012BE:
+    case AV_PIX_FMT_P212BE:
+    case AV_PIX_FMT_P412BE:
         c->lumToYV12 = p012BEToY_c;
         break;
     case AV_PIX_FMT_GRAYF32LE:
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 925c536bf1..a3a7a40750 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -248,8 +248,12 @@ static const FormatEntry format_entries[] = {
     [AV_PIX_FMT_X2BGR10LE] = { 1, 1 },
     [AV_PIX_FMT_P210BE] = { 1, 1 },
     [AV_PIX_FMT_P210LE] = { 1, 1 },
+    [AV_PIX_FMT_P212BE] = { 1, 1 },
+    [AV_PIX_FMT_P212LE] = { 1, 1 },
     [AV_PIX_FMT_P410BE] = { 1, 1 },
     [AV_PIX_FMT_P410LE] = { 1, 1 },
+    [AV_PIX_FMT_P412BE] = { 1, 1 },
+    [AV_PIX_FMT_P412LE] = { 1, 1 },
     [AV_PIX_FMT_P216BE] = { 1, 1 },
     [AV_PIX_FMT_P216LE] = { 1, 1 },
     [AV_PIX_FMT_P416BE] = { 1, 1 },
diff --git a/tests/ref/fate/sws-pixdesc-query b/tests/ref/fate/sws-pixdesc-query
index 14156a383c..fd7f2aefc0 100644
--- a/tests/ref/fate/sws-pixdesc-query
+++ b/tests/ref/fate/sws-pixdesc-query
@@ -67,8 +67,12 @@ isNBPS:
   p012le
   p210be
   p210le
+  p212be
+  p212le
   p410be
   p410le
+  p412be
+  p412le
   x2bgr10be
   x2bgr10le
   x2rgb10be
@@ -160,8 +164,10 @@ isBE:
   p012be
   p016be
   p210be
+  p212be
   p216be
   p410be
+  p412be
   p416be
   rgb444be
   rgb48be
@@ -226,10 +232,14 @@ isYUV:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
   uyvy422
@@ -338,10 +348,14 @@ isPlanarYUV:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
   yuv410p
@@ -431,10 +445,14 @@ isSemiPlanarYUV:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
 
@@ -853,10 +871,14 @@ Planar:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
   yuv410p
@@ -1029,8 +1051,12 @@ DataInHighBits:
   p012le
   p210be
   p210le
+  p212be
+  p212le
   p410be
   p410le
+  p412be
+  p412le
   xv36be
   xv36le
   xyz12be
-- 
2.39.2
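
The patch can reuse the P012 input functions for P212 and P412 because, going by the test reference above (DataInHighBits, isSemiPlanarYUV), all three formats store each 12-bit sample in the high bits of a 16-bit word and differ only in chroma subsampling. The following is a minimal sketch of that unpacking for the little-endian case. It is not the libswscale implementation; read_le16, unpack_luma12_le and unpack_chroma12_le are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an FFmpeg function: read one little-endian
 * 16-bit word from a byte buffer. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Unpack n luma samples stored as 12 significant bits in the high bits
 * of little-endian 16-bit words (low 4 bits are padding). */
static void unpack_luma12_le(uint16_t *dst, const uint8_t *src, size_t n)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++)
        dst[i] = read_le16(src + 2 * i) >> 4;
}

/* Unpack n interleaved U/V pairs from the second plane. The per-sample
 * layout is the same for 4:2:0, 4:2:2 and 4:4:4; only the number of
 * pairs per row and the number of chroma rows differ. */
static void unpack_chroma12_le(uint16_t *dst_u, uint16_t *dst_v,
                               const uint8_t *src, size_t n)
{
    assert(dst_u && dst_v && src);
    for (size_t i = 0; i < n; i++) {
        dst_u[i] = read_le16(src + 4 * i)     >> 4;
        dst_v[i] = read_le16(src + 4 * i + 2) >> 4;
    }
}
```

The big-endian variants would differ only in the byte order used when reading each 16-bit word.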