From patchwork Sun Apr 14 12:49:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 48050
Date: Sun, 14 Apr 2024 14:49:14 +0200 (CEST)
From: Lynne
To: Ffmpeg Devel
Subject: [FFmpeg-devel] [PATCH] vulkan_av1: add workaround for NVIDIA drivers tested on broken CTS
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

The first release of the CTS for AV1 decoding had incorrect
offsets for the OrderHints values.
The CTS will be fixed, and eventually, the drivers will be
updated to the proper spec-conforming behaviour, but we still
need to add a workaround as this will take months.

Only NVIDIA use these values at all, so limit the workaround
to only NVIDIA.

Meant to be applied on top of jkqxz's previous 2 patches.

Patch attached.

From bbd2cc90206e59098accced3ff3a8896e0bfa269 Mon Sep 17 00:00:00 2001
From: Lynne
Date: Sun, 14 Apr 2024 14:41:26 +0200
Subject: [PATCH] vulkan_av1: add workaround for NVIDIA drivers tested on broken CTS

The first release of the CTS for AV1 decoding had incorrect
offsets for the OrderHints values.
The CTS will be fixed, and eventually, the drivers will be
updated to the proper spec-conforming behaviour, but we still
need to add a workaround as this will take months.

Only NVIDIA use these values at all, so limit the workaround
to only NVIDIA.
---
 libavcodec/vulkan_av1.c    | 20 +++++++++++++++-----
 libavcodec/vulkan_decode.c |  8 ++++++++
 libavcodec/vulkan_decode.h |  4 ++++
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/libavcodec/vulkan_av1.c b/libavcodec/vulkan_av1.c
index 8d532445a1..ff663c8cec 100644
--- a/libavcodec/vulkan_av1.c
+++ b/libavcodec/vulkan_av1.c
@@ -97,9 +97,15 @@ static int vk_av1_fill_pict(AVCodecContext *avctx, const AV1Frame **ref_src,
         .RefFrameSignBias = hp->ref_frame_sign_bias_mask,
     };
 
-    if (saved_order_hints)
-        for (int i = 0; i < AV1_TOTAL_REFS_PER_FRAME; i++)
-            vkav1_std_ref->SavedOrderHints[i] = saved_order_hints[i];
+    if (saved_order_hints) {
+        if (dec->quirk_av1_offset) {
+            for (int i = 1; i < AV1_TOTAL_REFS_PER_FRAME; i++)
+                vkav1_std_ref->SavedOrderHints[i - 1] = saved_order_hints[i];
+        } else {
+            for (int i = 0; i < AV1_TOTAL_REFS_PER_FRAME; i++)
+                vkav1_std_ref->SavedOrderHints[i] = saved_order_hints[i];
+        }
+    }
 
     *vkav1_ref = (VkVideoDecodeAV1DpbSlotInfoKHR) {
         .sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_AV1_DPB_SLOT_INFO_KHR,
@@ -490,8 +496,12 @@ static int vk_av1_start_frame(AVCodecContext *avctx,
         }
     }
 
-    for (int i = 0; i < STD_VIDEO_AV1_NUM_REF_FRAMES; i++) {
-        ap->std_pic_info.OrderHints[i] = pic->order_hints[i];
+    if (dec->quirk_av1_offset) {
+        for (int i = 1; i < STD_VIDEO_AV1_NUM_REF_FRAMES; i++)
+            ap->std_pic_info.OrderHints[i - 1] = pic->order_hints[i];
+    } else {
+        for (int i = 0; i < STD_VIDEO_AV1_NUM_REF_FRAMES; i++)
+            ap->std_pic_info.OrderHints[i] = pic->order_hints[i];
     }
 
     for (int i = 0; i < STD_VIDEO_AV1_TOTAL_REFS_PER_FRAME; i++)
diff --git a/libavcodec/vulkan_decode.c b/libavcodec/vulkan_decode.c
index 9c6c2d4efb..5cb328d8ca 100644
--- a/libavcodec/vulkan_decode.c
+++ b/libavcodec/vulkan_decode.c
@@ -1115,6 +1115,7 @@ int ff_vk_decode_init(AVCodecContext *avctx)
     FFVulkanFunctions *vk;
     const VkVideoProfileInfoKHR *profile;
     const FFVulkanDecodeDescriptor *vk_desc;
+    const VkPhysicalDeviceDriverProperties *driver_props;
 
     VkVideoDecodeH264SessionParametersCreateInfoKHR h264_params = {
         .sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_H264_SESSION_PARAMETERS_CREATE_INFO_KHR,
@@ -1276,6 +1277,13 @@ int ff_vk_decode_init(AVCodecContext *avctx)
         return AVERROR_EXTERNAL;
     }
 
+    driver_props = &dec->shared_ctx->s.driver_props;
+    if (driver_props->driverID == VK_DRIVER_ID_NVIDIA_PROPRIETARY &&
+        driver_props->conformanceVersion.major == 1 &&
+        driver_props->conformanceVersion.minor == 3 &&
+        driver_props->conformanceVersion.subminor == 8)
+        dec->quirk_av1_offset = 1;
+
     ff_vk_decode_flush(avctx);
 
     av_log(avctx, AV_LOG_VERBOSE, "Vulkan decoder initialization sucessful\n");
diff --git a/libavcodec/vulkan_decode.h b/libavcodec/vulkan_decode.h
index 7ba8b239cb..076af93499 100644
--- a/libavcodec/vulkan_decode.h
+++ b/libavcodec/vulkan_decode.h
@@ -72,6 +72,10 @@ typedef struct FFVulkanDecodeContext {
     int external_fg;     /* Oddity #2 - hardware can't apply film grain */
     uint32_t frame_id_alloc_mask; /* For AV1 only */
 
+    /* Workaround for NVIDIA drivers tested with CTS version 1.3.8 for AV1.
+     * The tests were incorrect as the OrderHints were offset by 1. */
+    int quirk_av1_offset;
+
     /* Thread-local state below */
     struct HEVCHeaderSet *hevc_headers;
     size_t hevc_headers_size;
-- 
2.43.0.381.gb435a96ce8
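
For reviewers who want to see the effect of the offset in isolation, here is a
minimal standalone sketch of the copy done in both hunks of vulkan_av1.c. It is
not part of the patch. NUM_REFS and copy_order_hints() are hypothetical names
used only for illustration, and zeroing the destination first is an assumption
about how the real structs are initialised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the reference count used by the real code. */
#define NUM_REFS 8

/* Copy order hints into the array handed to the driver.
 * With quirk_offset set, source entry i lands in slot i - 1: source
 * entry 0 is dropped and the last slot keeps its zero value.
 * Without it, the copy is one-to-one. */
static void copy_order_hints(uint8_t dst[NUM_REFS],
                             const uint8_t src[NUM_REFS],
                             int quirk_offset)
{
    assert(dst && src);

    /* Assumption: the destination starts out zeroed. */
    memset(dst, 0, NUM_REFS);

    if (quirk_offset) {
        for (size_t i = 1; i < NUM_REFS; i++)
            dst[i - 1] = src[i];
    } else {
        for (size_t i = 0; i < NUM_REFS; i++)
            dst[i] = src[i];
    }
}
```

Calling copy_order_hints(dst, src, 1) with src = {10, 11, ..., 17} leaves 11 in
dst[0], 17 in dst[6] and 0 in dst[7]. Passing 0 as the last argument copies the
values unchanged.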