From patchwork Wed Jan 5 02:48:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Chen, Wenbin" X-Patchwork-Id: 33073 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:cd86:0:0:0:0:0 with SMTP id d128csp347125iog; Tue, 4 Jan 2022 18:48:32 -0800 (PST) X-Google-Smtp-Source: ABdhPJxQdAdNsSpeW3ITf712SJx2sRZQr3kGsJSgA9TVcgDGS2L82fkaRwWm1a/NoykAnEkiVP3x X-Received: by 2002:a05:6402:1008:: with SMTP id c8mr51966590edu.114.1641350911987; Tue, 04 Jan 2022 18:48:31 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1641350911; cv=none; d=google.com; s=arc-20160816; b=TS0c8AtVh7a+AO1+3SO43ZHIM8SH4EJv+q5areXgZ5cCgBvm3fiYkNtevqIFM1xRI8 WkcZFSQFKkkwFO3ATDn+H+W7lxbxJoTHLB8k17pyiVnCVeEFTKG2MhpnVUetFKEksdx9 9ifYFGTFQ+sHrtVNPVcEZ9/nXYh0maTey3+wmZPrZcOeMWe1qznIGiGmWm40Q98fLGvI 15WmXQhaxFn8ZauOP31Kdp0uX+4cwSs566g6/YvfMnhPUzewKwn50+udfzLEiZgB/QUg x1cYQl4KRR5TOlDvS338rE0hQjNw8bFheNYDcexCWPvC0I23jTw+HLv4Fbdk/rUl40f1 46hA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:message-id:date:to:from :dkim-signature:delivered-to; bh=Mn/VSlWf8oIlSbGhFqBYzIVX7cHQWAkw84ObbIsgs3w=; b=T/lmtcSImdJDXv2spv3kTGf5wWLRK0sshfqSnDFWWnEpw+IQ+wPn7QiUCtayx9ou6z YiDAhtaUf0zujOKlSTrzKAIC5Wz1/ZtmHZT5yWldZeN7V7djJdyd7/DD6qnL0LVyZ7+8 kqhGCLqkKSP1LwQYYHwR7BFUsK5F58PR1RA8/7tUUJvkyBERV/jvFvco15o4dcAQphq0 RgaDYDiS9HKAf2l+Ngdsq7oVLVtUZt7xo9+dXddy/OvY+tfVBaIlfkyRN6T7JC/WPJe1 GK3Jw3xaFL66J5ixjanXDnMoqME6RUJ9iO5KNhlX07Kjk95ayNSWdCiIwr3KwphUlSUC KiJQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=oKGaQ1hT; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org 
Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id dp22si19894544ejc.266.2022.01.04.18.48.31; Tue, 04 Jan 2022 18:48:31 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=oKGaQ1hT; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id B13116802C3; Wed, 5 Jan 2022 04:48:28 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 5B663680212 for ; Wed, 5 Jan 2022 04:48:21 +0200 (EET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641350906; x=1672886906; h=from:to:subject:date:message-id:mime-version: content-transfer-encoding; bh=mS/dob/hj5m+BVRU3Uyd4QCLwRkqnyPC7dckiQuuViI=; b=oKGaQ1hTAPYBd8ZKAagz/RJhjyheRQQRMNMdSRDnAaTvbmOABoYmPcPT acXuwQlXGslmi4Sc9v/YuHx7uhM0RrCqZY81GqobAEyZm0zwhixDdqrSJ PskndVgrVJc9q2ndpV1SB6Rv/MW9ZnZ0tKUh1X4ZmB93BT2z2Df4t4Pgf LmlW/21Bwc6lYPzaFgrSfsDCbBt0EQ0ADvyeSLXmOGE54tWBDKp9/o+TA PPfJek/VaYZbP0UrbptS91vk1LiNVSQF68JWFAP0BwKH/ASTINrbNHltz BeRwCU+aM0QQbLRw1xIUSngYfe6AGDrr0i751Ha6NGu51k7OaksSYJJ1h Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10217"; a="229171971" X-IronPort-AV: E=Sophos;i="5.88,262,1635231600"; d="scan'208";a="229171971" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Jan 2022 18:48:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,262,1635231600"; 
d="scan'208";a="470375911" Received: from chenwenbin-z390-aorus-ultra.sh.intel.com ([10.239.35.110]) by orsmga003.jf.intel.com with ESMTP; 04 Jan 2022 18:48:18 -0800 From: Wenbin Chen To: ffmpeg-devel@ffmpeg.org Date: Wed, 5 Jan 2022 10:48:08 +0800 Message-Id: <20220105024810.435597-1-wenbin.chen@intel.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH V2 1/3] libavcodec/vaapi_encode: Add new API adaption to vaapi_encode X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: taWs7REJiPga

Add vaSyncBuffer() support to the VAAPI encoder. The older API, vaSyncSurface(), waits for the surface to complete; when a surface is used for multiple operations, it waits for all of them to finish. vaSyncBuffer() waits for only one channel to finish.

Add a wait parameter to vaapi_encode_wait() to prepare for the async_depth option. "wait=1" means block until the operation is ready. "wait=0" means query the operation's status: return 0 if it is ready, or AVERROR(EAGAIN) if it is still in progress.
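A minimal sketch of the wait semantics described above, for illustration only and not part of the patch. It assumes a mock job that completes after a fixed number of polls; `MockJob`, `mock_sync()` and `sketch_wait()` are hypothetical names, and `-EAGAIN` / `-EIO` stand in for AVERROR(EAGAIN) / AVERROR(EIO).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VA status codes used by the patch. */
enum mock_status { MOCK_SUCCESS, MOCK_TIMEDOUT, MOCK_ERROR };

#define MOCK_TIMEOUT_INFINITE UINT64_MAX

/* A mock job that becomes ready after polls_until_done zero-timeout polls. */
typedef struct MockJob {
    int polls_until_done;
} MockJob;

/* Mock sync call: an infinite timeout always completes the job,
 * a zero timeout only reports whether it is already complete. */
static enum mock_status mock_sync(MockJob *job, uint64_t timeout_ns)
{
    assert(job);
    if (timeout_ns == MOCK_TIMEOUT_INFINITE) {
        job->polls_until_done = 0;
        return MOCK_SUCCESS;
    }
    if (job->polls_until_done > 0) {
        job->polls_until_done--;
        return MOCK_TIMEDOUT;
    }
    return MOCK_SUCCESS;
}

/* wait != 0: block until ready. wait == 0: query status only. */
static int sketch_wait(MockJob *job, int wait)
{
    enum mock_status st = mock_sync(job, wait ? MOCK_TIMEOUT_INFINITE : 0);

    if (st == MOCK_TIMEDOUT)
        return -EAGAIN; /* still in progress */
    if (st != MOCK_SUCCESS)
        return -EIO;    /* any other failure */
    return 0;           /* ready */
}
```

With this model, polling a job that needs two more polls yields `-EAGAIN` twice and then 0, while a blocking call returns 0 directly.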
Signed-off-by: Wenbin Chen
---
 libavcodec/vaapi_encode.c | 47 +++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index 3bf379b1a0..b87b58a42b 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -134,7 +134,8 @@ static int vaapi_encode_make_misc_param_buffer(AVCodecContext *avctx,
 }
 
 static int vaapi_encode_wait(AVCodecContext *avctx,
-                             VAAPIEncodePicture *pic)
+                             VAAPIEncodePicture *pic,
+                             uint8_t wait)
 {
     VAAPIEncodeContext *ctx = avctx->priv_data;
     VAStatus vas;
@@ -150,11 +151,43 @@ static int vaapi_encode_wait(AVCodecContext *avctx,
            "(input surface %#x).\n", pic->display_order,
            pic->encode_order, pic->input_surface);
 
-    vas = vaSyncSurface(ctx->hwctx->display, pic->input_surface);
-    if (vas != VA_STATUS_SUCCESS) {
-        av_log(avctx, AV_LOG_ERROR, "Failed to sync to picture completion: "
-               "%d (%s).\n", vas, vaErrorStr(vas));
+#if VA_CHECK_VERSION(1, 9, 0)
+    // Try vaSyncBuffer.
+    vas = vaSyncBuffer(ctx->hwctx->display,
+                       pic->output_buffer,
+                       wait ? VA_TIMEOUT_INFINITE : 0);
+    if (vas == VA_STATUS_ERROR_TIMEDOUT) {
+        return AVERROR(EAGAIN);
+    } else if (vas != VA_STATUS_SUCCESS && vas != VA_STATUS_ERROR_UNIMPLEMENTED) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to sync to output buffer completion: "
+               "%d (%s).\n", vas, vaErrorStr(vas));
         return AVERROR(EIO);
+    } else if (vas == VA_STATUS_ERROR_UNIMPLEMENTED)
+        // If vaSyncBuffer is not implemented, try old version API.
+#endif
+    {
+        if (!wait) {
+            VASurfaceStatus surface_status;
+            vas = vaQuerySurfaceStatus(ctx->hwctx->display,
+                                       pic->input_surface,
+                                       &surface_status);
+            if (vas == VA_STATUS_SUCCESS &&
+                surface_status != VASurfaceReady &&
+                surface_status != VASurfaceSkipped) {
+                return AVERROR(EAGAIN);
+            } else if (vas != VA_STATUS_SUCCESS) {
+                av_log(avctx, AV_LOG_ERROR, "Failed to query surface status: "
+                       "%d (%s).\n", vas, vaErrorStr(vas));
+                return AVERROR(EIO);
+            }
+        } else {
+            vas = vaSyncSurface(ctx->hwctx->display, pic->input_surface);
+            if (vas != VA_STATUS_SUCCESS) {
+                av_log(avctx, AV_LOG_ERROR, "Failed to sync to picture completion: "
+                       "%d (%s).\n", vas, vaErrorStr(vas));
+                return AVERROR(EIO);
+            }
+        }
     }
 
     // Input is definitely finished with now.
@@ -633,7 +666,7 @@ static int vaapi_encode_output(AVCodecContext *avctx,
     uint8_t *ptr;
     int err;
 
-    err = vaapi_encode_wait(avctx, pic);
+    err = vaapi_encode_wait(avctx, pic, 1);
     if (err < 0)
         return err;
 
@@ -695,7 +728,7 @@ fail:
 static int vaapi_encode_discard(AVCodecContext *avctx,
                                 VAAPIEncodePicture *pic)
 {
-    vaapi_encode_wait(avctx, pic);
+    vaapi_encode_wait(avctx, pic, 1);
 
     if (pic->output_buffer_ref) {
         av_log(avctx, AV_LOG_DEBUG, "Discard output for pic "

From patchwork Wed Jan 5 02:48:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Chen, Wenbin" X-Patchwork-Id: 33074 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:cd86:0:0:0:0:0 with SMTP id d128csp347198iog; Tue, 4 Jan 2022 18:48:40 -0800 (PST) X-Google-Smtp-Source: ABdhPJx1uZhey+3r6IoSI4/E4kR2ioEqcomo61fdw41ujsuaQnVMLI5rTi58zDM6BxIeECz6RES8 X-Received: by 2002:a05:6402:35d6:: with SMTP id z22mr51124435edc.334.1641350920270; Tue, 04 Jan 2022 18:48:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1641350920; cv=none; d=google.com; s=arc-20160816; b=cT96AMwMEGftLNlU72+vkSceIQKhaBfTP1WokvbWPdly83baJfBwIeKJ8Of4I+vC9B zMfPAnmB1RC/BAtSNOXhbGarkYow5kTyH8E53SPLx0J0o7cWeBr7dAMmwtdGr91O5DMe
aJ4ZY0kxnweI28vntpmOUGIP/OCNYkOxzgw3A4RcrZqdqce/Ot2qN1A6o4L4aQFjPzZu eRDYTxk3GY5NIWRbD7T+Riqji3AvLp8L+m1m2r4Yhif1MVmq8Lubu3NSoNkz1umwmOyb 3WJlQkeeGd9LaFU8PrNdk+1EtWsQjODxX+JdVz1klBjAv6LG/9LrGnCIGANg84U3DhB6 5F3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:dkim-signature:delivered-to; bh=LWsvDPoC3+sBe7LaIsdTaOKo6ExqbPpQcOso+ozqKSw=; b=FSQcCJ2GZbm8nw5Dt9RaSy2exnZV1nwS0xBJlX7nD6hu/67IerL0zaKYbuQJWxU8ic lS1lrfd/ubn/qEwFd9FEBtqPjgQOzC/CcvyQ60gw5dsAa4LDPLjIQibLgdCYMboBtTqi oIFj3w5sik8kXYdGjuucbhvxvwWo7h7HMfX1iKhtPk06lPeEOyzio0CHG1b6ybQpnevz Bik3Y4Eza8HqKfqwga2Y6/C3ryCM0jRUlhFBwI00x0VfHd03vImM0vCTrle82zOFAF5f gmaTlXI3GoeDNXTULCf3ohYynj+QwKtz2CflNaiyEC613dNVsKUkSmWAk4jmJtW58H1H k/XA== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b="fSc/OWLt"; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id g26si19216830edv.517.2022.01.04.18.48.40; Tue, 04 Jan 2022 18:48:40 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b="fSc/OWLt"; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 970C3680B71; Wed, 5 Jan 2022 04:48:29 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 06E826802C3 for ; Wed, 5 Jan 2022 04:48:22 +0200 (EET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641350908; x=1672886908; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=hodF1ZiwJNTJro7UmZQgZo5KYsyfkLyISUD/aIlTcKU=; b=fSc/OWLtS4rKkTcY5JpY+47Ni8pW6YobD3Di2EnLpI7nyXSMdKRwWBC4 idab/ikhakZTLnH/si5ylSS7EBLmZvNcnDejWue/NP4Qb0E4MHkL95nOA NGN2vvtbgirbuwSs301W5laUHCX73u9gRawexUs4IqW0xXZJRezWZ6nHW +K9nU51nItDxG+25dcloIWN6/gKOiP+6HWPxH09aXn54vmIIfko2dJKDt kGgoJtENU5jhpMBVC5nD5Mlx1pO7mqeanwIYMq9Tpvy0fZU6WvUU865Ub y8EiAj5DRmmkHayCpodAbY6yHIodOkrjRK5rThL57oQa5EQVVe7jfOVXr A==; X-IronPort-AV: E=McAfee;i="6200,9189,10217"; a="229171973" X-IronPort-AV: E=Sophos;i="5.88,262,1635231600"; d="scan'208";a="229171973" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Jan 2022 18:48:20 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,262,1635231600"; d="scan'208";a="470375913" Received: from 
chenwenbin-z390-aorus-ultra.sh.intel.com ([10.239.35.110]) by orsmga003.jf.intel.com with ESMTP; 04 Jan 2022 18:48:19 -0800 From: Wenbin Chen To: ffmpeg-devel@ffmpeg.org Date: Wed, 5 Jan 2022 10:48:09 +0800 Message-Id: <20220105024810.435597-2-wenbin.chen@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220105024810.435597-1-wenbin.chen@intel.com> References: <20220105024810.435597-1-wenbin.chen@intel.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH V2 2/3] libavcodec/vaapi_encode: Change the way to call async to increase performance X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: PnzBwiIUjEGZ

Fix: #7706. After commit 5fdcf85bbffe7451c2, the VAAPI encoder's performance decreased. The reason is that vaRenderPicture() and vaSyncBuffer() are called back to back (each vaRenderPicture() is always followed immediately by a vaSyncBuffer()). When encoding a stream with B-frames, frames must be buffered for reordering anyway, so several frames can be sent to the hardware at once to increase performance. The two calls are now made asynchronously, which makes better use of the hardware. 1080p transcoding gains about 17% fps in my environment.

This change relies on vaSyncBuffer(), so if the driver does not support vaSyncBuffer(), the previous behavior is kept.
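The issue-ahead idea can be sketched as follows. This is a simplified model for illustration, not the patch's code: frames are plain integers, `SketchQueue` and `sketch_receive()` are hypothetical, and storing a frame in the queue stands in for vaapi_encode_issue().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QUEUE_CAP 4 /* hypothetical capacity of the in-flight queue */

typedef struct SketchQueue {
    int    items[SKETCH_QUEUE_CAP];
    size_t head;  /* index of the oldest in-flight frame */
    size_t count; /* number of frames currently in flight */
} SketchQueue;

/* Issue every frame that fits, then hand back the oldest one to sync on.
 * Returns 0 and stores the frame in *out, or -1 if nothing is in flight. */
static int sketch_receive(SketchQueue *q, int *next_frame, int total_frames,
                          int *out)
{
    assert(q && next_frame && out);

    /* Submission phase: keep the (mock) hardware queue as full as possible. */
    while (q->count < SKETCH_QUEUE_CAP && *next_frame < total_frames) {
        q->items[(q->head + q->count) % SKETCH_QUEUE_CAP] = (*next_frame)++;
        q->count++;
    }
    if (q->count == 0)
        return -1;

    /* Completion phase: only the oldest frame is waited on and returned. */
    *out    = q->items[q->head];
    q->head = (q->head + 1) % SKETCH_QUEUE_CAP;
    q->count--;
    return 0;
}
```

In this model the first call submits four frames before returning frame 0, so later frames are already in flight while the caller handles earlier output.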
Signed-off-by: Wenbin Chen
---
 libavcodec/vaapi_encode.c | 64 ++++++++++++++++++++++++++++++++-------
 libavcodec/vaapi_encode.h |  5 +++
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index b87b58a42b..9a3b3ba4ad 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -984,8 +984,10 @@ static int vaapi_encode_pick_next(AVCodecContext *avctx,
     if (!pic && ctx->end_of_stream) {
         --b_counter;
         pic = ctx->pic_end;
-        if (pic->encode_issued)
+        if (pic->encode_complete)
             return AVERROR_EOF;
+        else if (pic->encode_issued)
+            return AVERROR(EAGAIN);
     }
 
     if (!pic) {
@@ -1210,18 +1212,45 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         return AVERROR(EAGAIN);
     }
 
-    pic = NULL;
-    err = vaapi_encode_pick_next(avctx, &pic);
-    if (err < 0)
-        return err;
-    av_assert0(pic);
+#if VA_CHECK_VERSION(1, 9, 0)
+    if (ctx->has_sync_buffer_func) {
+        while (av_fifo_size(ctx->encode_fifo) <=
+               MAX_PICTURE_REFERENCES * sizeof(VAAPIEncodePicture *)) {
+            pic = NULL;
+            err = vaapi_encode_pick_next(avctx, &pic);
+            if (err < 0)
+                break;
+
+            av_assert0(pic);
+            pic->encode_order = ctx->encode_order +
+                (av_fifo_size(ctx->encode_fifo) / sizeof(VAAPIEncodePicture *));
+            err = vaapi_encode_issue(avctx, pic);
+            if (err < 0) {
+                av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
+                return err;
+            }
+            av_fifo_generic_write(ctx->encode_fifo, &pic, sizeof(pic), NULL);
+        }
+        if (!av_fifo_size(ctx->encode_fifo))
+            return err;
+        av_fifo_generic_read(ctx->encode_fifo, &pic, sizeof(pic), NULL);
+        ctx->encode_order = pic->encode_order + 1;
+    } else
+#endif
+    {
+        pic = NULL;
+        err = vaapi_encode_pick_next(avctx, &pic);
+        if (err < 0)
+            return err;
+        av_assert0(pic);
 
-    pic->encode_order = ctx->encode_order++;
+        pic->encode_order = ctx->encode_order++;
 
-    err = vaapi_encode_issue(avctx, pic);
-    if (err < 0) {
-        av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
-        return err;
+        err = vaapi_encode_issue(avctx, pic);
+        if (err < 0) {
+            av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
+            return err;
+        }
     }
 
     err = vaapi_encode_output(avctx, pic, pkt);
@@ -2555,6 +2584,18 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
         }
     }
 
+#if VA_CHECK_VERSION(1, 9, 0)
+    //check vaSyncBuffer function
+    vas = vaSyncBuffer(ctx->hwctx->display, 0, 0);
+    if (vas != VA_STATUS_ERROR_UNIMPLEMENTED) {
+        ctx->has_sync_buffer_func = 1;
+        ctx->encode_fifo = av_fifo_alloc((MAX_PICTURE_REFERENCES + 1) *
+                                         sizeof(VAAPIEncodePicture *));
+        if (!ctx->encode_fifo)
+            return AVERROR(ENOMEM);
+    }
+#endif
+
     return 0;
 
 fail:
@@ -2592,6 +2633,7 @@ av_cold int ff_vaapi_encode_close(AVCodecContext *avctx)
 
     av_freep(&ctx->codec_sequence_params);
     av_freep(&ctx->codec_picture_params);
+    av_fifo_freep(&ctx->encode_fifo);
 
     av_buffer_unref(&ctx->recon_frames_ref);
     av_buffer_unref(&ctx->input_frames_ref);
diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h
index b41604a883..560a1c42a9 100644
--- a/libavcodec/vaapi_encode.h
+++ b/libavcodec/vaapi_encode.h
@@ -29,6 +29,7 @@
 
 #include "libavutil/hwcontext.h"
 #include "libavutil/hwcontext_vaapi.h"
+#include "libavutil/fifo.h"
 
 #include "avcodec.h"
 #include "hwconfig.h"
@@ -345,6 +346,10 @@ typedef struct VAAPIEncodeContext {
     int roi_warned;
 
     AVFrame *frame;
+    //Store buffered pic
+    AVFifoBuffer *encode_fifo;
+    //Whether the driver support vaSyncBuffer
+    int has_sync_buffer_func;
 } VAAPIEncodeContext;
 
 enum {

From patchwork Wed Jan 5 02:48:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Chen, Wenbin" X-Patchwork-Id: 33075 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:cd86:0:0:0:0:0 with SMTP id d128csp347255iog; Tue, 4 Jan 2022 18:48:48 -0800 (PST) X-Google-Smtp-Source: ABdhPJw45pgG/itCjO47tylK3ZhJtJm3BRFlGdd0JrFTtiEyOWh3sSRIJ5mrn85ao60uAmeJbDJC X-Received: by 2002:a17:906:6a0c:: with SMTP id qw12mr39533063ejc.87.1641350928166; Tue, 04
Jan 2022 18:48:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1641350928; cv=none; d=google.com; s=arc-20160816; b=TuJIG22a0hCeB7lOKJA0v74tDBXv7n7QcxvLMxsa42VQQhPFiyMy0tdWVO3hbW7nWo 6Gl20ofPYZqKiJoCD3ENSq9UZTlz1VQeQ4A5/lELtc0kqetS9kpyEJBd1Ncwd9QSSy/O vkWJwyjv92L42HE6E7N8BXktdKZBBbQvNC1hefLrp5Cp2n2M2cA6KqMyacwLhLhmCf/0 RL4O8AtiV+zIOLlv2idYmvRT9BoFUgpBwz1Csq8tEMqGCF8lINkMta2DLZkkJEuVXZZG V5oG2jJcP0mW9xLlS29NRZc9zpHcCS4D/XG8Hsqfbnatk/i0Ne3N3OE27lSS7LdIQdzd nwCQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:dkim-signature:delivered-to; bh=Zw3CNit5OCcaq/fbWEUhg7U8tHTelTX006VIV6IOyQ4=; b=k+F71n0Eexp2rPi1Aim+uqfUVvtmQBidxECHbEyixmmGEOE/u/LUgB93U4KKMy2rfX 2kt3FzyhHpcjioDt01xua5D3z1PTiRVYKwVNrOZldCcd6kxyLFSSIc6MPl73ZEIea0y9 hmNR+psHfQoXfUZTa+zHMVTg9zPVkQcG+ebcPxlAbWrrhXy2oZHOWD9KG8BVim4k2bLq BaVFRFzp2f5erg8o25L8v2p4Tn49OGxCWsRbNXVe778uxdtjAPclqs5X0bySFX2eZYJ8 hrxKPRGOjIe9G9JBxJVKn80dLYHtr6aABxnatUxiywturPaGRpzELwSWypP6MFWxUArE w6Iw== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=GikieoOd; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id d23si19660071edq.380.2022.01.04.18.48.47; Tue, 04 Jan 2022 18:48:48 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=GikieoOd; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id B851E68AA6A; Wed, 5 Jan 2022 04:48:33 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 464E468A987 for ; Wed, 5 Jan 2022 04:48:27 +0200 (EET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641350912; x=1672886912; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=+v7o1TTGuYQH4cnl/o5oTQIak51nwEwZdoD7vC1ytSQ=; b=GikieoOdxS56XPp//oR8piS4ZoP/XJZDCqfyd/vRMcLYOr9pSfDhYkeZ xQSXYiqvYT0ZehnCafryllt79lDAjqWe6aVuPi1dRTR6WTLZrvw0xY30q FlTQ7l2Pu3CLo/MuQKjmjThVraNuLmAUjxATLJgPaEKqooTQnsYlSlI7q C7hpSDw681QmSObAbjEThCFm5THdhnB1n2mD7VVMKJf99wOJEsu+0X4BG kiUXdELJMUxpWlEIQlQE2tdmPDJjwmYW+7DZGIRxVT2i2tsFXFWcQLacY P0Bfm0hrw3MF91tyditt2CQccurDXF04X2HN66Rc4YuaI9WfB27w6+8MV Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10217"; a="229171974" X-IronPort-AV: E=Sophos;i="5.88,262,1635231600"; d="scan'208";a="229171974" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Jan 2022 18:48:21 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,262,1635231600"; d="scan'208";a="470375917" Received: from 
chenwenbin-z390-aorus-ultra.sh.intel.com ([10.239.35.110]) by orsmga003.jf.intel.com with ESMTP; 04 Jan 2022 18:48:20 -0800 From: Wenbin Chen To: ffmpeg-devel@ffmpeg.org Date: Wed, 5 Jan 2022 10:48:10 +0800 Message-Id: <20220105024810.435597-3-wenbin.chen@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220105024810.435597-1-wenbin.chen@intel.com> References: <20220105024810.435597-1-wenbin.chen@intel.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH V2 3/3] libavcodec/vaapi_encode: Add async_depth to vaapi_encoder to increase performance X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 1LFKnOBqJjpr

Add an async_depth option to increase the encoder's performance. encode_fifo is reused as the async buffer. The encoder submits all reordered frames to the hardware and then checks the fifo size. If the fifo holds fewer than async_depth frames and the frame at the head is not ready, it returns AVERROR(EAGAIN) to request more frames.

1080p transcoding (no B-frames) with -async_depth=4 is about 20% faster in my environment. Asynchronous operation increases performance but also introduces frame delay.
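The decision described above can be modelled as a small pure function. This is an illustrative sketch, not the patch's code: `sketch_async_gate()` and its parameters are hypothetical, `-EAGAIN` stands in for AVERROR(EAGAIN), and `pick_err` models the error left over from picking the next frame when nothing is in flight.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Returns 0 when the head of the fifo should be output now (blocking on it
 * if necessary), or a negative value when the caller should come back later. */
static int sketch_async_gate(size_t in_flight, size_t async_depth,
                             int end_of_stream, int head_ready, int pick_err)
{
    assert(pick_err <= 0);

    if (in_flight == 0)
        return pick_err; /* nothing issued: propagate why (EAGAIN or EOF) */

    /* Fewer frames in flight than allowed and the oldest one is still busy:
     * ask for more input instead of stalling on the hardware. */
    if (in_flight < async_depth && !end_of_stream && !head_ready)
        return -EAGAIN;

    return 0;
}
```

The model reflects the trade-off stated in the message: a larger async_depth lets more frames queue up before any output is forced, which raises throughput but delays the first packets.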
Signed-off-by: Wenbin Chen
---
 libavcodec/vaapi_encode.c | 19 ++++++++++++++-----
 libavcodec/vaapi_encode.h | 12 ++++++++++--
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index 9a3b3ba4ad..f9ffca0475 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -1158,7 +1158,8 @@ static int vaapi_encode_send_frame(AVCodecContext *avctx, AVFrame *frame)
         if (ctx->input_order == ctx->decode_delay)
             ctx->dts_pts_diff = pic->pts - ctx->first_pts;
         if (ctx->output_delay > 0)
-            ctx->ts_ring[ctx->input_order % (3 * ctx->output_delay)] = pic->pts;
+            ctx->ts_ring[ctx->input_order %
+                         (3 * ctx->output_delay + ctx->async_depth)] = pic->pts;
 
         pic->display_order = ctx->input_order;
         ++ctx->input_order;
@@ -1214,8 +1215,8 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
 
 #if VA_CHECK_VERSION(1, 9, 0)
     if (ctx->has_sync_buffer_func) {
-        while (av_fifo_size(ctx->encode_fifo) <=
-               MAX_PICTURE_REFERENCES * sizeof(VAAPIEncodePicture *)) {
+        while (av_fifo_size(ctx->encode_fifo) <
+               MAX_ASYNC_DEPTH * sizeof(VAAPIEncodePicture *)) {
             pic = NULL;
             err = vaapi_encode_pick_next(avctx, &pic);
             if (err < 0)
@@ -1233,6 +1234,14 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         }
         if (!av_fifo_size(ctx->encode_fifo))
             return err;
+        if (av_fifo_size(ctx->encode_fifo) <
+            ctx->async_depth * sizeof(VAAPIEncodePicture *) &&
+            !ctx->end_of_stream) {
+            av_fifo_generic_peek(ctx->encode_fifo, &pic, sizeof(pic), NULL);
+            err = vaapi_encode_wait(avctx, pic, 0);
+            if (err < 0)
+                return err;
+        }
         av_fifo_generic_read(ctx->encode_fifo, &pic, sizeof(pic), NULL);
         ctx->encode_order = pic->encode_order + 1;
     } else
@@ -1268,7 +1277,7 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         pkt->dts = ctx->ts_ring[pic->encode_order] - ctx->dts_pts_diff;
     } else {
         pkt->dts = ctx->ts_ring[(pic->encode_order - ctx->decode_delay) %
-                                (3 * ctx->output_delay)];
+                                (3 * ctx->output_delay + ctx->async_depth)];
     }
     av_log(avctx, AV_LOG_DEBUG, "Output packet: pts %"PRId64" dts %"PRId64".\n",
            pkt->pts, pkt->dts);
@@ -2589,7 +2598,7 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
     vas = vaSyncBuffer(ctx->hwctx->display, 0, 0);
     if (vas != VA_STATUS_ERROR_UNIMPLEMENTED) {
         ctx->has_sync_buffer_func = 1;
-        ctx->encode_fifo = av_fifo_alloc((MAX_PICTURE_REFERENCES + 1) *
+        ctx->encode_fifo = av_fifo_alloc(MAX_ASYNC_DEPTH *
                                          sizeof(VAAPIEncodePicture *));
         if (!ctx->encode_fifo)
             return AVERROR(ENOMEM);
diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h
index 560a1c42a9..1a5824e702 100644
--- a/libavcodec/vaapi_encode.h
+++ b/libavcodec/vaapi_encode.h
@@ -48,6 +48,7 @@ enum {
     MAX_TILE_ROWS = 22,
     // A.4.1: table A.6 allows at most 20 tile columns for any level.
     MAX_TILE_COLS = 20,
+    MAX_ASYNC_DEPTH = 64,
 };
 
 extern const AVCodecHWConfigInternal *const ff_vaapi_encode_hw_configs[];
@@ -298,7 +299,8 @@ typedef struct VAAPIEncodeContext {
     // Timestamp handling.
     int64_t first_pts;
     int64_t dts_pts_diff;
-    int64_t ts_ring[MAX_REORDER_DELAY * 3];
+    int64_t ts_ring[MAX_REORDER_DELAY * 3 +
+                    MAX_ASYNC_DEPTH];
 
     // Slice structure.
     int slice_block_rows;
@@ -350,6 +352,8 @@ typedef struct VAAPIEncodeContext {
     AVFifoBuffer *encode_fifo;
     //Whether the driver support vaSyncBuffer
     int has_sync_buffer_func;
+    //Max number of frame buffered in encoder.
+    int async_depth;
 } VAAPIEncodeContext;
 
 enum {
@@ -460,7 +464,11 @@ int ff_vaapi_encode_close(AVCodecContext *avctx);
     { "b_depth", \
       "Maximum B-frame reference depth", \
       OFFSET(common.desired_b_depth), AV_OPT_TYPE_INT, \
-      { .i64 = 1 }, 1, INT_MAX, FLAGS }
+      { .i64 = 1 }, 1, INT_MAX, FLAGS }, \
+    { "async_depth", "Maximum processing parallelism. " \
+      "Increase this to improve single channel performance", \
+      OFFSET(common.async_depth), AV_OPT_TYPE_INT, \
+      { .i64 = 4 }, 0, MAX_ASYNC_DEPTH, FLAGS }
 
 #define VAAPI_ENCODE_RC_MODE(name, desc) \
     { #name, desc, 0, AV_OPT_TYPE_CONST, { .i64 = RC_MODE_ ## name }, \