From patchwork Fri Feb 18 02:07:57 2022
X-Patchwork-Submitter: "Chen, Wenbin"
X-Patchwork-Id: 34376
From: Wenbin Chen
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 18 Feb 2022 10:07:57 +0800
Message-Id: <20220218020757.834409-2-wenbin.chen@intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220218020757.834409-1-wenbin.chen@intel.com>
References: <20220218020757.834409-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH v4 2/2] libavcodec/vaapi_encode: Add async_depth to vaapi_encoder to increase performance

From: Wenbin Chen

Fix: #7706.

After commit 5fdcf85bbffe7451c2, the vaapi encoder's performance decreased.
The reason is that vaRenderPicture() and vaSyncBuffer() were called in
lockstep (every vaRenderPicture() was immediately followed by a
vaSyncBuffer()). They are now called asynchronously, which makes better use
of the hardware.

An async_depth option is added to raise the encoder's throughput. Frames
that have been submitted to the hardware are stored in a FIFO, and the
encoder only syncs on output once that FIFO is full.
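For reviewers unfamiliar with the new AVFifo API, the submit/sync pattern the
patch implements is easier to see in isolation. Below is a minimal,
self-contained sketch (not part of the patch): issue_to_hw() and
sync_from_hw() are hypothetical stand-ins for vaRenderPicture() and
vaSyncBuffer(), while the av_fifo_* calls are the actual libavutil API used
in the diff.

    /*
     * Sketch of the async_depth buffering pattern. Work is issued without
     * waiting; we only block on the oldest in-flight frame once async_depth
     * frames have been queued, so the hardware pipeline stays busy.
     */
    #include <stdio.h>
    #include <libavutil/fifo.h>

    typedef struct Frame { int id; } Frame;

    /* Hypothetical stand-ins for vaRenderPicture()/vaSyncBuffer(). */
    static void issue_to_hw(Frame *f)  { printf("issue %d\n", f->id); }
    static void sync_from_hw(Frame *f) { printf("sync  %d\n", f->id); }

    int main(void)
    {
        const size_t async_depth = 2;
        /* Fixed-size FIFO of Frame pointers, no auto-grow. */
        AVFifo *fifo = av_fifo_alloc2(async_depth, sizeof(Frame *), 0);
        Frame frames[5] = { {0}, {1}, {2}, {3}, {4} };

        if (!fifo)
            return 1;

        for (int i = 0; i < 5; i++) {
            Frame *in = &frames[i];

            /* Submit work without waiting for it to finish. */
            issue_to_hw(in);
            av_fifo_write(fifo, &in, 1);

            /* Only when async_depth frames are in flight do we block on
             * the oldest one. */
            if (!av_fifo_can_write(fifo)) {
                Frame *out;
                av_fifo_read(fifo, &out, 1);
                sync_from_hw(out);
            }
        }

        /* Drain whatever is still in flight at end of stream. */
        while (av_fifo_can_read(fifo)) {
            Frame *out;
            av_fifo_read(fifo, &out, 1);
            sync_from_hw(out);
        }

        av_fifo_freep2(&fifo);
        return 0;
    }

Under these assumptions the sketch builds against libavutil alone, e.g.
gcc sketch.c $(pkg-config --cflags --libs libavutil).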
Signed-off-by: Wenbin Chen
Signed-off-by: Haihao Xiang
---
 doc/encoders.texi         |  6 ++++
 libavcodec/vaapi_encode.c | 65 +++++++++++++++++++++++++++++++--------
 libavcodec/vaapi_encode.h | 16 ++++++++--
 3 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/doc/encoders.texi b/doc/encoders.texi
index bfb6c7eef6..6bac2b7f28 100644
--- a/doc/encoders.texi
+++ b/doc/encoders.texi
@@ -3600,6 +3600,12 @@ will refer only to P- or I-frames. When set to greater values multiple layers
 of B-frames will be present, frames in each layer only referring to frames in
 higher layers.
 
+@item async_depth
+Maximum processing parallelism. Increase this to improve single channel
+performance. This option doesn't work if driver doesn't implement vaSyncBuffer
+function. Please make sure there are enough hw_frames allocated if a large
+number of async_depth is used.
+
 @item rc_mode
 Set the rate control mode to use. A given driver may only support a subset of
 modes.
diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index 335a8e450a..8c6e881702 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -965,8 +965,10 @@ static int vaapi_encode_pick_next(AVCodecContext *avctx,
     if (!pic && ctx->end_of_stream) {
         --b_counter;
         pic = ctx->pic_end;
-        if (pic->encode_issued)
+        if (pic->encode_complete)
             return AVERROR_EOF;
+        else if (pic->encode_issued)
+            return AVERROR(EAGAIN);
     }
 
     if (!pic) {
@@ -1137,7 +1139,8 @@ static int vaapi_encode_send_frame(AVCodecContext *avctx, AVFrame *frame)
         if (ctx->input_order == ctx->decode_delay)
             ctx->dts_pts_diff = pic->pts - ctx->first_pts;
         if (ctx->output_delay > 0)
-            ctx->ts_ring[ctx->input_order % (3 * ctx->output_delay)] = pic->pts;
+            ctx->ts_ring[ctx->input_order %
+                         (3 * ctx->output_delay + ctx->async_depth)] = pic->pts;
 
         pic->display_order = ctx->input_order;
         ++ctx->input_order;
@@ -1191,18 +1194,47 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
             return AVERROR(EAGAIN);
     }
 
-    pic = NULL;
-    err = vaapi_encode_pick_next(avctx, &pic);
-    if (err < 0)
-        return err;
-    av_assert0(pic);
+    if (ctx->has_sync_buffer_func) {
+        pic = NULL;
+
+        if (av_fifo_can_write(ctx->encode_fifo)) {
+            err = vaapi_encode_pick_next(avctx, &pic);
+            if (!err) {
+                av_assert0(pic);
+                pic->encode_order = ctx->encode_order +
+                                    av_fifo_can_read(ctx->encode_fifo);
+                err = vaapi_encode_issue(avctx, pic);
+                if (err < 0) {
+                    av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
+                    return err;
+                }
+                av_fifo_write(ctx->encode_fifo, &pic, 1);
+            }
+        }
 
-    pic->encode_order = ctx->encode_order++;
+        if (!av_fifo_can_read(ctx->encode_fifo))
+            return err;
 
-    err = vaapi_encode_issue(avctx, pic);
-    if (err < 0) {
-        av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
-        return err;
+        // More frames can be buffered
+        if (av_fifo_can_write(ctx->encode_fifo) && !ctx->end_of_stream)
+            return AVERROR(EAGAIN);
+
+        av_fifo_read(ctx->encode_fifo, &pic, 1);
+        ctx->encode_order = pic->encode_order + 1;
+    } else {
+        pic = NULL;
+        err = vaapi_encode_pick_next(avctx, &pic);
+        if (err < 0)
+            return err;
+        av_assert0(pic);
+
+        pic->encode_order = ctx->encode_order++;
+
+        err = vaapi_encode_issue(avctx, pic);
+        if (err < 0) {
+            av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
+            return err;
+        }
     }
 
     err = vaapi_encode_output(avctx, pic, pkt);
@@ -1220,7 +1252,7 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         pkt->dts = ctx->ts_ring[pic->encode_order] - ctx->dts_pts_diff;
     } else {
         pkt->dts = ctx->ts_ring[(pic->encode_order - ctx->decode_delay) %
-                                (3 * ctx->output_delay)];
+                                (3 * ctx->output_delay + ctx->async_depth)];
     }
     av_log(avctx, AV_LOG_DEBUG, "Output packet: pts %"PRId64" dts %"PRId64".\n",
            pkt->pts, pkt->dts);
@@ -2541,6 +2573,12 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
     vas = vaSyncBuffer(ctx->hwctx->display, VA_INVALID_ID, 0);
     if (vas != VA_STATUS_ERROR_UNIMPLEMENTED) {
         ctx->has_sync_buffer_func = 1;
+        ctx->encode_fifo = av_fifo_alloc2(ctx->async_depth,
+                                          sizeof(VAAPIEncodePicture *),
+                                          0);
+        if (!ctx->encode_fifo)
+            return AVERROR(ENOMEM);
+
     }
 #endif
     return 0;
@@ -2580,6 +2618,7 @@ av_cold int ff_vaapi_encode_close(AVCodecContext *avctx)
 
     av_freep(&ctx->codec_sequence_params);
     av_freep(&ctx->codec_picture_params);
+    av_fifo_freep2(&ctx->encode_fifo);
 
     av_buffer_unref(&ctx->recon_frames_ref);
     av_buffer_unref(&ctx->input_frames_ref);
diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h
index 29d9e9b91c..1b40819c69 100644
--- a/libavcodec/vaapi_encode.h
+++ b/libavcodec/vaapi_encode.h
@@ -29,6 +29,7 @@
 
 #include "libavutil/hwcontext.h"
 #include "libavutil/hwcontext_vaapi.h"
+#include "libavutil/fifo.h"
 
 #include "avcodec.h"
 #include "hwconfig.h"
@@ -47,6 +48,7 @@ enum {
     MAX_TILE_ROWS          = 22,
     // A.4.1: table A.6 allows at most 20 tile columns for any level.
     MAX_TILE_COLS          = 20,
+    MAX_ASYNC_DEPTH        = 64,
 };
 
 extern const AVCodecHWConfigInternal *const ff_vaapi_encode_hw_configs[];
@@ -297,7 +299,8 @@ typedef struct VAAPIEncodeContext {
     // Timestamp handling.
     int64_t         first_pts;
     int64_t         dts_pts_diff;
-    int64_t         ts_ring[MAX_REORDER_DELAY * 3];
+    int64_t         ts_ring[MAX_REORDER_DELAY * 3 +
+                            MAX_ASYNC_DEPTH];
 
     // Slice structure.
     int slice_block_rows;
@@ -348,6 +351,10 @@ typedef struct VAAPIEncodeContext {
 
     // Whether the driver support vaSyncBuffer
     int             has_sync_buffer_func;
+    // Store buffered pic
+    AVFifo          *encode_fifo;
+    // Max number of frame buffered in encoder.
+    int             async_depth;
 } VAAPIEncodeContext;
 
 enum {
@@ -458,7 +465,12 @@ int ff_vaapi_encode_close(AVCodecContext *avctx);
     { "b_depth", \
       "Maximum B-frame reference depth", \
      OFFSET(common.desired_b_depth), AV_OPT_TYPE_INT, \
-      { .i64 = 1 }, 1, INT_MAX, FLAGS }
+      { .i64 = 1 }, 1, INT_MAX, FLAGS }, \
+    { "async_depth", "Maximum processing parallelism. " \
+      "Increase this to improve single channel performance. This option " \
+      "doesn't work if driver doesn't implement vaSyncBuffer function.", \
+      OFFSET(common.async_depth), AV_OPT_TYPE_INT, \
+      { .i64 = 2 }, 1, MAX_ASYNC_DEPTH, FLAGS }
 
 #define VAAPI_ENCODE_RC_MODE(name, desc) \
     { #name, desc, 0, AV_OPT_TYPE_CONST, { .i64 = RC_MODE_ ## name }, \