From patchwork Fri Feb 18 03:07:47 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Chen, Wenbin"
X-Patchwork-Id: 34378
From: Wenbin Chen
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 18 Feb 2022 11:07:47 +0800
Message-Id: <20220218030747.894232-2-wenbin.chen@intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220218030747.894232-1-wenbin.chen@intel.com>
References: <20220218030747.894232-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH v5 2/2] libavcodec/vaapi_encode: Add async_depth to vaapi_encoder to increase performance
List-Id: FFmpeg development discussions and patches

Fix: #7706. After commit 5fdcf85bbffe7451c2, the vaapi encoder's
performance decreased. The reason is that vaRenderPicture() and
vaSyncBuffer() are called back to back (each vaRenderPicture() is
always followed by a vaSyncBuffer()), so every frame is synced as soon
as it is submitted. Call them asynchronously instead, which makes
better use of the hardware.

The async_depth option is added to increase the encoder's performance.
Frames that have been sent to the hardware are stored in a FIFO, and
the encoder syncs the output once the async FIFO is full.

Signed-off-by: Wenbin Chen
Signed-off-by: Haihao Xiang
---
 doc/encoders.texi         |  6 ++++
 libavcodec/vaapi_encode.c | 64 +++++++++++++++++++++++++++++++--------
 libavcodec/vaapi_encode.h | 16 ++++++++--
 3 files changed, 71 insertions(+), 15 deletions(-)

diff --git a/doc/encoders.texi b/doc/encoders.texi
index bfb6c7eef6..6bac2b7f28 100644
--- a/doc/encoders.texi
+++ b/doc/encoders.texi
@@ -3600,6 +3600,12 @@ will refer only to P- or I-frames. When set to greater values multiple layers
 of B-frames will be present, frames in each layer only referring to frames in
 higher layers.
 
+@item async_depth
+Maximum processing parallelism. Increase this to improve single channel
+performance. This option doesn't work if driver doesn't implement vaSyncBuffer
+function. Please make sure there are enough hw_frames allocated if a large
+number of async_depth is used.
+
 @item rc_mode
 Set the rate control mode to use. A given driver may only support a subset of
 modes.
diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index 3f8c8ace2a..8c6e881702 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -965,8 +965,10 @@ static int vaapi_encode_pick_next(AVCodecContext *avctx,
     if (!pic && ctx->end_of_stream) {
         --b_counter;
         pic = ctx->pic_end;
-        if (pic->encode_issued)
+        if (pic->encode_complete)
             return AVERROR_EOF;
+        else if (pic->encode_issued)
+            return AVERROR(EAGAIN);
     }
 
     if (!pic) {
@@ -1137,7 +1139,8 @@ static int vaapi_encode_send_frame(AVCodecContext *avctx, AVFrame *frame)
         if (ctx->input_order == ctx->decode_delay)
             ctx->dts_pts_diff = pic->pts - ctx->first_pts;
         if (ctx->output_delay > 0)
-            ctx->ts_ring[ctx->input_order % (3 * ctx->output_delay)] = pic->pts;
+            ctx->ts_ring[ctx->input_order %
+                        (3 * ctx->output_delay + ctx->async_depth)] = pic->pts;
 
         pic->display_order = ctx->input_order;
         ++ctx->input_order;
@@ -1191,18 +1194,47 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         return AVERROR(EAGAIN);
     }
 
-    pic = NULL;
-    err = vaapi_encode_pick_next(avctx, &pic);
-    if (err < 0)
-        return err;
-    av_assert0(pic);
+    if (ctx->has_sync_buffer_func) {
+        pic = NULL;
+
+        if (av_fifo_can_write(ctx->encode_fifo)) {
+            err = vaapi_encode_pick_next(avctx, &pic);
+            if (!err) {
+                av_assert0(pic);
+                pic->encode_order = ctx->encode_order +
+                    av_fifo_can_read(ctx->encode_fifo);
+                err = vaapi_encode_issue(avctx, pic);
+                if (err < 0) {
+                    av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
+                    return err;
+                }
+                av_fifo_write(ctx->encode_fifo, &pic, 1);
+            }
+        }
 
-    pic->encode_order = ctx->encode_order++;
+        if (!av_fifo_can_read(ctx->encode_fifo))
+            return err;
 
-    err = vaapi_encode_issue(avctx, pic);
-    if (err < 0) {
-        av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
-        return err;
+        // More frames can be buffered
+        if (av_fifo_can_write(ctx->encode_fifo) && !ctx->end_of_stream)
+            return AVERROR(EAGAIN);
+
+        av_fifo_read(ctx->encode_fifo, &pic, 1);
+        ctx->encode_order = pic->encode_order + 1;
+    } else {
+        pic = NULL;
+        err = vaapi_encode_pick_next(avctx, &pic);
+        if (err < 0)
+            return err;
+        av_assert0(pic);
+
+        pic->encode_order = ctx->encode_order++;
+
+        err = vaapi_encode_issue(avctx, pic);
+        if (err < 0) {
+            av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
+            return err;
+        }
     }
 
     err = vaapi_encode_output(avctx, pic, pkt);
@@ -1220,7 +1252,7 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         pkt->dts = ctx->ts_ring[pic->encode_order] - ctx->dts_pts_diff;
     } else {
         pkt->dts = ctx->ts_ring[(pic->encode_order - ctx->decode_delay) %
-                                (3 * ctx->output_delay)];
+                                (3 * ctx->output_delay + ctx->async_depth)];
     }
     av_log(avctx, AV_LOG_DEBUG, "Output packet: pts %"PRId64" dts %"PRId64".\n",
            pkt->pts, pkt->dts);
@@ -2541,6 +2573,11 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
     vas = vaSyncBuffer(ctx->hwctx->display, VA_INVALID_ID, 0);
     if (vas != VA_STATUS_ERROR_UNIMPLEMENTED) {
         ctx->has_sync_buffer_func = 1;
+        ctx->encode_fifo = av_fifo_alloc2(ctx->async_depth,
+                                          sizeof(VAAPIEncodePicture *),
+                                          0);
+        if (!ctx->encode_fifo)
+            return AVERROR(ENOMEM);
     }
 #endif
 
@@ -2581,6 +2618,7 @@ av_cold int ff_vaapi_encode_close(AVCodecContext *avctx)
 
     av_freep(&ctx->codec_sequence_params);
     av_freep(&ctx->codec_picture_params);
+    av_fifo_freep2(&ctx->encode_fifo);
 
     av_buffer_unref(&ctx->recon_frames_ref);
     av_buffer_unref(&ctx->input_frames_ref);
diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h
index 29d9e9b91c..1b40819c69 100644
--- a/libavcodec/vaapi_encode.h
+++ b/libavcodec/vaapi_encode.h
@@ -29,6 +29,7 @@
 
 #include "libavutil/hwcontext.h"
 #include "libavutil/hwcontext_vaapi.h"
+#include "libavutil/fifo.h"
 
 #include "avcodec.h"
 #include "hwconfig.h"
@@ -47,6 +48,7 @@ enum {
     MAX_TILE_ROWS          = 22,
     // A.4.1: table A.6 allows at most 20 tile columns for any level.
     MAX_TILE_COLS          = 20,
+    MAX_ASYNC_DEPTH        = 64,
 };
 
 extern const AVCodecHWConfigInternal *const ff_vaapi_encode_hw_configs[];
@@ -297,7 +299,8 @@ typedef struct VAAPIEncodeContext {
     // Timestamp handling.
     int64_t         first_pts;
     int64_t         dts_pts_diff;
-    int64_t         ts_ring[MAX_REORDER_DELAY * 3];
+    int64_t         ts_ring[MAX_REORDER_DELAY * 3 +
+                            MAX_ASYNC_DEPTH];
 
     // Slice structure.
     int slice_block_rows;
@@ -348,6 +351,10 @@ typedef struct VAAPIEncodeContext {
 
     // Whether the driver support vaSyncBuffer
     int             has_sync_buffer_func;
+    // Store buffered pic
+    AVFifo          *encode_fifo;
+    // Max number of frame buffered in encoder.
+    int             async_depth;
 } VAAPIEncodeContext;
 
 enum {
@@ -458,7 +465,12 @@ int ff_vaapi_encode_close(AVCodecContext *avctx);
     { "b_depth", \
       "Maximum B-frame reference depth", \
       OFFSET(common.desired_b_depth), AV_OPT_TYPE_INT, \
-      { .i64 = 1 }, 1, INT_MAX, FLAGS }
+      { .i64 = 1 }, 1, INT_MAX, FLAGS }, \
+    { "async_depth", "Maximum processing parallelism. " \
+      "Increase this to improve single channel performance. This option " \
+      "doesn't work if driver doesn't implement vaSyncBuffer function.", \
+      OFFSET(common.async_depth), AV_OPT_TYPE_INT, \
+      { .i64 = 2 }, 1, MAX_ASYNC_DEPTH, FLAGS }
 
 #define VAAPI_ENCODE_RC_MODE(name, desc) \
     { #name, desc, 0, AV_OPT_TYPE_CONST, { .i64 = RC_MODE_ ## name }, \