From patchwork Tue Feb 8 03:05:49 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Chen, Wenbin"
X-Patchwork-Id: 34161
From: Wenbin Chen
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 8 Feb 2022 11:05:49 +0800
Message-Id: <20220208030549.340748-3-wenbin.chen@intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220208030549.340748-1-wenbin.chen@intel.com>
References: <20220208030549.340748-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH V3 3/3] libavcodec/vaapi_encode: Add async_depth to vaapi_encoder to increase performance
List-Id: FFmpeg development discussions and patches

Add async_depth to increase the encoder's performance. encode_fifo is
reused as the async buffer. The encoder issues all reordered frames to
the hardware and then checks the fifo size. If the fifo holds fewer than
async_depth pictures and the picture at the head of the fifo is not
ready, it returns AVERROR(EAGAIN) to request more frames.

1080p transcoding (no B-frames) with -async_depth 4 can improve
performance by 20% in my environment.
The async mode increases performance but also introduces frame delay.

Signed-off-by: Wenbin Chen
---
 libavcodec/vaapi_encode.c | 16 ++++++++++++----
 libavcodec/vaapi_encode.h | 12 ++++++++++--
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index 15ddbbaa4a..432abf31f7 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -1158,7 +1158,8 @@ static int vaapi_encode_send_frame(AVCodecContext *avctx, AVFrame *frame)
         if (ctx->input_order == ctx->decode_delay)
             ctx->dts_pts_diff = pic->pts - ctx->first_pts;
         if (ctx->output_delay > 0)
-            ctx->ts_ring[ctx->input_order % (3 * ctx->output_delay)] = pic->pts;
+            ctx->ts_ring[ctx->input_order %
+                         (3 * ctx->output_delay + ctx->async_depth)] = pic->pts;
 
         pic->display_order = ctx->input_order;
         ++ctx->input_order;
@@ -1214,7 +1215,7 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
 
 #if VA_CHECK_VERSION(1, 9, 0)
     if (ctx->has_sync_buffer_func) {
-        while (av_fifo_can_read(ctx->encode_fifo) <= MAX_PICTURE_REFERENCES) {
+        while (av_fifo_can_read(ctx->encode_fifo) <= MAX_ASYNC_DEPTH) {
             pic = NULL;
             err = vaapi_encode_pick_next(avctx, &pic);
             if (err < 0)
@@ -1232,6 +1233,13 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         }
         if (!av_fifo_can_read(ctx->encode_fifo))
             return err;
+        if (av_fifo_can_read(ctx->encode_fifo) < ctx->async_depth &&
+            !ctx->end_of_stream) {
+            av_fifo_peek(ctx->encode_fifo, &pic, 1, 0);
+            err = vaapi_encode_wait(avctx, pic, 0);
+            if (err < 0)
+                return err;
+        }
         av_fifo_read(ctx->encode_fifo, &pic, 1);
         ctx->encode_order = pic->encode_order + 1;
     } else
@@ -1267,7 +1275,7 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         pkt->dts = ctx->ts_ring[pic->encode_order] - ctx->dts_pts_diff;
     } else {
         pkt->dts = ctx->ts_ring[(pic->encode_order - ctx->decode_delay) %
-                                (3 * ctx->output_delay)];
+                                (3 * ctx->output_delay + ctx->async_depth)];
     }
     av_log(avctx, AV_LOG_DEBUG, "Output packet: pts %"PRId64" dts %"PRId64".\n",
            pkt->pts, pkt->dts);
@@ -2588,7 +2596,7 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
     vas = vaSyncBuffer(ctx->hwctx->display, 0, 0);
     if (vas != VA_STATUS_ERROR_UNIMPLEMENTED) {
         ctx->has_sync_buffer_func = 1;
-        ctx->encode_fifo = av_fifo_alloc2(MAX_PICTURE_REFERENCES + 1,
+        ctx->encode_fifo = av_fifo_alloc2(MAX_ASYNC_DEPTH,
                                           sizeof(VAAPIEncodePicture *),
                                           0);
         if (!ctx->encode_fifo)
diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h
index d33a486cb8..691521387d 100644
--- a/libavcodec/vaapi_encode.h
+++ b/libavcodec/vaapi_encode.h
@@ -48,6 +48,7 @@ enum {
     MAX_TILE_ROWS = 22,
     // A.4.1: table A.6 allows at most 20 tile columns for any level.
     MAX_TILE_COLS = 20,
+    MAX_ASYNC_DEPTH = 64,
 };
 
 extern const AVCodecHWConfigInternal *const ff_vaapi_encode_hw_configs[];
@@ -298,7 +299,8 @@ typedef struct VAAPIEncodeContext {
     // Timestamp handling.
     int64_t         first_pts;
     int64_t         dts_pts_diff;
-    int64_t         ts_ring[MAX_REORDER_DELAY * 3];
+    int64_t         ts_ring[MAX_REORDER_DELAY * 3 +
+                            MAX_ASYNC_DEPTH];
 
     // Slice structure.
     int slice_block_rows;
@@ -350,6 +352,8 @@ typedef struct VAAPIEncodeContext {
     AVFifo *encode_fifo;
     //Whether the driver support vaSyncBuffer
     int has_sync_buffer_func;
+    //Max number of frame buffered in encoder.
+    int async_depth;
 } VAAPIEncodeContext;
 
 enum {
@@ -460,7 +464,11 @@ int ff_vaapi_encode_close(AVCodecContext *avctx);
     { "b_depth", \
       "Maximum B-frame reference depth", \
       OFFSET(common.desired_b_depth), AV_OPT_TYPE_INT, \
-      { .i64 = 1 }, 1, INT_MAX, FLAGS }
+      { .i64 = 1 }, 1, INT_MAX, FLAGS }, \
+    { "async_depth", "Maximum processing parallelism. " \
+      "Increase this to improve single channel performance", \
+      OFFSET(common.async_depth), AV_OPT_TYPE_INT, \
+      { .i64 = 4 }, 0, MAX_ASYNC_DEPTH, FLAGS }
 
 #define VAAPI_ENCODE_RC_MODE(name, desc) \
     { #name, desc, 0, AV_OPT_TYPE_CONST, { .i64 = RC_MODE_ ## name }, \
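
For readers following the receive path, the gating added to ff_vaapi_encode_receive_packet() can be modelled roughly as below. This is a toy sketch and not libavcodec code: ToyEncoder, toy_send_frame(), toy_receive_packet(), toy_wait_nonblocking() and TOY_EOF are hypothetical names, the fifo is a plain ring of integers, and the hardware is assumed never to finish a picture early, so the non-blocking wait always reports -EAGAIN.

```c
#include <errno.h>

enum { TOY_MAX_ASYNC_DEPTH = 64, TOY_EOF = -100000 };

/* Hypothetical stand-in for the encoder state touched by the patch. */
typedef struct ToyEncoder {
    int fifo[TOY_MAX_ASYNC_DEPTH]; /* encode order of pictures issued to "hardware" */
    int head;                      /* index of the oldest issued picture */
    int count;                     /* number of pictures currently in the fifo */
    int async_depth;               /* models the async_depth option */
    int queued;                    /* input frames received but not yet issued */
    int next_order;                /* encode order given to the next issued picture */
    int end_of_stream;             /* set once no more input will arrive */
} ToyEncoder;

/* Assumption: models a zero-timeout wait on hardware that has not finished. */
static int toy_wait_nonblocking(const ToyEncoder *enc)
{
    (void)enc;
    return -EAGAIN;
}

static void toy_send_frame(ToyEncoder *enc)
{
    enc->queued++;
}

/* Returns 0 and stores the encode order in *out, or a negative error code. */
static int toy_receive_packet(ToyEncoder *enc, int *out)
{
    /* Issue every queued frame while the fifo has room. */
    while (enc->queued > 0 && enc->count < TOY_MAX_ASYNC_DEPTH) {
        int tail = (enc->head + enc->count) % TOY_MAX_ASYNC_DEPTH;
        enc->fifo[tail] = enc->next_order++;
        enc->count++;
        enc->queued--;
    }

    if (enc->count == 0)
        return enc->end_of_stream ? TOY_EOF : -EAGAIN;

    /* Fewer than async_depth pictures in flight: only output the head
     * picture if it is already done, otherwise ask for more input. */
    if (enc->count < enc->async_depth && !enc->end_of_stream) {
        int err = toy_wait_nonblocking(enc);
        if (err < 0)
            return err;
    }

    /* A real encoder would block here until the head picture is complete. */
    *out = enc->fifo[enc->head];
    enc->head = (enc->head + 1) % TOY_MAX_ASYNC_DEPTH;
    enc->count--;
    return 0;
}
```

Under these assumptions, with async_depth set to 4 the first three send/receive pairs return -EAGAIN and the fourth returns picture 0. Once end_of_stream is set, the remaining pictures drain in encode order, which is the frame delay the commit message mentions.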