From patchwork Wed Oct 27 08:57:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wenbin Chen X-Patchwork-Id: 31232 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6602:2084:0:0:0:0 with SMTP id a4csp2351538ioa; Wed, 27 Oct 2021 02:01:30 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyB7hmR6Qh0kcir6i1iYPE/pWwxWCQhmuW1M9Vj96E9zIS04B3YBMcIRBpdbosQICEXwNsZ X-Received: by 2002:aa7:cd88:: with SMTP id x8mr42175858edv.203.1635325290681; Wed, 27 Oct 2021 02:01:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635325290; cv=none; d=google.com; s=arc-20160816; b=plifCKsw8SL774Lr3fK9R8eNE+mDOzw9nA0mAwf5dRiO19HGKQ6dKBeokdBu2uG1JB cXMKWydm9KBkv30uy/kj5p9S4bIF0ANq0M+cy8SQbjtQhuVpZFEITdlPnSXYFPLIUXx+ ZsEttf6VMolwH4MmntoFoe2dyjN1uF6BQxEw7Mnc/FteEy+4RPNTO6vViPrmmZVRevy7 /BvCggBTVLJoqTLjpiEPGH+o2DU+O8Fz0B/YAtJpgQJHF7mlBPgW2PnxMWB9AAbuIfoG iAW9ZO5cDuxeLMESRvoWpE1jaLQzhVLK7SHYrVeXL9ra3FEXmuY/i4gBKN/KJK3AThFC 64rQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:cc:reply-to :list-subscribe:list-help:list-post:list-archive:list-unsubscribe :list-id:precedence:subject:mime-version:references:in-reply-to :message-id:date:to:from:delivered-to; bh=W1MLVVu+NY744YqZM2XRlihKeDjNBxM6oPnOO5f3VqQ=; b=PelGJ4iCjN1xHdd0UDBhrs7nQ8NJ8FJXVerjkIm1JH8NT+Qraf3lF6KPBRJ8OocWyn MQDXuDbnp1KNeE1ywm4bxFeu7YTfNOsv40ulF5DKQuwNkPGGUjlE9EjNo0Aa4wmoL06J T+7CeXkKO12S+rlTbT7KGR8ryUEGfMKGG1jdvOp/+/4KYUVs6EBwAxwEdn7tMgT49TQ1 k4bj9C6h5Zn/Hvvzk51xk6A/EkKd49ZSZDE8XHPv6argDQdjoGG6odfJtMbbrNF1kuaR TMq3gfxvFGHr8Rpa4WqNtZ1g684iw6l7R09ZjO7YE0DvThXdegAXo460jokBEn8Cg2+J f3SA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id o3si26054111ejy.753.2021.10.27.02.01.30; Wed, 27 Oct 2021 02:01:30 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A399968AA5C; Wed, 27 Oct 2021 12:01:05 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 4897168AA36 for ; Wed, 27 Oct 2021 12:00:58 +0300 (EEST) X-IronPort-AV: E=McAfee;i="6200,9189,10149"; a="228866476" X-IronPort-AV: E=Sophos;i="5.87,186,1631602800"; d="scan'208";a="228866476" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2021 02:00:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,186,1631602800"; d="scan'208";a="486577226" Received: from chenwenbin-z390-aorus-ultra.sh.intel.com ([10.239.35.110]) by orsmga007.jf.intel.com with ESMTP; 27 Oct 2021 02:00:55 -0700 From: Wenbin Chen To: ffmpeg-devel@ffmpeg.org Date: Wed, 27 Oct 2021 16:57:05 +0800 Message-Id: <20211027085705.4114165-3-wenbin.chen@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211027085705.4114165-1-wenbin.chen@intel.com> References: <20211027085705.4114165-1-wenbin.chen@intel.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 3/3] libavcodec/vaapi_encode: Add async_depth to vaapi_encoder to increase performance X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Wenbin Chen Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: DejyJizoZJL5 Add async_depth to increase encoder's performance. Reuse encode_fifo as async buffer. Encoder puts all reordered frame to HW and then check fifo size. If fifo < async_depth and the top frame is not ready, it will return AVERROR(EAGAIN) to require more frames. 1080p transcoding (no B frames) with -async_depth=4 can increase 20% performance on my environment. The async increases performance but also introduces frame delay. Signed-off-by: Wenbin Chen --- libavcodec/vaapi_encode.c | 20 +++++++++++++++----- libavcodec/vaapi_encode.h | 12 ++++++++++-- 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c index db0ae136a1..616fb7c089 100644 --- a/libavcodec/vaapi_encode.c +++ b/libavcodec/vaapi_encode.c @@ -1158,7 +1158,8 @@ static int vaapi_encode_send_frame(AVCodecContext *avctx, AVFrame *frame) if (ctx->input_order == ctx->decode_delay) ctx->dts_pts_diff = pic->pts - ctx->first_pts; if (ctx->output_delay > 0) - ctx->ts_ring[ctx->input_order % (3 * ctx->output_delay)] = pic->pts; + ctx->ts_ring[ctx->input_order % + (3 * ctx->output_delay + ctx->async_depth)] = pic->pts; pic->display_order = ctx->input_order; ++ctx->input_order; @@ -1212,7 +1213,8 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt) return AVERROR(EAGAIN); } - while (av_fifo_size(ctx->encode_fifo) <= MAX_PICTURE_REFERENCES * sizeof(VAAPIEncodePicture *)) { + while (av_fifo_size(ctx->encode_fifo) < + MAX_ASYNC_DEPTH * sizeof(VAAPIEncodePicture *)) { pic = NULL; err = vaapi_encode_pick_next(avctx, &pic); if (err < 0) @@ -1234,6 +1236,14 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt) if (!av_fifo_size(ctx->encode_fifo)) return err; + if (av_fifo_size(ctx->encode_fifo) < ctx->async_depth * sizeof(VAAPIEncodePicture *) && + !ctx->end_of_stream) { + av_fifo_generic_peek(ctx->encode_fifo, &pic, sizeof(pic), NULL); + err = vaapi_encode_wait(avctx, pic, 0); + if (err < 0) + return err; + } + av_fifo_generic_read(ctx->encode_fifo, &pic, sizeof(pic), NULL); ctx->encode_order = pic->encode_order + 1; @@ -1252,7 +1262,7 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt) pkt->dts = ctx->ts_ring[pic->encode_order] - ctx->dts_pts_diff; } else { pkt->dts = ctx->ts_ring[(pic->encode_order - ctx->decode_delay) % - (3 * ctx->output_delay)]; + (3 * ctx->output_delay + ctx->async_depth)]; } av_log(avctx, AV_LOG_DEBUG, "Output packet: pts %"PRId64" dts %"PRId64".\n", pkt->pts, pkt->dts); @@ -2566,8 +2576,8 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx) } } - ctx->encode_fifo = av_fifo_alloc((MAX_PICTURE_REFERENCES + 1) * - sizeof(VAAPIEncodePicture *)); + ctx->encode_fifo = av_fifo_alloc(MAX_ASYNC_DEPTH * + sizeof(VAAPIEncodePicture *)); if (!ctx->encode_fifo) return AVERROR(ENOMEM); diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h index 89fe8de466..1bf5d7c337 100644 --- a/libavcodec/vaapi_encode.h +++ b/libavcodec/vaapi_encode.h @@ -48,6 +48,7 @@ enum { MAX_TILE_ROWS = 22, // A.4.1: table A.6 allows at most 20 tile columns for any level. MAX_TILE_COLS = 20, + MAX_ASYNC_DEPTH = 64, }; extern const AVCodecHWConfigInternal *const ff_vaapi_encode_hw_configs[]; @@ -298,7 +299,8 @@ typedef struct VAAPIEncodeContext { // Timestamp handling. int64_t first_pts; int64_t dts_pts_diff; - int64_t ts_ring[MAX_REORDER_DELAY * 3]; + int64_t ts_ring[MAX_REORDER_DELAY * 3 + + MAX_ASYNC_DEPTH]; // Slice structure. int slice_block_rows; @@ -348,6 +350,8 @@ typedef struct VAAPIEncodeContext { AVFrame *frame; AVFifoBuffer *encode_fifo; + + int async_depth; } VAAPIEncodeContext; enum { @@ -458,7 +462,11 @@ int ff_vaapi_encode_close(AVCodecContext *avctx); { "b_depth", \ "Maximum B-frame reference depth", \ OFFSET(common.desired_b_depth), AV_OPT_TYPE_INT, \ - { .i64 = 1 }, 1, INT_MAX, FLAGS } + { .i64 = 1 }, 1, INT_MAX, FLAGS }, \ + { "async_depth", "Maximum processing parallelism. " \ + "Increase this to improve single channel performance", \ + OFFSET(common.async_depth), AV_OPT_TYPE_INT, \ + { .i64 = 4 }, 0, MAX_ASYNC_DEPTH, FLAGS } #define VAAPI_ENCODE_RC_MODE(name, desc) \ { #name, desc, 0, AV_OPT_TYPE_CONST, { .i64 = RC_MODE_ ## name }, \