From patchwork Wed Oct 27 08:57:03 2021
X-Patchwork-Id: 31234
From: Wenbin Chen
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 27 Oct 2021 16:57:03 +0800
Message-Id: <20211027085705.4114165-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH 1/3] libavcodec/vaapi_encode: Change the way to call async to increase performance

Fix: #7706.

After commit 5fdcf85bbffe7451c2, the VAAPI encoder's performance decreased.
The reason is that vaRenderPicture() and vaSyncSurface() are called in
lockstep: every vaRenderPicture() is immediately followed by a
vaSyncSurface(). When encoding a stream with B-frames, frames already have
to be buffered for reordering, so several frames can be sent to the
hardware at once. This patch decouples the two calls: several pictures are
now submitted before the encoder syncs on the oldest one, which makes
better use of the hardware. In my environment, 1080p transcoding
throughput increases by about 17% (fps).

Signed-off-by: Wenbin Chen
---
 libavcodec/vaapi_encode.c | 41 ++++++++++++++++++++++++++++-----------
 libavcodec/vaapi_encode.h |  3 +++
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index ec054ae701..5927849233 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -951,8 +951,10 @@ static int vaapi_encode_pick_next(AVCodecContext *avctx,
     if (!pic && ctx->end_of_stream) {
         --b_counter;
         pic = ctx->pic_end;
-        if (pic->encode_issued)
+        if (pic->encode_complete)
             return AVERROR_EOF;
+        else if (pic->encode_issued)
+            return AVERROR(EAGAIN);
     }
 
     if (!pic) {
@@ -1177,20 +1179,31 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         return AVERROR(EAGAIN);
     }
 
-    pic = NULL;
-    err = vaapi_encode_pick_next(avctx, &pic);
-    if (err < 0)
-        return err;
-    av_assert0(pic);
+    while (av_fifo_size(ctx->encode_fifo) <= MAX_PICTURE_REFERENCES * sizeof(VAAPIEncodePicture *)) {
+        pic = NULL;
+        err = vaapi_encode_pick_next(avctx, &pic);
+        if (err < 0)
+            break;
+        av_assert0(pic);
 
-    pic->encode_order = ctx->encode_order++;
+        pic->encode_order = ctx->encode_order +
+            (av_fifo_size(ctx->encode_fifo) / sizeof(VAAPIEncodePicture *));
 
-    err = vaapi_encode_issue(avctx, pic);
-    if (err < 0) {
-        av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
-        return err;
+        err = vaapi_encode_issue(avctx, pic);
+        if (err < 0) {
+            av_log(avctx, AV_LOG_ERROR, "Encode failed: %d.\n", err);
+            return err;
+        }
+
+        av_fifo_generic_write(ctx->encode_fifo, &pic, sizeof(pic), NULL);
     }
 
+    if (!av_fifo_size(ctx->encode_fifo))
+        return err;
+
+    av_fifo_generic_read(ctx->encode_fifo, &pic, sizeof(pic), NULL);
+    ctx->encode_order = pic->encode_order + 1;
+
     err = vaapi_encode_output(avctx, pic, pkt);
     if (err < 0) {
         av_log(avctx, AV_LOG_ERROR, "Output failed: %d.\n", err);
@@ -2520,6 +2533,11 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
         }
     }
 
+    ctx->encode_fifo = av_fifo_alloc((MAX_PICTURE_REFERENCES + 1) *
+                                     sizeof(VAAPIEncodePicture *));
+    if (!ctx->encode_fifo)
+        return AVERROR(ENOMEM);
+
     return 0;
 
 fail:
@@ -2552,6 +2570,7 @@ av_cold int ff_vaapi_encode_close(AVCodecContext *avctx)
 
     av_freep(&ctx->codec_sequence_params);
     av_freep(&ctx->codec_picture_params);
+    av_fifo_freep(&ctx->encode_fifo);
 
     av_buffer_unref(&ctx->recon_frames_ref);
     av_buffer_unref(&ctx->input_frames_ref);
diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h
index b41604a883..89fe8de466 100644
--- a/libavcodec/vaapi_encode.h
+++ b/libavcodec/vaapi_encode.h
@@ -29,6 +29,7 @@
 
 #include "libavutil/hwcontext.h"
 #include "libavutil/hwcontext_vaapi.h"
+#include "libavutil/fifo.h"
 
 #include "avcodec.h"
 #include "hwconfig.h"
@@ -345,6 +346,8 @@ typedef struct VAAPIEncodeContext {
     int roi_warned;
 
     AVFrame *frame;
+
+    AVFifoBuffer *encode_fifo;
 } VAAPIEncodeContext;
 
 enum {

From patchwork Wed Oct 27 08:57:04 2021
X-Patchwork-Id: 31233
From: Wenbin Chen
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 27 Oct 2021 16:57:04 +0800
Message-Id: <20211027085705.4114165-2-wenbin.chen@intel.com>
In-Reply-To: <20211027085705.4114165-1-wenbin.chen@intel.com>
References: <20211027085705.4114165-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH 2/3] libavcodec/vaapi_encode: Add new API adaption to vaapi_encode

Add vaSyncBuffer() support to the VAAPI encoder. The older API,
vaSyncSurface(), waits for a surface to complete; when the surface is used
by multiple operations, it waits for all of them to finish. vaSyncBuffer()
waits only for the operation that writes the given buffer (here, the coded
output buffer). When built against VA-API older than 1.9, or when the
driver reports vaSyncBuffer() as unimplemented, the encoder falls back to
the old API.

Add a wait parameter to vaapi_encode_wait() in preparation for the
async_depth option. "wait=1" blocks until the operation is complete.
"wait=0" only queries the operation's status: it returns 0 if the operation
is complete and AVERROR(EAGAIN) if it is still in progress.

Signed-off-by: Wenbin Chen
---
 libavcodec/vaapi_encode.c | 47 +++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index 5927849233..db0ae136a1 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -134,7 +134,8 @@ static int vaapi_encode_make_misc_param_buffer(AVCodecContext *avctx,
 }
 
 static int vaapi_encode_wait(AVCodecContext *avctx,
-                             VAAPIEncodePicture *pic)
+                             VAAPIEncodePicture *pic,
+                             uint8_t wait)
 {
     VAAPIEncodeContext *ctx = avctx->priv_data;
     VAStatus vas;
@@ -150,11 +151,43 @@ static int vaapi_encode_wait(AVCodecContext *avctx,
            "(input surface %#x).\n", pic->display_order,
            pic->encode_order, pic->input_surface);
 
-    vas = vaSyncSurface(ctx->hwctx->display, pic->input_surface);
-    if (vas != VA_STATUS_SUCCESS) {
-        av_log(avctx, AV_LOG_ERROR, "Failed to sync to picture completion: "
-               "%d (%s).\n", vas, vaErrorStr(vas));
+#if VA_CHECK_VERSION(1, 9, 0)
+    // Try vaSyncBuffer.
+    vas = vaSyncBuffer(ctx->hwctx->display,
+                       pic->output_buffer,
+                       wait ? VA_TIMEOUT_INFINITE : 0);
+    if (vas == VA_STATUS_ERROR_TIMEDOUT) {
+        return AVERROR(EAGAIN);
+    } else if (vas != VA_STATUS_SUCCESS && vas != VA_STATUS_ERROR_UNIMPLEMENTED) {
+        av_log(avctx, AV_LOG_ERROR, "Failed to sync to output buffer completion: "
+               "%d (%s).\n", vas, vaErrorStr(vas));
         return AVERROR(EIO);
+    } else if (vas == VA_STATUS_ERROR_UNIMPLEMENTED)
+        // If vaSyncBuffer is not implemented, try old version API.
+#endif
+    {
+        if (!wait) {
+            VASurfaceStatus surface_status;
+            vas = vaQuerySurfaceStatus(ctx->hwctx->display,
+                                       pic->input_surface,
+                                       &surface_status);
+            if (vas == VA_STATUS_SUCCESS &&
+                surface_status != VASurfaceReady &&
+                surface_status != VASurfaceSkipped) {
+                return AVERROR(EAGAIN);
+            } else if (vas != VA_STATUS_SUCCESS) {
+                av_log(avctx, AV_LOG_ERROR, "Failed to query surface status: "
+                       "%d (%s).\n", vas, vaErrorStr(vas));
+                return AVERROR(EIO);
+            }
+        } else {
+            vas = vaSyncSurface(ctx->hwctx->display, pic->input_surface);
+            if (vas != VA_STATUS_SUCCESS) {
+                av_log(avctx, AV_LOG_ERROR, "Failed to sync to picture completion: "
+                       "%d (%s).\n", vas, vaErrorStr(vas));
+                return AVERROR(EIO);
+            }
+        }
     }
 
     // Input is definitely finished with now.
@@ -633,7 +666,7 @@ static int vaapi_encode_output(AVCodecContext *avctx,
     uint8_t *ptr;
     int err;
 
-    err = vaapi_encode_wait(avctx, pic);
+    err = vaapi_encode_wait(avctx, pic, 1);
     if (err < 0)
         return err;
 
@@ -695,7 +728,7 @@ fail:
 static int vaapi_encode_discard(AVCodecContext *avctx,
                                 VAAPIEncodePicture *pic)
 {
-    vaapi_encode_wait(avctx, pic);
+    vaapi_encode_wait(avctx, pic, 1);
 
     if (pic->output_buffer_ref) {
         av_log(avctx, AV_LOG_DEBUG, "Discard output for pic "

From patchwork Wed Oct 27 08:57:05 2021
X-Patchwork-Id: 31232
From: Wenbin Chen
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 27 Oct 2021 16:57:05 +0800
Message-Id: <20211027085705.4114165-3-wenbin.chen@intel.com>
In-Reply-To: <20211027085705.4114165-1-wenbin.chen@intel.com>
References: <20211027085705.4114165-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH 3/3] libavcodec/vaapi_encode: Add async_depth to vaapi_encoder to increase performance

Add an async_depth option to increase the encoder's performance. The
encode_fifo is reused as the async buffer. The encoder submits all
reordered frames to the hardware and then checks the FIFO size. If the FIFO
holds fewer than async_depth pictures, the end of the stream has not been
reached, and the picture at the head of the FIFO is not ready yet, the
encoder returns AVERROR(EAGAIN) to request more input frames.

In my environment, 1080p transcoding (no B-frames) with -async_depth 4 is
about 20% faster. The asynchronous mode increases throughput but also adds
output latency (frame delay).
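
A minimal C model of the output gate described above might look like the following. It is a sketch, not FFmpeg code: `SimPicture`, `sim_wait()` and `gate_output()` are hypothetical names, hardware completion is simulated with a countdown instead of real vaSyncBuffer()/vaQuerySurfaceStatus() calls, and `SIM_EAGAIN` stands in for AVERROR(EAGAIN).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for AVERROR(EAGAIN); FFmpeg's AVERROR(e) is -(e)
 * on platforms with positive errno values. */
#define SIM_EAGAIN (-EAGAIN)

/* Hypothetical picture whose hardware job completes after `ticks_left`
 * non-blocking polls. */
typedef struct SimPicture {
    int ticks_left;
} SimPicture;

/* Models vaapi_encode_wait(avctx, pic, wait): wait != 0 blocks until the
 * job is done, wait == 0 only queries it and returns SIM_EAGAIN if busy. */
static int sim_wait(SimPicture *pic, int wait)
{
    if (wait) {
        pic->ticks_left = 0;        /* blocking sync: the job runs to completion */
        return 0;
    }
    if (pic->ticks_left > 0) {
        pic->ticks_left--;          /* some time passes between polls */
        return SIM_EAGAIN;
    }
    return 0;
}

/* Models the async_depth gate in ff_vaapi_encode_receive_packet():
 * returns 0 if the FIFO head may be output now, or SIM_EAGAIN to ask
 * the caller for more input frames first. */
static int gate_output(int fifo_count, int async_depth, bool end_of_stream,
                       SimPicture *head)
{
    if (fifo_count < async_depth && !end_of_stream)
        return sim_wait(head, 0);   /* short FIFO: poll without blocking */
    return 0;                       /* deep enough or draining: output now */
}
```

The non-blocking poll is only used while the FIFO is shorter than async_depth. Once the FIFO is deep enough, or the stream is draining, the head picture is output anyway and the blocking wait=1 sync in vaapi_encode_output() absorbs whatever latency remains.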

Signed-off-by: Wenbin Chen
---
 libavcodec/vaapi_encode.c | 20 +++++++++++++++-----
 libavcodec/vaapi_encode.h | 12 ++++++++++--
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index db0ae136a1..616fb7c089 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -1158,7 +1158,8 @@ static int vaapi_encode_send_frame(AVCodecContext *avctx, AVFrame *frame)
         if (ctx->input_order == ctx->decode_delay)
             ctx->dts_pts_diff = pic->pts - ctx->first_pts;
         if (ctx->output_delay > 0)
-            ctx->ts_ring[ctx->input_order % (3 * ctx->output_delay)] = pic->pts;
+            ctx->ts_ring[ctx->input_order %
+                         (3 * ctx->output_delay + ctx->async_depth)] = pic->pts;
 
         pic->display_order = ctx->input_order;
         ++ctx->input_order;
@@ -1212,7 +1213,8 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         return AVERROR(EAGAIN);
     }
 
-    while (av_fifo_size(ctx->encode_fifo) <= MAX_PICTURE_REFERENCES * sizeof(VAAPIEncodePicture *)) {
+    while (av_fifo_size(ctx->encode_fifo) <
+           MAX_ASYNC_DEPTH * sizeof(VAAPIEncodePicture *)) {
         pic = NULL;
         err = vaapi_encode_pick_next(avctx, &pic);
         if (err < 0)
@@ -1234,6 +1236,14 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
     if (!av_fifo_size(ctx->encode_fifo))
         return err;
 
+    if (av_fifo_size(ctx->encode_fifo) < ctx->async_depth * sizeof(VAAPIEncodePicture *) &&
+        !ctx->end_of_stream) {
+        av_fifo_generic_peek(ctx->encode_fifo, &pic, sizeof(pic), NULL);
+        err = vaapi_encode_wait(avctx, pic, 0);
+        if (err < 0)
+            return err;
+    }
+
     av_fifo_generic_read(ctx->encode_fifo, &pic, sizeof(pic), NULL);
     ctx->encode_order = pic->encode_order + 1;
 
@@ -1252,7 +1262,7 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
         pkt->dts = ctx->ts_ring[pic->encode_order] - ctx->dts_pts_diff;
     } else {
         pkt->dts = ctx->ts_ring[(pic->encode_order - ctx->decode_delay) %
-                                (3 * ctx->output_delay)];
+                                (3 * ctx->output_delay + ctx->async_depth)];
     }
     av_log(avctx, AV_LOG_DEBUG, "Output packet: pts %"PRId64" dts %"PRId64".\n",
            pkt->pts, pkt->dts);
@@ -2566,8 +2576,8 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
         }
     }
 
-    ctx->encode_fifo = av_fifo_alloc((MAX_PICTURE_REFERENCES + 1) *
-                                     sizeof(VAAPIEncodePicture *));
+    ctx->encode_fifo = av_fifo_alloc(MAX_ASYNC_DEPTH *
+                                     sizeof(VAAPIEncodePicture *));
     if (!ctx->encode_fifo)
         return AVERROR(ENOMEM);
 
diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h
index 89fe8de466..1bf5d7c337 100644
--- a/libavcodec/vaapi_encode.h
+++ b/libavcodec/vaapi_encode.h
@@ -48,6 +48,7 @@ enum {
     MAX_TILE_ROWS = 22,
     // A.4.1: table A.6 allows at most 20 tile columns for any level.
     MAX_TILE_COLS = 20,
+    MAX_ASYNC_DEPTH = 64,
 };
 
 extern const AVCodecHWConfigInternal *const ff_vaapi_encode_hw_configs[];
@@ -298,7 +299,8 @@ typedef struct VAAPIEncodeContext {
     // Timestamp handling.
     int64_t first_pts;
     int64_t dts_pts_diff;
-    int64_t ts_ring[MAX_REORDER_DELAY * 3];
+    int64_t ts_ring[MAX_REORDER_DELAY * 3 +
+                    MAX_ASYNC_DEPTH];
 
     // Slice structure.
     int slice_block_rows;
@@ -348,6 +350,8 @@ typedef struct VAAPIEncodeContext {
     AVFrame *frame;
 
     AVFifoBuffer *encode_fifo;
+
+    int async_depth;
 } VAAPIEncodeContext;
 
 enum {
@@ -458,7 +462,11 @@ int ff_vaapi_encode_close(AVCodecContext *avctx);
     { "b_depth", \
       "Maximum B-frame reference depth", \
       OFFSET(common.desired_b_depth), AV_OPT_TYPE_INT, \
-      { .i64 = 1 }, 1, INT_MAX, FLAGS }
+      { .i64 = 1 }, 1, INT_MAX, FLAGS }, \
+    { "async_depth", "Maximum processing parallelism. " \
+      "Increase this to improve single channel performance", \
+      OFFSET(common.async_depth), AV_OPT_TYPE_INT, \
+      { .i64 = 4 }, 0, MAX_ASYNC_DEPTH, FLAGS }
 
 #define VAAPI_ENCODE_RC_MODE(name, desc) \
     { #name, desc, 0, AV_OPT_TYPE_CONST, { .i64 = RC_MODE_ ## name }, \
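
The `ts_ring` hunks enlarge the timestamp ring by `async_depth` slots. The apparent reasoning is that pictures waiting in the FIFO let output lag further behind input, so each stored pts must survive more writes before it is read back as a dts. The sketch below is a generic delayed-read ring model, not the encoder's exact reorder schedule: `ring_reads_ok()` and `SIM_RING_CAP` are hypothetical, and the constant `lag` between write and read is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity for this sketch only. */
#define SIM_RING_CAP 256

/* Hypothetical helper: stores a fake pts for each input order in a ring of
 * `size` slots (slot = order % size) and reads each entry back `lag` orders
 * later, the way a pts saved at input time is later consumed as a dts.
 * Returns true if every read still sees the value written for that order. */
static bool ring_reads_ok(int size, int lag, int frames)
{
    int64_t ring[SIM_RING_CAP];
    assert(size > 0 && size <= SIM_RING_CAP);

    for (int order = 0; order < frames; order++) {
        ring[order % size] = 1000 + 40 * (int64_t)order;    /* write at input */

        int read_order = order - lag;                       /* output lags input */
        if (read_order >= 0 &&
            ring[read_order % size] != 1000 + 40 * (int64_t)read_order)
            return false;                                   /* slot was overwritten */
    }
    return true;
}
```

In this model every read is served correctly as long as `lag < size`, which is the property the patch preserves by adding `async_depth` to the modulus and `MAX_ASYNC_DEPTH` to the array length.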