From patchwork Fri Apr 29 07:59:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Wang, Fei W"
X-Patchwork-Id: 35501
From: Fei Wang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 29 Apr 2022 15:59:40 +0800
Message-Id: <20220429075941.1844370-3-fei.w.wang@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220429075941.1844370-1-fei.w.wang@intel.com>
References: <20220429075941.1844370-1-fei.w.wang@intel.com>
Subject: [FFmpeg-devel] [PATCH v1 3/4] lavc/hevc_vaapi: enable sub frame support
Cc: Fei Wang

Intel hardware provides a feature that lets the decoder output an
additional scaled frame alongside the original frame. The scaled frame
is attached to the main frame as sub frame side data.

The main use case is video analysis. For example, a scaled-down frame
can be used for analysis, and the result can be applied back to the
main frame.

Normally, scale_vaapi is used for scaling in a VAAPI transcode pipeline
when a smaller-resolution frame is needed. The sub frame can now be used
instead. On some platforms, sub frame scaling is much faster than
scale_vaapi. For example, the decode + sub frame command improves
performance by ~50% compared with decode + scaling on my DG2
i3-11100B@3.6GHz.

decode + sub frame cmd:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi -export_side_data sub_frame \
-sub_frame_opts "width=300:height=300:format=nv12" \
-i 1920x1080.h265 -f null - &

decode + scaling cmd:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi -i 1920x1080.h265 \
-vf 'scale_vaapi=w=300:h=300:format=nv12' -f null - &

Signed-off-by: Fei Wang
---
 libavcodec/vaapi_decode.c   | 46 ++++++++++++++++++++++++++-
 libavcodec/vaapi_decode.h   |  4 +++
 libavcodec/vaapi_hevc.c     | 32 ++++++++++++++++++-
 libavutil/hwcontext_vaapi.c | 62 +++++++++++++++++++++++++++++++++++--
 libavutil/hwcontext_vaapi.h | 15 ++++++++-
 5 files changed, 153 insertions(+), 6 deletions(-)

diff --git a/libavcodec/vaapi_decode.c b/libavcodec/vaapi_decode.c
index a7abddb06b..920bab1ef4 100644
--- a/libavcodec/vaapi_decode.c
+++ b/libavcodec/vaapi_decode.c
@@ -160,6 +160,10 @@ int ff_vaapi_decode_issue(AVCodecContext *avctx,
     av_log(avctx, AV_LOG_DEBUG, "Decode to surface %#x.\n",
            pic->output_surface);
 
+    if (ctx->hwfc->enable_sub_frame)
+        av_log(avctx, AV_LOG_DEBUG, "Decode sub frame to surface %#x.\n",
+               pic->sub_frame_surface);
+
     vas = vaBeginPicture(ctx->hwctx->display, ctx->va_context,
                          pic->output_surface);
     if (vas != VA_STATUS_SUCCESS) {
@@ -440,6 +444,9 @@ static int vaapi_decode_make_config(AVCodecContext *avctx,
     AVHWDeviceContext *device = (AVHWDeviceContext*)device_ref->data;
     AVVAAPIDeviceContext *hwctx = device->hwctx;
 
+    VAConfigAttrib attr;
+    int attr_num = 0, support_dec_processing = 0;
+
     codec_desc = avcodec_descriptor_get(avctx->codec_id);
     if (!codec_desc) {
         err = AVERROR(EINVAL);
@@ -518,8 +525,23 @@ static int vaapi_decode_make_config(AVCodecContext *avctx,
         }
     }
 
+    if (avctx->export_side_data & AV_CODEC_EXPORT_DATA_SUB_FRAME) {
+        attr.type = VAConfigAttribDecProcessing;
+        vas = vaGetConfigAttributes(hwctx->display, matched_va_profile,
+                                    VAEntrypointVLD, &attr, 1);
+        if (vas != VA_STATUS_SUCCESS) {
+            av_log(avctx, AV_LOG_ERROR, "Failed to query decode process "
+                   "attributes: %d (%s).\n", vas, vaErrorStr(vas));
+            return AVERROR_EXTERNAL;
+        } else if (attr.value & VA_DEC_PROCESSING) {
+            support_dec_processing = 1;
+            attr_num++;
+        } else
+            av_log(avctx, AV_LOG_WARNING, "Hardware doesn't support decode processing.\n");
+    }
+
     vas = vaCreateConfig(hwctx->display, matched_va_profile,
-                         VAEntrypointVLD, NULL, 0,
+                         VAEntrypointVLD, &attr, attr_num,
                          va_config);
     if (vas != VA_STATUS_SUCCESS) {
         av_log(avctx, AV_LOG_ERROR, "Failed to create decode "
@@ -564,10 +586,32 @@ static int vaapi_decode_make_config(AVCodecContext *avctx,
 
     if (frames_ref) {
         AVHWFramesContext *frames = (AVHWFramesContext *)frames_ref->data;
+        AVVAAPIFramesContext *avfc = frames->hwctx;
 
         frames->format = AV_PIX_FMT_VAAPI;
         frames->width = avctx->coded_width;
         frames->height = avctx->coded_height;
+        avfc->enable_sub_frame = support_dec_processing;
+
+        if (avfc->enable_sub_frame) {
+            avfc->sub_frame_width = avctx->coded_width;
+            avfc->sub_frame_height = avctx->coded_height;
+            avfc->sub_frame_sw_format = AV_PIX_FMT_NV12;
+            if (avctx->sub_frame_opts) {
+                AVDictionaryEntry *e = NULL;
+                while ((e = av_dict_get(avctx->sub_frame_opts, "", e, AV_DICT_IGNORE_SUFFIX))) {
+                    if (!strcmp(e->key, "width"))
+                        avfc->sub_frame_width = atoi(e->value);
+                    else if (!strcmp(e->key, "height"))
+                        avfc->sub_frame_height = atoi(e->value);
+                    else if (!strcmp(e->key, "format"))
+                        avfc->sub_frame_sw_format = av_get_pix_fmt(e->value);
+                }
+            }
+            av_log(avctx, AV_LOG_DEBUG, "Sub frame set with width:%d, height:%d, "
+                   "format:%s.\n", avfc->sub_frame_width, avfc->sub_frame_height,
+                   av_get_pix_fmt_name(avfc->sub_frame_sw_format));
+        }
 
         err = vaapi_decode_find_best_format(avctx, device,
                                             *va_config, frames);
diff --git a/libavcodec/vaapi_decode.h b/libavcodec/vaapi_decode.h
index 6beda14e52..fbac7e7a8e 100644
--- a/libavcodec/vaapi_decode.h
+++ b/libavcodec/vaapi_decode.h
@@ -45,6 +45,10 @@ typedef struct VAAPIDecodePicture {
     int nb_slices;
     VABufferID *slice_buffers;
     int slices_allocated;
+
+    VASurfaceID sub_frame_surface;
+    VARectangle sub_frame_src;
+    VARectangle sub_frame_dst;
 } VAAPIDecodePicture;
 
 typedef struct VAAPIDecodeContext {
diff --git a/libavcodec/vaapi_hevc.c b/libavcodec/vaapi_hevc.c
index 9083331c45..209a302a2c 100644
--- a/libavcodec/vaapi_hevc.c
+++ b/libavcodec/vaapi_hevc.c
@@ -38,6 +38,7 @@ typedef struct VAAPIDecodePictureHEVC {
     VAPictureParameterBufferHEVC pic_param;
     VASliceParameterBufferHEVC last_slice_param;
 #endif
+    VAProcPipelineParameterBuffer proc_param;
     const uint8_t *last_buffer;
     size_t last_size;
 
@@ -122,8 +123,8 @@ static int vaapi_hevc_start_frame(AVCodecContext *avctx,
     VAAPIDecodePictureHEVC *pic = h->ref->hwaccel_picture_private;
     const HEVCSPS *sps = h->ps.sps;
     const HEVCPPS *pps = h->ps.pps;
-
     const ScalingList *scaling_list = NULL;
+    AVFrameSideData *sd;
     int pic_param_size, err, i;
 
     VAPictureParameterBufferHEVC *pic_param = (VAPictureParameterBufferHEVC *)&pic->pic_param;
@@ -285,6 +286,35 @@ static int vaapi_hevc_start_frame(AVCodecContext *avctx,
         goto fail;
     }
 
+    sd = av_frame_get_side_data(h->ref->frame, AV_FRAME_DATA_SUB_FRAME);
+    if (sd) {
+        VAProcPipelineParameterBuffer *proc_param = &pic->proc_param;
+        AVFrame *sub_frame = (AVFrame *)sd->data;
+
+        memset(proc_param, 0, sizeof(VAProcPipelineParameterBuffer));
+
+        pic->pic.sub_frame_src.x = pic->pic.sub_frame_src.y = 0;
+        pic->pic.sub_frame_src.width = sps->width;
+        pic->pic.sub_frame_src.height = sps->height;
+
+        pic->pic.sub_frame_dst.x = pic->pic.sub_frame_dst.y = 0;
+        pic->pic.sub_frame_dst.width = sub_frame->width;
+        pic->pic.sub_frame_dst.height = sub_frame->height;
+
+        pic->pic.sub_frame_surface = ff_vaapi_get_surface_id(sub_frame);
+        proc_param->surface = pic->pic.output_surface;
+        proc_param->surface_region = &pic->pic.sub_frame_src;
+        proc_param->output_region = &pic->pic.sub_frame_dst;
+        proc_param->additional_outputs = &pic->pic.sub_frame_surface;
+        proc_param->num_additional_outputs = 1;
+
+        err = ff_vaapi_decode_make_param_buffer(avctx, &pic->pic,
+                                                VAProcPipelineParameterBufferType,
+                                                &pic->proc_param, sizeof(VAProcPipelineParameterBuffer));
+        if (err < 0)
+            goto fail;
+    }
+
     return 0;
 
 fail:
diff --git a/libavutil/hwcontext_vaapi.c b/libavutil/hwcontext_vaapi.c
index c3a98bc4b1..1b3b487738 100644
--- a/libavutil/hwcontext_vaapi.c
+++ b/libavutil/hwcontext_vaapi.c
@@ -49,8 +49,7 @@
 #include "hwcontext_vaapi.h"
 #include "mem.h"
 #include "pixdesc.h"
-#include "pixfmt.h"
-
+#include "sub_frame_metadata.h"
 
 typedef struct VAAPIDevicePriv {
 #if HAVE_VAAPI_X11
@@ -82,6 +81,8 @@ typedef struct VAAPIFramesContext {
     // Caches whether VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME_2 is unsupported for
     // surface imports.
     int prime_2_import_unsupported;
+
+    AVBufferRef *sub_frames_ref;
 } VAAPIFramesContext;
 
 typedef struct VAAPIMapping {
@@ -511,7 +512,7 @@ static AVBufferRef *vaapi_pool_alloc(void *opaque, size_t size)
     return ref;
 }
 
-static int vaapi_frames_init(AVHWFramesContext *hwfc)
+static int vaapi_hw_frames_init(AVHWFramesContext *hwfc)
 {
     AVVAAPIFramesContext *avfc = hwfc->hwctx;
     VAAPIFramesContext *ctx = hwfc->internal->priv;
@@ -663,17 +664,57 @@ fail:
     return err;
 }
 
+static int vaapi_frames_init(AVHWFramesContext *hwfc)
+{
+    VAAPIFramesContext *ctx = hwfc->internal->priv;
+    AVVAAPIFramesContext *avfc = hwfc->hwctx;
+    AVHWFramesContext *sub_frames_ctx;
+    int ret;
+
+    ret = vaapi_hw_frames_init(hwfc);
+    if (ret < 0)
+        return ret;
+
+    if (avfc->enable_sub_frame) {
+        ctx->sub_frames_ref = av_hwframe_ctx_alloc(hwfc->device_ref);
+        if (!ctx->sub_frames_ref) {
+            return AVERROR(ENOMEM);
+        }
+        sub_frames_ctx = (AVHWFramesContext*)ctx->sub_frames_ref->data;
+
+        sub_frames_ctx->width = avfc->sub_frame_width;
+        sub_frames_ctx->height = avfc->sub_frame_height;
+        sub_frames_ctx->format = AV_PIX_FMT_VAAPI;
+        sub_frames_ctx->sw_format = avfc->sub_frame_sw_format;
+
+        ret = av_hwframe_ctx_init(ctx->sub_frames_ref);
+        if (ret < 0) {
+            av_buffer_unref(&ctx->sub_frames_ref);
+            av_log(hwfc, AV_LOG_ERROR, "Failed to init sub frame hw context.\n");
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 static void vaapi_frames_uninit(AVHWFramesContext *hwfc)
 {
     AVVAAPIFramesContext *avfc = hwfc->hwctx;
     VAAPIFramesContext *ctx = hwfc->internal->priv;
 
+    av_buffer_unref(&ctx->sub_frames_ref);
     av_freep(&avfc->surface_ids);
     av_freep(&ctx->attributes);
 }
 
 static int vaapi_get_buffer(AVHWFramesContext *hwfc, AVFrame *frame)
 {
+    VAAPIFramesContext *ctx = hwfc->internal->priv;
+    AVVAAPIFramesContext *avfc = hwfc->hwctx;
+    AVFrame *sub_frame;
+    int ret;
+
     frame->buf[0] = av_buffer_pool_get(hwfc->pool);
     if (!frame->buf[0])
         return AVERROR(ENOMEM);
@@ -683,6 +724,21 @@ static int vaapi_get_buffer(AVHWFramesContext *hwfc, AVFrame *frame)
     frame->width = hwfc->width;
     frame->height = hwfc->height;
 
+    if (avfc->enable_sub_frame) {
+        if (!ctx->sub_frames_ref)
+            return AVERROR(ENOSYS);
+
+        sub_frame = av_sub_frame_create_side_data(frame);
+        if (!sub_frame)
+            return AVERROR(ENOMEM);
+
+        ret = av_hwframe_get_buffer(ctx->sub_frames_ref, sub_frame, 0);
+        if (ret < 0) {
+            av_log(hwfc, AV_LOG_ERROR, "Can't get sub frame.\n");
+            return ret;
+        }
+    }
+
     return 0;
 }
 
diff --git a/libavutil/hwcontext_vaapi.h b/libavutil/hwcontext_vaapi.h
index 0b2e071cb3..aea0ec9263 100644
--- a/libavutil/hwcontext_vaapi.h
+++ b/libavutil/hwcontext_vaapi.h
@@ -19,6 +19,7 @@
 #ifndef AVUTIL_HWCONTEXT_VAAPI_H
 #define AVUTIL_HWCONTEXT_VAAPI_H
 
+#include "pixfmt.h"
 #include <va/va.h>
 
 /**
@@ -81,7 +82,7 @@ typedef struct AVVAAPIDeviceContext {
 } AVVAAPIDeviceContext;
 
 /**
- * VAAPI-specific data associated with a frame pool.
+ * VAAPI-specific data associated with a frame pool and sub frame.
  *
  * Allocated as AVHWFramesContext.hwctx.
  */
@@ -100,6 +101,18 @@ typedef struct AVVAAPIFramesContext {
      */
     VASurfaceID *surface_ids;
     int nb_surfaces;
+
+    /**
+     * Set by the user to indicate whether sub frame support is enabled.
+     */
+    int enable_sub_frame;
+
+    /**
+     * Sub frame width/height/format. Only valid if enable_sub_frame
+     * is true.
+     */
+    int sub_frame_width, sub_frame_height;
+    enum AVPixelFormat sub_frame_sw_format;
 } AVVAAPIFramesContext;
 
 /**