From patchwork Fri May 27 13:51:17 2022
X-Patchwork-Submitter: John Cox
X-Patchwork-Id: 35950
From: John Cox
To: FFmpeg development discussions and patches
Date: Fri, 27 May 2022 14:51:17 +0100
Subject: [FFmpeg-devel] [PATCH] hevc: If hwaccel avoid creation/use of s/w only arrays

Hwaccel doesn't use any of the block strength, pcm, slice address, etc.
arrays, which can be >100k each for 4k video. This patch avoids both
their initial allocation and their zeroing at the start of every frame.
On a Pi4 the memsets can use 10% CPU on 4k 60Hz decode; this patch
removes that cost.

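To put the per-frame zeroing in context, the sketch below adds up the bytes cleared by the five memset() calls that this patch makes conditional. The size expressions mirror those memset() calls; everything else is an assumption made for illustration and is not taken from the patch: the helper name frame_clear_sizes(), the FrameClearSizes struct, a 4-sample grid for the bs arrays (bs_width = (width >> 2) + 1), 4-byte tab_slice_address entries, and the stream parameters used in the note that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of FFmpeg: bytes zeroed per frame by the
 * memset() calls in hevc_frame_start(). */
typedef struct FrameClearSizes {
    size_t bs;            /* horizontal_bs; vertical_bs is the same size */
    size_t cbf_luma;
    size_t is_pcm;
    size_t slice_address;
    size_t total;         /* all five arrays together */
} FrameClearSizes;

static FrameClearSizes frame_clear_sizes(int width, int height,
                                         int log2_min_tb, int log2_min_pu,
                                         int log2_min_cb)
{
    FrameClearSizes r;

    assert(width > 0 && height > 0);

    /* Assumption: the bs arrays hold one byte per 4x4 block, plus one
     * extra row and column. */
    size_t bs_width  = (size_t)(width  >> 2) + 1;
    size_t bs_height = (size_t)(height >> 2) + 1;

    size_t min_tb_width  = (size_t)(width  >> log2_min_tb);
    size_t min_tb_height = (size_t)(height >> log2_min_tb);
    size_t min_pu_width  = (size_t)(width  >> log2_min_pu);
    size_t min_pu_height = (size_t)(height >> log2_min_pu);

    /* Same expression as pic_size_in_ctb in hevc_frame_start(). */
    size_t pic_size_in_ctb = ((size_t)(width  >> log2_min_cb) + 1) *
                             ((size_t)(height >> log2_min_cb) + 1);

    r.bs            = bs_width * bs_height;
    r.cbf_luma      = min_tb_width * min_tb_height;
    r.is_pcm        = (min_pu_width + 1) * (min_pu_height + 1);
    /* Assumption: tab_slice_address entries are 32-bit integers. */
    r.slice_address = pic_size_in_ctb * sizeof(int32_t);
    r.total         = 2 * r.bs + r.cbf_luma + r.is_pcm + r.slice_address;
    return r;
}
```

For a 3840x2160 stream with assumed minimum sizes of 4x4 (transform and prediction blocks) and 8x8 (coding blocks), frame_clear_sizes(3840, 2160, 2, 2, 3) gives about 0.5 MB per array and about 2.6 MB in total per frame, all of which a hwaccel decode never reads.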
Signed-off-by: John Cox
---
 libavcodec/hevc_refs.c | 35 +++++++++++++++++++++--------------
 libavcodec/hevcdec.c   | 42 +++++++++++++++++++++++++++++-------------
 2 files changed, 50 insertions(+), 27 deletions(-)

-- 
2.34.1

diff --git a/libavcodec/hevc_refs.c b/libavcodec/hevc_refs.c
index fe18ca2b1d..ab3103f66c 100644
--- a/libavcodec/hevc_refs.c
+++ b/libavcodec/hevc_refs.c
@@ -97,18 +97,22 @@ static HEVCFrame *alloc_frame(HEVCContext *s)
         if (!frame->rpl_buf)
             goto fail;
 
-        frame->tab_mvf_buf = av_buffer_pool_get(s->tab_mvf_pool);
-        if (!frame->tab_mvf_buf)
-            goto fail;
-        frame->tab_mvf = (MvField *)frame->tab_mvf_buf->data;
+        if (s->tab_mvf_pool) {
+            frame->tab_mvf_buf = av_buffer_pool_get(s->tab_mvf_pool);
+            if (!frame->tab_mvf_buf)
+                goto fail;
+            frame->tab_mvf = (MvField *)frame->tab_mvf_buf->data;
+        }
 
-        frame->rpl_tab_buf = av_buffer_pool_get(s->rpl_tab_pool);
-        if (!frame->rpl_tab_buf)
-            goto fail;
-        frame->rpl_tab = (RefPicListTab **)frame->rpl_tab_buf->data;
-        frame->ctb_count = s->ps.sps->ctb_width * s->ps.sps->ctb_height;
-        for (j = 0; j < frame->ctb_count; j++)
-            frame->rpl_tab[j] = (RefPicListTab *)frame->rpl_buf->data;
+        if (s->rpl_tab_pool) {
+            frame->rpl_tab_buf = av_buffer_pool_get(s->rpl_tab_pool);
+            if (!frame->rpl_tab_buf)
+                goto fail;
+            frame->rpl_tab = (RefPicListTab **)frame->rpl_tab_buf->data;
+            frame->ctb_count = s->ps.sps->ctb_width * s->ps.sps->ctb_height;
+            for (j = 0; j < frame->ctb_count; j++)
+                frame->rpl_tab[j] = (RefPicListTab *)frame->rpl_buf->data;
+        }
 
         frame->frame->top_field_first = s->sei.picture_timing.picture_struct == AV_PICTURE_STRUCTURE_TOP_FIELD;
         frame->frame->interlaced_frame = (s->sei.picture_timing.picture_struct == AV_PICTURE_STRUCTURE_TOP_FIELD) || (s->sei.picture_timing.picture_struct == AV_PICTURE_STRUCTURE_BOTTOM_FIELD);
@@ -283,14 +287,17 @@ static int init_slice_rpl(HEVCContext *s)
     int ctb_count = frame->ctb_count;
     int ctb_addr_ts = s->ps.pps->ctb_addr_rs_to_ts[s->sh.slice_segment_addr];
     int i;
+    RefPicListTab * const tab = (RefPicListTab *)frame->rpl_buf->data + s->slice_idx;
 
     if (s->slice_idx >= frame->rpl_buf->size / sizeof(RefPicListTab))
         return AVERROR_INVALIDDATA;
 
-    for (i = ctb_addr_ts; i < ctb_count; i++)
-        frame->rpl_tab[i] = (RefPicListTab *)frame->rpl_buf->data + s->slice_idx;
+    if (frame->rpl_tab) {
+        for (i = ctb_addr_ts; i < ctb_count; i++)
+            frame->rpl_tab[i] = tab;
+    }
 
-    frame->refPicList = (RefPicList *)frame->rpl_tab[ctb_addr_ts];
+    frame->refPicList = tab->refPicList;
 
     return 0;
 }
diff --git a/libavcodec/hevcdec.c b/libavcodec/hevcdec.c
index f782ea6394..48b059ce45 100644
--- a/libavcodec/hevcdec.c
+++ b/libavcodec/hevcdec.c
@@ -504,6 +504,16 @@ static int set_sps(HEVCContext *s, const HEVCSPS *sps,
     if (!sps)
         return 0;
 
+    // If hwaccel then we don't need all the s/w decode helper arrays
+    if (s->avctx->hwaccel) {
+        export_stream_params(s, sps);
+
+        s->avctx->pix_fmt = pix_fmt;
+        s->ps.sps = sps;
+        s->ps.vps = (HEVCVPS*) s->ps.vps_list[s->ps.sps->vps_id]->data;
+        return 0;
+    }
+
     ret = pic_arrays_init(s, sps);
     if (ret < 0)
         goto fail;
@@ -3008,11 +3018,13 @@ static int hevc_frame_start(HEVCContext *s)
                            ((s->ps.sps->height >> s->ps.sps->log2_min_cb_size) + 1);
     int ret;
 
-    memset(s->horizontal_bs, 0, s->bs_width * s->bs_height);
-    memset(s->vertical_bs, 0, s->bs_width * s->bs_height);
-    memset(s->cbf_luma, 0, s->ps.sps->min_tb_width * s->ps.sps->min_tb_height);
-    memset(s->is_pcm, 0, (s->ps.sps->min_pu_width + 1) * (s->ps.sps->min_pu_height + 1));
-    memset(s->tab_slice_address, -1, pic_size_in_ctb * sizeof(*s->tab_slice_address));
+    if (s->horizontal_bs) {
+        memset(s->horizontal_bs, 0, s->bs_width * s->bs_height);
+        memset(s->vertical_bs, 0, s->bs_width * s->bs_height);
+        memset(s->cbf_luma, 0, s->ps.sps->min_tb_width * s->ps.sps->min_tb_height);
+        memset(s->is_pcm, 0, (s->ps.sps->min_pu_width + 1) * (s->ps.sps->min_pu_height + 1));
+        memset(s->tab_slice_address, -1, pic_size_in_ctb * sizeof(*s->tab_slice_address));
+    }
 
     s->is_decoded = 0;
     s->first_nal_type = s->nal_unit_type;
@@ -3555,15 +3567,19 @@ static int hevc_ref_frame(HEVCContext *s, HEVCFrame *dst, HEVCFrame *src)
         dst->needs_fg = 1;
     }
 
-    dst->tab_mvf_buf = av_buffer_ref(src->tab_mvf_buf);
-    if (!dst->tab_mvf_buf)
-        goto fail;
-    dst->tab_mvf = src->tab_mvf;
+    if (src->tab_mvf_buf) {
+        dst->tab_mvf_buf = av_buffer_ref(src->tab_mvf_buf);
+        if (!dst->tab_mvf_buf)
+            goto fail;
+        dst->tab_mvf = src->tab_mvf;
+    }
 
-    dst->rpl_tab_buf = av_buffer_ref(src->rpl_tab_buf);
-    if (!dst->rpl_tab_buf)
-        goto fail;
-    dst->rpl_tab = src->rpl_tab;
+    if (src->rpl_tab_buf) {
+        dst->rpl_tab_buf = av_buffer_ref(src->rpl_tab_buf);
+        if (!dst->rpl_tab_buf)
+            goto fail;
+        dst->rpl_tab = src->rpl_tab;
+    }
 
     dst->rpl_buf = av_buffer_ref(src->rpl_buf);
     if (!dst->rpl_buf)