From patchwork Sun Sep 22 03:55:49 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lance Wang
X-Patchwork-Id: 15222
From: lance.lmwang@gmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: Limin Wang
Date: Sun, 22 Sep 2019 11:55:49 +0800
Message-Id: <20190922035549.1023-1-lance.lmwang@gmail.com>
X-Mailer: git-send-email 2.9.5
In-Reply-To: <20190921031838.12849-1-lance.lmwang@gmail.com>
References: <20190921031838.12849-1-lance.lmwang@gmail.com>
Subject: [FFmpeg-devel] [PATCH v4 2/2] avcodec/v210dec: add the frame and slice threading support

From: Limin Wang

Multithreading keeps a single CPU core from being saturated when the
decoder runs alongside other filters such as scale. The performance gain
from threading itself is very small; my benchmark results are below.
To avoid a disk bottleneck, the test uses -stream_loop on a file that
contains only 10 frames.

./ffmpeg -y -i ~/Movies/4k_Rec709_ProResHQ.mov -c:v v210 -f rawvideo -frames 10 ~/Movies/1.v210

master:
./ffmpeg -threads 1 -s 4096x3072 -stream_loop 100 -i ~/Movies/1.v210 -benchmark -f null -
frame= 1010 fps= 42 q=-0.0 Lsize=N/A time=00:00:40.40 bitrate=N/A speed=1.69x
video:529kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=10.082s stime=13.784s rtime=23.889s
bench: maxrss=147836928kB

patch applied:
./ffmpeg -threads 4 -thread_type frame+slice -s 4096x3072 -stream_loop 100 -i ~/Movies/1.v210 -benchmark -f null -
frame= 1010 fps= 55 q=-0.0 Lsize=N/A time=00:00:40.40 bitrate=N/A speed=2.22x
video:529kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=11.407s stime=17.258s rtime=18.279s
bench: maxrss=442884096kB

Signed-off-by: Limin Wang
---
 libavcodec/v210dec.c | 131 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 83 insertions(+), 48 deletions(-)

diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c
index 6ce18aa..2e46342 100644
--- a/libavcodec/v210dec.c
+++ b/libavcodec/v210dec.c
@@ -28,6 +28,7 @@
 #include "libavutil/internal.h"
 #include "libavutil/mem.h"
 #include "libavutil/intreadwrite.h"
+#include "thread.h"
 
 #define READ_PIXELS(a, b, c)         \
     do {                             \
@@ -37,6 +38,12 @@
         *c++ = (val >> 20) & 0x3FF;  \
     } while (0)
 
+typedef struct ThreadData {
+    AVFrame *frame;
+    uint8_t *buf;
+    int stride;
+} ThreadData;
+
 static void v210_planar_unpack_c(const uint32_t *src, uint16_t *y, uint16_t *u, uint16_t *v, int width)
 {
     uint32_t val;
@@ -70,55 +77,28 @@ static av_cold int decode_init(AVCodecContext *avctx)
     return 0;
 }
 
-static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
-                        AVPacket *avpkt)
+static int v210_decode_slice(AVCodecContext *avctx, void *arg, int jobnr, int threadnr)
 {
     V210DecContext *s = avctx->priv_data;
-
-    int h, w, ret, stride, aligned_input;
-    AVFrame *pic = data;
-    const uint8_t *psrc = avpkt->data;
+    int h, w;
+    ThreadData *td = arg;
+    AVFrame *frame = td->frame;
+    int stride = td->stride;
+    int slice_h = avctx->height / avctx->thread_count;
+    int slice_m = avctx->height % avctx->thread_count;
+    int slice_start = jobnr * slice_h;
+    int slice_end = slice_start + slice_h;
+    const uint8_t *psrc = td->buf + stride * slice_start;
     uint16_t *y, *u, *v;
 
-    if (s->custom_stride )
-        stride = s->custom_stride;
-    else {
-        int aligned_width = ((avctx->width + 47) / 48) * 48;
-        stride = aligned_width * 8 / 3;
-    }
-
-    if (avpkt->size < stride * avctx->height) {
-        if ((((avctx->width + 23) / 24) * 24 * 8) / 3 * avctx->height == avpkt->size) {
-            stride = avpkt->size / avctx->height;
-            if (!s->stride_warning_shown)
-                av_log(avctx, AV_LOG_WARNING, "Broken v210 with too small padding (64 byte) detected\n");
-            s->stride_warning_shown = 1;
-        } else {
-            av_log(avctx, AV_LOG_ERROR, "packet too small\n");
-            return AVERROR_INVALIDDATA;
-        }
-    }
-    if (   avctx->codec_tag == MKTAG('C', '2', '1', '0')
-        && AV_RN32(psrc) == AV_RN32("INFO")
-        && avpkt->size - 64 >= stride * avctx->height)
-        psrc += 64;
-
-    aligned_input = !((uintptr_t)psrc & 0x1f) && !(stride & 0x1f);
-    if (aligned_input != s->aligned_input) {
-        s->aligned_input = aligned_input;
-        ff_v210dec_init(s);
-    }
-
-    if ((ret = ff_get_buffer(avctx, pic, 0)) < 0)
-        return ret;
-
-    y = (uint16_t*)pic->data[0];
-    u = (uint16_t*)pic->data[1];
-    v = (uint16_t*)pic->data[2];
-    pic->pict_type = AV_PICTURE_TYPE_I;
-    pic->key_frame = 1;
+    /* the last job also takes the remaining rows */
+    if (jobnr == avctx->thread_count - 1)
+        slice_end += slice_m;
 
-    for (h = 0; h < avctx->height; h++) {
+    y = (uint16_t*)frame->data[0] + slice_start * frame->linesize[0] / 2;
+    u = (uint16_t*)frame->data[1] + slice_start * frame->linesize[1] / 2;
+    v = (uint16_t*)frame->data[2] + slice_start * frame->linesize[2] / 2;
+    for (h = slice_start; h < slice_end; h++) {
         const uint32_t *src = (const uint32_t*)psrc;
         uint32_t val;
 
@@ -154,10 +134,63 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
         }
 
         psrc += stride;
-        y += pic->linesize[0] / 2 - avctx->width + (avctx->width & 1);
-        u += pic->linesize[1] / 2 - avctx->width / 2;
-        v += pic->linesize[2] / 2 - avctx->width / 2;
+        y += frame->linesize[0] / 2 - avctx->width + (avctx->width & 1);
+        u += frame->linesize[1] / 2 - avctx->width / 2;
+        v += frame->linesize[2] / 2 - avctx->width / 2;
+    }
+
+    return 0;
+}
+
+static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
+                        AVPacket *avpkt)
+{
+    V210DecContext *s = avctx->priv_data;
+    ThreadData td;
+    int ret, stride, aligned_input;
+    ThreadFrame frame = { .f = data };
+    AVFrame *pic = data;
+    const uint8_t *psrc = avpkt->data;
+
+    if (s->custom_stride )
+        stride = s->custom_stride;
+    else {
+        int aligned_width = ((avctx->width + 47) / 48) * 48;
+        stride = aligned_width * 8 / 3;
+    }
+
+    if (avpkt->size < stride * avctx->height) {
+        if ((((avctx->width + 23) / 24) * 24 * 8) / 3 * avctx->height == avpkt->size) {
+            stride = avpkt->size / avctx->height;
+            if (!s->stride_warning_shown)
+                av_log(avctx, AV_LOG_WARNING, "Broken v210 with too small padding (64 byte) detected\n");
+            s->stride_warning_shown = 1;
+        } else {
+            av_log(avctx, AV_LOG_ERROR, "packet too small\n");
+            return AVERROR_INVALIDDATA;
+        }
     }
+    if (   avctx->codec_tag == MKTAG('C', '2', '1', '0')
+        && AV_RN32(psrc) == AV_RN32("INFO")
+        && avpkt->size - 64 >= stride * avctx->height)
+        psrc += 64;
+
+    aligned_input = !((uintptr_t)psrc & 0x1f) && !(stride & 0x1f);
+    if (aligned_input != s->aligned_input) {
+        s->aligned_input = aligned_input;
+        ff_v210dec_init(s);
+    }
+
+    if ((ret = ff_thread_get_buffer(avctx, &frame, 0)) < 0)
+        return ret;
+
+    pic->pict_type = AV_PICTURE_TYPE_I;
+    pic->key_frame = 1;
+
+    td.stride = stride;
+    td.buf = (uint8_t*)psrc;
+    td.frame = pic;
+    avctx->execute2(avctx, v210_decode_slice, &td, NULL, avctx->thread_count);
 
     if (avctx->field_order > AV_FIELD_PROGRESSIVE) {
         /* we have interlaced material flagged in container */
@@ -193,6 +226,8 @@ AVCodec ff_v210_decoder = {
     .priv_data_size = sizeof(V210DecContext),
     .init           = decode_init,
     .decode         = decode_frame,
-    .capabilities   = AV_CODEC_CAP_DR1,
+    .capabilities   = AV_CODEC_CAP_DR1 |
+                      AV_CODEC_CAP_SLICE_THREADS |
+                      AV_CODEC_CAP_FRAME_THREADS,
     .priv_class     = &v210dec_class,
 };
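
For anyone checking the row split and the default stride used in the diff, the arithmetic can be sketched on its own. The names below (v210_default_stride, SliceRange, slice_range, nb_jobs) are hypothetical and not part of FFmpeg; only the formulas come from the patch. This is a minimal sketch that assumes nb_jobs stands in for avctx->thread_count:

```c
#include <assert.h>

/* Hypothetical helper, not an FFmpeg API: default v210 line stride in bytes.
 * The width is rounded up to a multiple of 48 pixels, and every 6 pixels
 * occupy 16 bytes, hence the factor 8 / 3 (same formula as in the diff). */
static int v210_default_stride(int width)
{
    int aligned_width = ((width + 47) / 48) * 48;
    return aligned_width * 8 / 3;
}

/* Hypothetical type: the rows [start, end) handled by one slice job. */
typedef struct SliceRange {
    int start;
    int end;
} SliceRange;

/* Hypothetical helper mirroring the split in v210_decode_slice():
 * every job gets height / nb_jobs rows, and the last job also takes
 * the height % nb_jobs rows that are left over. */
static SliceRange slice_range(int height, int nb_jobs, int jobnr)
{
    SliceRange r;
    int slice_h, slice_m;

    assert(nb_jobs > 0 && jobnr >= 0 && jobnr < nb_jobs);

    slice_h = height / nb_jobs;
    slice_m = height % nb_jobs;

    r.start = jobnr * slice_h;
    r.end   = r.start + slice_h;
    if (jobnr == nb_jobs - 1)
        r.end += slice_m;
    return r;
}
```

With a height of 10 rows and 4 jobs, the first three jobs each get 2 rows and the last job gets rows 6 to 9. The ranges therefore cover the frame exactly once, but the last slice can be noticeably larger than the others when the height is not a multiple of the job count.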