From patchwork Sat Sep 21 04:03:12 2019
X-Patchwork-Submitter: Lance Wang
X-Patchwork-Id: 15202
From: lance.lmwang@gmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: Limin Wang
Date: Sat, 21 Sep 2019 12:03:12 +0800
Message-Id: <20190921040312.15796-1-lance.lmwang@gmail.com>
In-Reply-To: <20190921031838.12849-1-lance.lmwang@gmail.com>
References: <20190921031838.12849-1-lance.lmwang@gmail.com>
Subject: [FFmpeg-devel] [PATCH v3 2/2] avcodec/v210dec: add the frame and slice threading support
X-Mailer: git-send-email 2.9.5

From: Limin Wang

Multithreading keeps the v210 decoder from saturating a single CPU core
when it runs alongside other CPU-heavy filters such as scale.

The raw decoding speedup is modest: 42 fps single-threaded versus 55 fps
with 4 threads in the test below. To keep disk I/O from becoming the
bottleneck, only 10 frames are encoded to v210 and then replayed with
-stream_loop 100 (10 frames x 101 passes = 1010 decoded frames).
./ffmpeg -y -i ~/Movies/4k_Rec709_ProResHQ.mov -c:v v210 -f rawvideo -frames 10 ~/Movies/1.v210

master:
./ffmpeg -threads 1 -s 4096x3072 -stream_loop 100 -i ~/Movies/1.v210 -benchmark -f null -
frame= 1010 fps= 42 q=-0.0 Lsize=N/A time=00:00:40.40 bitrate=N/A speed=1.69x
video:529kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=10.082s stime=13.784s rtime=23.889s
bench: maxrss=147836928kB

patch applied:
./ffmpeg -threads 4 -thread_type frame+slice -s 4096x3072 -stream_loop 100 -i ~/Movies/1.v210 -benchmark -f null -
frame= 1010 fps= 55 q=-0.0 Lsize=N/A time=00:00:40.40 bitrate=N/A speed=2.22x
video:529kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=11.407s stime=17.258s rtime=18.279s
bench: maxrss=442884096kB

Signed-off-by: Limin Wang
---
 libavcodec/v210dec.c | 133 ++++++++++++++++++++++++++++++++-------------------
 libavcodec/v210dec.h |   1 +
 2 files changed, 86 insertions(+), 48 deletions(-)

diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c
index 6ce18aa..beaf353 100644
--- a/libavcodec/v210dec.c
+++ b/libavcodec/v210dec.c
@@ -28,6 +28,7 @@
 #include "libavutil/internal.h"
 #include "libavutil/mem.h"
 #include "libavutil/intreadwrite.h"
+#include "thread.h"
 
 #define READ_PIXELS(a, b, c)         \
     do {                             \
@@ -37,6 +38,13 @@
         *c++ = (val >> 20) & 0x3FF;  \
     } while (0)
 
+#define MAX_THREADS 8
+typedef struct ThreadData {
+    AVFrame *frame;
+    uint8_t *buf;
+    int stride;
+} ThreadData;
+
 static void v210_planar_unpack_c(const uint32_t *src, uint16_t *y, uint16_t *u, uint16_t *v, int width)
 {
     uint32_t val;
@@ -67,58 +75,32 @@ static av_cold int decode_init(AVCodecContext *avctx)
     s->aligned_input = 0;
     ff_v210dec_init(s);
 
+    s->slice_count = av_clip(avctx->thread_count, 1, MAX_THREADS);
     return 0;
 }
 
-static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
-                        AVPacket *avpkt)
+static int v210_decode_slice(AVCodecContext *avctx, void *arg, int jobnr, int threadnr)
 {
     V210DecContext *s = avctx->priv_data;
-
-    int h, w, ret, stride, aligned_input;
-    AVFrame *pic = data;
-    const uint8_t *psrc = avpkt->data;
+    int h, w;
+    ThreadData *td = arg;
+    AVFrame *frame = td->frame;
+    int stride = td->stride;
+    int slice_h = avctx->height / s->slice_count;
+    int slice_m = avctx->height % s->slice_count;
+    int slice_start = jobnr * slice_h;
+    int slice_end = slice_start + slice_h;
+    const uint8_t *psrc = td->buf + stride * slice_start;
     uint16_t *y, *u, *v;
 
-    if (s->custom_stride )
-        stride = s->custom_stride;
-    else {
-        int aligned_width = ((avctx->width + 47) / 48) * 48;
-        stride = aligned_width * 8 / 3;
-    }
-
-    if (avpkt->size < stride * avctx->height) {
-        if ((((avctx->width + 23) / 24) * 24 * 8) / 3 * avctx->height == avpkt->size) {
-            stride = avpkt->size / avctx->height;
-            if (!s->stride_warning_shown)
-                av_log(avctx, AV_LOG_WARNING, "Broken v210 with too small padding (64 byte) detected\n");
-            s->stride_warning_shown = 1;
-        } else {
-            av_log(avctx, AV_LOG_ERROR, "packet too small\n");
-            return AVERROR_INVALIDDATA;
-        }
-    }
-    if (avctx->codec_tag == MKTAG('C', '2', '1', '0')
-        && AV_RN32(psrc) == AV_RN32("INFO")
-        && avpkt->size - 64 >= stride * avctx->height)
-        psrc += 64;
-
-    aligned_input = !((uintptr_t)psrc & 0x1f) && !(stride & 0x1f);
-    if (aligned_input != s->aligned_input) {
-        s->aligned_input = aligned_input;
-        ff_v210dec_init(s);
-    }
-
-    if ((ret = ff_get_buffer(avctx, pic, 0)) < 0)
-        return ret;
-
-    y = (uint16_t*)pic->data[0];
-    u = (uint16_t*)pic->data[1];
-    v = (uint16_t*)pic->data[2];
-    pic->pict_type = AV_PICTURE_TYPE_I;
-    pic->key_frame = 1;
+    /* add the remaining slice for the last job */
+    if (jobnr == s->slice_count - 1)
+        slice_end += slice_m;
 
-    for (h = 0; h < avctx->height; h++) {
+    y = (uint16_t*)frame->data[0] + slice_start * frame->linesize[0] / 2;
+    u = (uint16_t*)frame->data[1] + slice_start * frame->linesize[1] / 2;
+    v = (uint16_t*)frame->data[2] + slice_start * frame->linesize[2] / 2;
+    for (h = slice_start; h < slice_end; h++) {
         const uint32_t *src = (const uint32_t*)psrc;
         uint32_t val;
 
@@ -154,10 +136,63 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
         }
 
         psrc += stride;
-        y += pic->linesize[0] / 2 - avctx->width + (avctx->width & 1);
-        u += pic->linesize[1] / 2 - avctx->width / 2;
-        v += pic->linesize[2] / 2 - avctx->width / 2;
+        y += frame->linesize[0] / 2 - avctx->width + (avctx->width & 1);
+        u += frame->linesize[1] / 2 - avctx->width / 2;
+        v += frame->linesize[2] / 2 - avctx->width / 2;
+    }
+
+    return 0;
+}
+
+static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
+                        AVPacket *avpkt)
+{
+    V210DecContext *s = avctx->priv_data;
+    ThreadData td;
+    int ret, stride, aligned_input;
+    ThreadFrame frame = { .f = data };
+    AVFrame *pic = data;
+    const uint8_t *psrc = avpkt->data;
+
+    if (s->custom_stride )
+        stride = s->custom_stride;
+    else {
+        int aligned_width = ((avctx->width + 47) / 48) * 48;
+        stride = aligned_width * 8 / 3;
+    }
+
+    if (avpkt->size < stride * avctx->height) {
+        if ((((avctx->width + 23) / 24) * 24 * 8) / 3 * avctx->height == avpkt->size) {
+            stride = avpkt->size / avctx->height;
+            if (!s->stride_warning_shown)
+                av_log(avctx, AV_LOG_WARNING, "Broken v210 with too small padding (64 byte) detected\n");
+            s->stride_warning_shown = 1;
+        } else {
+            av_log(avctx, AV_LOG_ERROR, "packet too small\n");
+            return AVERROR_INVALIDDATA;
+        }
     }
+    td.stride = stride;
+    if (avctx->codec_tag == MKTAG('C', '2', '1', '0')
+        && AV_RN32(psrc) == AV_RN32("INFO")
+        && avpkt->size - 64 >= stride * avctx->height)
+        psrc += 64;
+
+    aligned_input = !((uintptr_t)psrc & 0x1f) && !(stride & 0x1f);
+    if (aligned_input != s->aligned_input) {
+        s->aligned_input = aligned_input;
+        ff_v210dec_init(s);
+    }
+
+    if ((ret = ff_thread_get_buffer(avctx, &frame, 0)) < 0)
+        return ret;
+
+    pic->pict_type = AV_PICTURE_TYPE_I;
+    pic->key_frame = 1;
+
+    td.buf = (uint8_t*)psrc;
+    td.frame = pic;
+    avctx->execute2(avctx, v210_decode_slice, &td, NULL, s->slice_count);
 
     if (avctx->field_order > AV_FIELD_PROGRESSIVE) {
         /* we have interlaced material flagged in container */
@@ -193,6 +228,8 @@ AVCodec ff_v210_decoder = {
     .priv_data_size = sizeof(V210DecContext),
     .init           = decode_init,
     .decode         = decode_frame,
-    .capabilities   = AV_CODEC_CAP_DR1,
+    .capabilities   = AV_CODEC_CAP_DR1 |
+                      AV_CODEC_CAP_SLICE_THREADS |
+                      AV_CODEC_CAP_FRAME_THREADS,
     .priv_class     = &v210dec_class,
 };
diff --git a/libavcodec/v210dec.h b/libavcodec/v210dec.h
index cfdb29d..3581943 100644
--- a/libavcodec/v210dec.h
+++ b/libavcodec/v210dec.h
@@ -26,6 +26,7 @@
 typedef struct {
     AVClass *av_class;
     int custom_stride;
+    int slice_count; // Number of slices for threaded operations
     int aligned_input;
     int stride_warning_shown;
     void (*unpack_frame)(const uint32_t *src, uint16_t *y, uint16_t *u, uint16_t *v, int width);