From patchwork Sat May 25 02:33:26 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jun Zhao
X-Patchwork-Id: 13286
From: Jun Zhao
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 25 May 2019 10:33:26 +0800
Message-Id: <1558751606-4140-2-git-send-email-mypopydev@gmail.com>
X-Mailer: git-send-email 1.7.1
In-Reply-To: <1558751606-4140-1-git-send-email-mypopydev@gmail.com>
References: <1558751606-4140-1-git-send-email-mypopydev@gmail.com>
Subject: [FFmpeg-devel] [PATCH V2] lavfi/lut: Add slice threading support
Cc: Jun Zhao

From: Jun Zhao

Benchmarked on a 1080p H.264 clip with the following commands:

a). ffmpeg -i input -vf lutyuv="u=128:v=128" -f null /dev/null
b). ffmpeg -i input -vf lutrgb="g=0:b=0" -f null /dev/null

With slice threading enabled, the frame rate changes from:

a). 144fps to 258fps (lutyuv)
b). 94fps to 153fps (lutrgb)

on an Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz.

Signed-off-by: Jun Zhao
---
 libavfilter/vf_lut.c |  329 +++++++++++++++++++++++++++++++++-----------------
 1 files changed, 216 insertions(+), 113 deletions(-)

diff --git a/libavfilter/vf_lut.c b/libavfilter/vf_lut.c
index c815ddc..7cb3b87 100644
--- a/libavfilter/vf_lut.c
+++ b/libavfilter/vf_lut.c
@@ -337,13 +337,193 @@ static int config_props(AVFilterLink *inlink)
     return 0;
 }
 
+struct thread_data {
+    AVFrame *in;
+    AVFrame *out;
+
+    int w;
+    int h;
+};
+
+/* packed, 16-bit */
+static int lut_packed_16bits(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    LutContext *s = ctx->priv;
+    const struct thread_data *td = arg;
+
+    uint16_t *inrow, *outrow, *inrow0, *outrow0;
+    int i, j;
+    const int w = td->w;
+    const int h = td->h;
+    AVFrame *in = td->in;
+    AVFrame *out = td->out;
+    const uint16_t (*tab)[256*256] = (const uint16_t (*)[256*256])s->lut;
+    const int in_linesize  =  in->linesize[0] / 2;
+    const int out_linesize = out->linesize[0] / 2;
+    const int step = s->step;
+
+    const int slice_start = (h *  jobnr   ) / nb_jobs;
+    const int slice_end   = (h * (jobnr+1)) / nb_jobs;
+
+    inrow0  = (uint16_t *)in ->data[0];
+    outrow0 = (uint16_t *)out->data[0];
+
+    for (i = slice_start; i < slice_end; i++) {
+        inrow  = inrow0  + i * in_linesize;
+        outrow = outrow0 + i * out_linesize;
+        for (j = 0; j < w; j++) {
+
+            switch (step) {
+#if HAVE_BIGENDIAN
+            case 4:  outrow[3] = av_bswap16(tab[3][av_bswap16(inrow[3])]); // Fall-through
+            case 3:  outrow[2] = av_bswap16(tab[2][av_bswap16(inrow[2])]); // Fall-through
+            case 2:  outrow[1] = av_bswap16(tab[1][av_bswap16(inrow[1])]); // Fall-through
+            default: outrow[0] = av_bswap16(tab[0][av_bswap16(inrow[0])]);
+#else
+            case 4:  outrow[3] = tab[3][inrow[3]]; // Fall-through
+            case 3:  outrow[2] = tab[2][inrow[2]]; // Fall-through
+            case 2:  outrow[1] = tab[1][inrow[1]]; // Fall-through
+            default: outrow[0] = tab[0][inrow[0]];
+#endif
+            }
+            outrow += step;
+            inrow  += step;
+        }
+    }
+
+    return 0;
+}
+
+/* packed, 8-bit */
+static int lut_packed_8bits(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    LutContext *s = ctx->priv;
+    const struct thread_data *td = arg;
+
+    uint8_t *inrow, *outrow, *inrow0, *outrow0;
+    int i, j;
+    const int w = td->w;
+    const int h = td->h;
+    AVFrame *in = td->in;
+    AVFrame *out = td->out;
+    const uint16_t (*tab)[256*256] = (const uint16_t (*)[256*256])s->lut;
+    const int in_linesize  =  in->linesize[0];
+    const int out_linesize = out->linesize[0];
+    const int step = s->step;
+
+    const int slice_start = (h *  jobnr   ) / nb_jobs;
+    const int slice_end   = (h * (jobnr+1)) / nb_jobs;
+
+    inrow0  = in ->data[0];
+    outrow0 = out->data[0];
+
+    for (i = slice_start; i < slice_end; i++) {
+        inrow  = inrow0  + i * in_linesize;
+        outrow = outrow0 + i * out_linesize;
+        for (j = 0; j < w; j++) {
+            switch (step) {
+            case 4:  outrow[3] = tab[3][inrow[3]]; // Fall-through
+            case 3:  outrow[2] = tab[2][inrow[2]]; // Fall-through
+            case 2:  outrow[1] = tab[1][inrow[1]]; // Fall-through
+            default: outrow[0] = tab[0][inrow[0]];
+            }
+            outrow += step;
+            inrow  += step;
+        }
+    }
+
+    return 0;
+}
+
+/* planar >8 bit depth */
+static int lut_planar_16bits(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    LutContext *s = ctx->priv;
+    const struct thread_data *td = arg;
+
+    uint16_t *inrow, *outrow;
+    int i, j, plane;
+
+    AVFrame *in = td->in;
+    AVFrame *out = td->out;
+
+    for (plane = 0; plane < 4 && in->data[plane] && in->linesize[plane]; plane++) {
+        int vsub = plane == 1 || plane == 2 ? s->vsub : 0;
+        int hsub = plane == 1 || plane == 2 ? s->hsub : 0;
+        int h = AV_CEIL_RSHIFT(td->h, vsub);
+        int w = AV_CEIL_RSHIFT(td->w, hsub);
+        const uint16_t *tab = s->lut[plane];
+        const int in_linesize  =  in->linesize[plane] / 2;
+        const int out_linesize = out->linesize[plane] / 2;
+
+        const int slice_start = (h *  jobnr   ) / nb_jobs;
+        const int slice_end   = (h * (jobnr+1)) / nb_jobs;
+
+        inrow  = (uint16_t *)in ->data[plane] + slice_start * in_linesize;
+        outrow = (uint16_t *)out->data[plane] + slice_start * out_linesize;
+
+        for (i = slice_start; i < slice_end; i++) {
+            for (j = 0; j < w; j++) {
+#if HAVE_BIGENDIAN
+                outrow[j] = av_bswap16(tab[av_bswap16(inrow[j])]);
+#else
+                outrow[j] = tab[inrow[j]];
+#endif
+            }
+            inrow  += in_linesize;
+            outrow += out_linesize;
+        }
+    }
+
+    return 0;
+}
+
+/* planar 8bit depth */
+static int lut_planar_8bits(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+{
+    LutContext *s = ctx->priv;
+    const struct thread_data *td = arg;
+
+    uint8_t *inrow, *outrow;
+
+    int i, j, plane;
+
+    AVFrame *in = td->in;
+    AVFrame *out = td->out;
+
+    for (plane = 0; plane < 4 && in->data[plane] && in->linesize[plane]; plane++) {
+        int vsub = plane == 1 || plane == 2 ? s->vsub : 0;
+        int hsub = plane == 1 || plane == 2 ? s->hsub : 0;
+        int h = AV_CEIL_RSHIFT(td->h, vsub);
+        int w = AV_CEIL_RSHIFT(td->w, hsub);
+        const uint16_t *tab = s->lut[plane];
+        const int in_linesize  =  in->linesize[plane];
+        const int out_linesize = out->linesize[plane];
+
+        const int slice_start = (h *  jobnr   ) / nb_jobs;
+        const int slice_end   = (h * (jobnr+1)) / nb_jobs;
+
+        inrow  = in ->data[plane] + slice_start * in_linesize;
+        outrow = out->data[plane] + slice_start * out_linesize;
+
+        for (i = slice_start; i < slice_end; i++) {
+            for (j = 0; j < w; j++)
+                outrow[j] = tab[inrow[j]];
+            inrow  += in_linesize;
+            outrow += out_linesize;
+        }
+    }
+
+    return 0;
+}
+
 static int filter_frame(AVFilterLink *inlink, AVFrame *in)
 {
     AVFilterContext *ctx = inlink->dst;
     LutContext *s = ctx->priv;
     AVFilterLink *outlink = ctx->outputs[0];
     AVFrame *out;
-    int i, j, plane, direct = 0;
+    int direct = 0;
 
     if (av_frame_is_writable(in)) {
         direct = 1;
@@ -359,121 +539,44 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
 
     if (s->is_rgb && s->is_16bit && !s->is_planar) {
         /* packed, 16-bit */
-        uint16_t *inrow, *outrow, *inrow0, *outrow0;
-        const int w = inlink->w;
-        const int h = in->height;
-        const uint16_t (*tab)[256*256] = (const uint16_t (*)[256*256])s->lut;
-        const int in_linesize  =  in->linesize[0] / 2;
-        const int out_linesize = out->linesize[0] / 2;
-        const int step = s->step;
-
-        inrow0  = (uint16_t*) in ->data[0];
-        outrow0 = (uint16_t*) out->data[0];
-
-        for (i = 0; i < h; i ++) {
-            inrow  = inrow0;
-            outrow = outrow0;
-            for (j = 0; j < w; j++) {
-
-                switch (step) {
-#if HAVE_BIGENDIAN
-                case 4:  outrow[3] = av_bswap16(tab[3][av_bswap16(inrow[3])]); // Fall-through
-                case 3:  outrow[2] = av_bswap16(tab[2][av_bswap16(inrow[2])]); // Fall-through
-                case 2:  outrow[1] = av_bswap16(tab[1][av_bswap16(inrow[1])]); // Fall-through
-                default: outrow[0] = av_bswap16(tab[0][av_bswap16(inrow[0])]);
-#else
-                case 4:  outrow[3] = tab[3][inrow[3]]; // Fall-through
-                case 3:  outrow[2] = tab[2][inrow[2]]; // Fall-through
-                case 2:  outrow[1] = tab[1][inrow[1]]; // Fall-through
-                default: outrow[0] = tab[0][inrow[0]];
-#endif
-                }
-                outrow += step;
-                inrow  += step;
-            }
-            inrow0  += in_linesize;
-            outrow0 += out_linesize;
-        }
+        struct thread_data td = {
+            .in  = in,
+            .out = out,
+            .w   = inlink->w,
+            .h   = in->height,
+        };
+        ctx->internal->execute(ctx, lut_packed_16bits, &td, NULL,
+                               FFMIN(in->height, ff_filter_get_nb_threads(ctx)));
     } else if (s->is_rgb && !s->is_planar) {
-        /* packed */
-        uint8_t *inrow, *outrow, *inrow0, *outrow0;
-        const int w = inlink->w;
-        const int h = in->height;
-        const uint16_t (*tab)[256*256] = (const uint16_t (*)[256*256])s->lut;
-        const int in_linesize  =  in->linesize[0];
-        const int out_linesize = out->linesize[0];
-        const int step = s->step;
-
-        inrow0  = in ->data[0];
-        outrow0 = out->data[0];
-
-        for (i = 0; i < h; i ++) {
-            inrow  = inrow0;
-            outrow = outrow0;
-            for (j = 0; j < w; j++) {
-                switch (step) {
-                case 4:  outrow[3] = tab[3][inrow[3]]; // Fall-through
-                case 3:  outrow[2] = tab[2][inrow[2]]; // Fall-through
-                case 2:  outrow[1] = tab[1][inrow[1]]; // Fall-through
-                default: outrow[0] = tab[0][inrow[0]];
-                }
-                outrow += step;
-                inrow  += step;
-            }
-            inrow0  += in_linesize;
-            outrow0 += out_linesize;
-        }
+        /* packed 8 bits */
+        struct thread_data td = {
+            .in  = in,
+            .out = out,
+            .w   = inlink->w,
+            .h   = in->height,
+        };
+        ctx->internal->execute(ctx, lut_packed_8bits, &td, NULL,
+                               FFMIN(in->height, ff_filter_get_nb_threads(ctx)));
     } else if (s->is_16bit) {
-        // planar >8 bit depth
-        uint16_t *inrow, *outrow;
-
-        for (plane = 0; plane < 4 && in->data[plane] && in->linesize[plane]; plane++) {
-            int vsub = plane == 1 || plane == 2 ? s->vsub : 0;
-            int hsub = plane == 1 || plane == 2 ? s->hsub : 0;
-            int h = AV_CEIL_RSHIFT(inlink->h, vsub);
-            int w = AV_CEIL_RSHIFT(inlink->w, hsub);
-            const uint16_t *tab = s->lut[plane];
-            const int in_linesize  =  in->linesize[plane] / 2;
-            const int out_linesize = out->linesize[plane] / 2;
-
-            inrow  = (uint16_t *)in ->data[plane];
-            outrow = (uint16_t *)out->data[plane];
-
-            for (i = 0; i < h; i++) {
-                for (j = 0; j < w; j++) {
-#if HAVE_BIGENDIAN
-                    outrow[j] = av_bswap16(tab[av_bswap16(inrow[j])]);
-#else
-                    outrow[j] = tab[inrow[j]];
-#endif
-                }
-                inrow  += in_linesize;
-                outrow += out_linesize;
-            }
-        }
+        /* planar >8 bit depth */
+        struct thread_data td = {
+            .in  = in,
+            .out = out,
+            .w   = inlink->w,
+            .h   = inlink->h,
+        };
+        ctx->internal->execute(ctx, lut_planar_16bits, &td, NULL,
+                               FFMIN(in->height, ff_filter_get_nb_threads(ctx)));
     } else {
         /* planar 8bit depth */
-        uint8_t *inrow, *outrow;
-
-        for (plane = 0; plane < 4 && in->data[plane] && in->linesize[plane]; plane++) {
-            int vsub = plane == 1 || plane == 2 ? s->vsub : 0;
-            int hsub = plane == 1 || plane == 2 ? s->hsub : 0;
-            int h = AV_CEIL_RSHIFT(inlink->h, vsub);
-            int w = AV_CEIL_RSHIFT(inlink->w, hsub);
-            const uint16_t *tab = s->lut[plane];
-            const int in_linesize  =  in->linesize[plane];
-            const int out_linesize = out->linesize[plane];
-
-            inrow  = in ->data[plane];
-            outrow = out->data[plane];
-
-            for (i = 0; i < h; i++) {
-                for (j = 0; j < w; j++)
-                    outrow[j] = tab[inrow[j]];
-                inrow  += in_linesize;
-                outrow += out_linesize;
-            }
-        }
+        struct thread_data td = {
+            .in  = in,
+            .out = out,
+            .w   = inlink->w,
+            .h   = inlink->h,
+        };
+        ctx->internal->execute(ctx, lut_planar_8bits, &td, NULL,
+                               FFMIN(in->height, ff_filter_get_nb_threads(ctx)));
     }
 
     if (!direct)
@@ -508,7 +611,7 @@ static const AVFilterPad outputs[] = {
         .query_formats = query_formats,                                 \
         .inputs        = inputs,                                        \
         .outputs       = outputs,                                       \
-        .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC,        \
+        .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC | AVFILTER_FLAG_SLICE_THREADS, \
     }
 
 #if CONFIG_LUT_FILTER