From patchwork Sun Dec 10 22:11:17 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marton Balint <cus@passwd.hu>
X-Patchwork-Id: 6686
From: Marton Balint <cus@passwd.hu>
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 10 Dec 2017 23:11:17 +0100
Message-Id: <20171210221122.15674-2-cus@passwd.hu>
X-Mailer: git-send-email 2.13.6
In-Reply-To: <20171210221122.15674-1-cus@passwd.hu>
References: <20171210221122.15674-1-cus@passwd.hu>
Subject: [FFmpeg-devel] [PATCH 2/7] avfilter/vf_framerate: add threaded
	blending operations
Cc: Marton Balint <cus@passwd.hu>

Move the per-plane blending loops of blend_frames8() and blend_frames16()
into the new slice functions filter_slice8() and filter_slice16(), and run
them through ctx->internal->execute() so that the rows of each plane are
split across the filter's slice threads. The blend factors and the two
source frames are passed to the workers in a ThreadData struct, and the
filter now sets AVFILTER_FLAG_SLICE_THREADS.

Signed-off-by: Marton Balint <cus@passwd.hu>
---
 libavfilter/vf_framerate.c | 202 ++++++++++++++++++++++++++++-----------------
 1 file changed, 125 insertions(+), 77 deletions(-)
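
Note for reviewers (illustration only, not part of the patch): each job in
filter_slice8()/filter_slice16() handles the rows from (h * job) / nb_jobs up
to, but not including, (h * (job + 1)) / nb_jobs. The standalone sketch below
checks that this split covers every row exactly once, also when a plane has
fewer rows than there are jobs (which can happen for subsampled chroma, since
the job count is derived from outlink->h). SliceRange, slice_range() and
slices_cover_all_rows() are hypothetical names that do not exist in
vf_framerate.c.

```c
#include <assert.h>

/* Hypothetical helper type: rows [start, end) handled by one job. */
typedef struct SliceRange {
    int start, end;
} SliceRange;

/* Same integer arithmetic as the start/end computation in the slice functions. */
static SliceRange slice_range(int height, int job, int nb_jobs)
{
    SliceRange r;
    r.start = (height *  job     ) / nb_jobs;
    r.end   = (height * (job + 1)) / nb_jobs;
    return r;
}

/* Returns 1 if the jobs cover rows 0..height-1 contiguously, without
 * gaps or overlap; empty slices (start == end) are allowed. */
static int slices_cover_all_rows(int height, int nb_jobs)
{
    int expected_start = 0;
    for (int job = 0; job < nb_jobs; job++) {
        SliceRange r = slice_range(height, job, nb_jobs);
        if (r.start != expected_start || r.end < r.start)
            return 0;
        expected_start = r.end;
    }
    return expected_start == height;
}
```

With 7 rows and 3 jobs, for example, the jobs get rows [0,2), [2,4) and [4,7).
With 2 rows and 4 jobs, two of the jobs get an empty range and simply do no
work.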

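Note for reviewers (illustration only, not part of the patch): the 8-bit
blend is a fixed-point weighted average whose two factors sum to 256, with
128 added before the shift for rounding. The sketch below reproduces the luma
and chroma expressions from filter_slice8() as standalone functions and
checks, under the assumption f1 + f2 == 256 (as set up in blend_frames8()),
that both forms yield the same value, since 32896 - 128 * 256 == 128.
blend_luma8(), blend_chroma8() and blend_forms_agree() are hypothetical names
used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Luma/alpha form: (a * f1 + b * f2 + 128) >> 8. */
static uint8_t blend_luma8(uint8_t a, uint8_t b, uint16_t f1, uint16_t f2)
{
    return (uint8_t)((a * f1 + b * f2 + 128) >> 8);
}

/* Chroma form: samples are re-centred around 128 before weighting,
 * and 32896 (128.5 in 8.8 fixed point) restores the offset and rounds. */
static uint8_t blend_chroma8(uint8_t a, uint8_t b, uint16_t f1, uint16_t f2)
{
    return (uint8_t)((((a - 128) * f1) + ((b - 128) * f2) + 32896) >> 8);
}

/* Exhaustively compares both forms for every sample pair and every
 * factor split with f1 + f2 == 256. Returns 1 if they always agree. */
static int blend_forms_agree(void)
{
    for (int f2 = 0; f2 <= 256; f2++) {
        int f1 = 256 - f2;
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++)
                if (blend_luma8(a, b, f1, f2) != blend_chroma8(a, b, f1, f2))
                    return 0;
    }
    return 1;
}
```

For instance, blending samples 10 and 20 with factors 192 and 64 gives
(1920 + 1280 + 128) >> 8 = 13, i.e. 12.5 rounded up.
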
diff --git a/libavfilter/vf_framerate.c b/libavfilter/vf_framerate.c
index dc8b05f40f..d505c5a8a4 100644
--- a/libavfilter/vf_framerate.c
+++ b/libavfilter/vf_framerate.c
@@ -210,6 +210,117 @@ static double get_scene_score(AVFilterContext *ctx, AVFrame *crnt, AVFrame *next
     return ret;
 }
 
+typedef struct ThreadData {
+    AVFrame *copy_src1, *copy_src2;
+    uint16_t src1_factor, src2_factor;
+} ThreadData;
+
+static int filter_slice8(AVFilterContext *ctx, void *arg, int job, int nb_jobs)
+{
+    FrameRateContext *s = ctx->priv;
+    ThreadData *td = arg;
+    uint16_t src1_factor = td->src1_factor;
+    uint16_t src2_factor = td->src2_factor;
+    int plane, line, pixel;
+
+    for (plane = 0; plane < 4 && td->copy_src1->data[plane] && td->copy_src2->data[plane]; plane++) {
+        int cpy_line_width = s->line_size[plane];
+        uint8_t *cpy_src1_data = td->copy_src1->data[plane];
+        int cpy_src1_line_size = td->copy_src1->linesize[plane];
+        uint8_t *cpy_src2_data = td->copy_src2->data[plane];
+        int cpy_src2_line_size = td->copy_src2->linesize[plane];
+        int cpy_src_h = (plane > 0 && plane < 3) ? (td->copy_src1->height >> s->vsub) : (td->copy_src1->height);
+        uint8_t *cpy_dst_data = s->work->data[plane];
+        int cpy_dst_line_size = s->work->linesize[plane];
+        const int start = (cpy_src_h *  job   ) / nb_jobs;
+        const int end   = (cpy_src_h * (job+1)) / nb_jobs;
+        cpy_src1_data += start * cpy_src1_line_size;
+        cpy_src2_data += start * cpy_src2_line_size;
+        cpy_dst_data += start * cpy_dst_line_size;
+
+        if (plane < 1 || plane > 2) {
+            // luma or alpha
+            for (line = start; line < end; line++) {
+                for (pixel = 0; pixel < cpy_line_width; pixel++) {
+                    // integer version of (src1 * src1_factor) + (src2 * src2_factor) + 0.5
+                    // 0.5 is for rounding
+                    // 128 is the integer representation of 0.5 << 8
+                    cpy_dst_data[pixel] = ((cpy_src1_data[pixel] * src1_factor) + (cpy_src2_data[pixel] * src2_factor) + 128) >> 8;
+                }
+                cpy_src1_data += cpy_src1_line_size;
+                cpy_src2_data += cpy_src2_line_size;
+                cpy_dst_data += cpy_dst_line_size;
+            }
+        } else {
+            // chroma
+            for (line = start; line < end; line++) {
+                for (pixel = 0; pixel < cpy_line_width; pixel++) {
+                    // as above
+                    // because U and V are based around 128 we have to subtract 128 from the components.
+                    // 32896 is the integer representation of 128.5 << 8
+                    cpy_dst_data[pixel] = (((cpy_src1_data[pixel] - 128) * src1_factor) + ((cpy_src2_data[pixel] - 128) * src2_factor) + 32896) >> 8;
+                }
+                cpy_src1_data += cpy_src1_line_size;
+                cpy_src2_data += cpy_src2_line_size;
+                cpy_dst_data += cpy_dst_line_size;
+            }
+        }
+    }
+
+    return 0;
+}
+
+static int filter_slice16(AVFilterContext *ctx, void *arg, int job, int nb_jobs)
+{
+    FrameRateContext *s = ctx->priv;
+    ThreadData *td = arg;
+    uint16_t src1_factor = td->src1_factor;
+    uint16_t src2_factor = td->src2_factor;
+    const int half = s->max / 2;
+    const int uv = (s->max + 1) * half;
+    const int shift = s->bitdepth;
+    int plane, line, pixel;
+
+    for (plane = 0; plane < 4 && td->copy_src1->data[plane] && td->copy_src2->data[plane]; plane++) {
+        int cpy_line_width = s->line_size[plane];
+        const uint16_t *cpy_src1_data = (const uint16_t *)td->copy_src1->data[plane];
+        int cpy_src1_line_size = td->copy_src1->linesize[plane] / 2;
+        const uint16_t *cpy_src2_data = (const uint16_t *)td->copy_src2->data[plane];
+        int cpy_src2_line_size = td->copy_src2->linesize[plane] / 2;
+        int cpy_src_h = (plane > 0 && plane < 3) ? (td->copy_src1->height >> s->vsub) : (td->copy_src1->height);
+        uint16_t *cpy_dst_data = (uint16_t *)s->work->data[plane];
+        int cpy_dst_line_size = s->work->linesize[plane] / 2;
+        const int start = (cpy_src_h *  job   ) / nb_jobs;
+        const int end   = (cpy_src_h * (job+1)) / nb_jobs;
+        cpy_src1_data += start * cpy_src1_line_size;
+        cpy_src2_data += start * cpy_src2_line_size;
+        cpy_dst_data += start * cpy_dst_line_size;
+
+        if (plane < 1 || plane > 2) {
+            // luma or alpha
+            for (line = start; line < end; line++) {
+                for (pixel = 0; pixel < cpy_line_width; pixel++)
+                    cpy_dst_data[pixel] = ((cpy_src1_data[pixel] * src1_factor) + (cpy_src2_data[pixel] * src2_factor) + half) >> shift;
+                cpy_src1_data += cpy_src1_line_size;
+                cpy_src2_data += cpy_src2_line_size;
+                cpy_dst_data += cpy_dst_line_size;
+            }
+        } else {
+            // chroma
+            for (line = start; line < end; line++) {
+                for (pixel = 0; pixel < cpy_line_width; pixel++) {
+                    cpy_dst_data[pixel] = (((cpy_src1_data[pixel] - half) * src1_factor) + ((cpy_src2_data[pixel] - half) * src2_factor) + uv) >> shift;
+                }
+                cpy_src1_data += cpy_src1_line_size;
+                cpy_src2_data += cpy_src2_line_size;
+                cpy_dst_data += cpy_dst_line_size;
+            }
+        }
+    }
+
+    return 0;
+}
+
 static int blend_frames16(AVFilterContext *ctx, float interpolate,
                           AVFrame *copy_src1, AVFrame *copy_src2)
 {
@@ -223,12 +334,11 @@ static int blend_frames16(AVFilterContext *ctx, float interpolate,
     }
     // decide if the shot-change detection allows us to blend two frames
     if (interpolate_scene_score < s->scene_score && copy_src2) {
-        uint16_t src2_factor = fabsf(interpolate) * (1 << (s->bitdepth - 8));
-        uint16_t src1_factor = s->max - src2_factor;
-        const int half = s->max / 2;
-        const int uv = (s->max + 1) * half;
-        const int shift = s->bitdepth;
-        int plane, line, pixel;
+        ThreadData td;
+        td.copy_src1 = copy_src1;
+        td.copy_src2 = copy_src2;
+        td.src2_factor = fabsf(interpolate) * (1 << (s->bitdepth - 8));
+        td.src1_factor = s->max - td.src2_factor;
 
         // get work-space for output frame
         s->work = ff_get_video_buffer(outlink, outlink->w, outlink->h);
@@ -238,37 +348,7 @@ static int blend_frames16(AVFilterContext *ctx, float interpolate,
         av_frame_copy_props(s->work, s->srce[s->crnt]);
 
         ff_dlog(ctx, "blend_frames16() INTERPOLATE to create work frame\n");
-        for (plane = 0; plane < 4 && copy_src1->data[plane] && copy_src2->data[plane]; plane++) {
-            int cpy_line_width = s->line_size[plane];
-            const uint16_t *cpy_src1_data = (const uint16_t *)copy_src1->data[plane];
-            int cpy_src1_line_size = copy_src1->linesize[plane] / 2;
-            const uint16_t *cpy_src2_data = (const uint16_t *)copy_src2->data[plane];
-            int cpy_src2_line_size = copy_src2->linesize[plane] / 2;
-            int cpy_src_h = (plane > 0 && plane < 3) ? (copy_src1->height >> s->vsub) : (copy_src1->height);
-            uint16_t *cpy_dst_data = (uint16_t *)s->work->data[plane];
-            int cpy_dst_line_size = s->work->linesize[plane] / 2;
-
-            if (plane <1 || plane >2) {
-                // luma or alpha
-                for (line = 0; line < cpy_src_h; line++) {
-                    for (pixel = 0; pixel < cpy_line_width; pixel++)
-                        cpy_dst_data[pixel] = ((cpy_src1_data[pixel] * src1_factor) + (cpy_src2_data[pixel] * src2_factor) + half) >> shift;
-                    cpy_src1_data += cpy_src1_line_size;
-                    cpy_src2_data += cpy_src2_line_size;
-                    cpy_dst_data += cpy_dst_line_size;
-                }
-            } else {
-                // chroma
-                for (line = 0; line < cpy_src_h; line++) {
-                    for (pixel = 0; pixel < cpy_line_width; pixel++) {
-                        cpy_dst_data[pixel] = (((cpy_src1_data[pixel] - half) * src1_factor) + ((cpy_src2_data[pixel] - half) * src2_factor) + uv) >> shift;
-                    }
-                    cpy_src1_data += cpy_src1_line_size;
-                    cpy_src2_data += cpy_src2_line_size;
-                    cpy_dst_data += cpy_dst_line_size;
-                }
-            }
-        }
+        ctx->internal->execute(ctx, filter_slice16, &td, NULL, FFMIN(outlink->h, ff_filter_get_nb_threads(ctx)));
         return 1;
     }
     return 0;
@@ -287,9 +367,11 @@ static int blend_frames8(AVFilterContext *ctx, float interpolate,
     }
     // decide if the shot-change detection allows us to blend two frames
     if (interpolate_scene_score < s->scene_score && copy_src2) {
-        uint16_t src2_factor = fabsf(interpolate);
-        uint16_t src1_factor = 256 - src2_factor;
-        int plane, line, pixel;
+        ThreadData td;
+        td.copy_src1 = copy_src1;
+        td.copy_src2 = copy_src2;
+        td.src2_factor = fabsf(interpolate);
+        td.src1_factor = 256 - td.src2_factor;
 
         // get work-space for output frame
         s->work = ff_get_video_buffer(outlink, outlink->w, outlink->h);
@@ -299,43 +381,8 @@ static int blend_frames8(AVFilterContext *ctx, float interpolate,
         av_frame_copy_props(s->work, s->srce[s->crnt]);
 
         ff_dlog(ctx, "blend_frames8() INTERPOLATE to create work frame\n");
-        for (plane = 0; plane < 4 && copy_src1->data[plane] && copy_src2->data[plane]; plane++) {
-            int cpy_line_width = s->line_size[plane];
-            uint8_t *cpy_src1_data = copy_src1->data[plane];
-            int cpy_src1_line_size = copy_src1->linesize[plane];
-            uint8_t *cpy_src2_data = copy_src2->data[plane];
-            int cpy_src2_line_size = copy_src2->linesize[plane];
-            int cpy_src_h = (plane > 0 && plane < 3) ? (copy_src1->height >> s->vsub) : (copy_src1->height);
-            uint8_t *cpy_dst_data = s->work->data[plane];
-            int cpy_dst_line_size = s->work->linesize[plane];
-            if (plane <1 || plane >2) {
-                // luma or alpha
-                for (line = 0; line < cpy_src_h; line++) {
-                    for (pixel = 0; pixel < cpy_line_width; pixel++) {
-                        // integer version of (src1 * src1_factor) + (src2 + src2_factor) + 0.5
-                        // 0.5 is for rounding
-                        // 128 is the integer representation of 0.5 << 8
-                        cpy_dst_data[pixel] = ((cpy_src1_data[pixel] * src1_factor) + (cpy_src2_data[pixel] * src2_factor) + 128) >> 8;
-                    }
-                    cpy_src1_data += cpy_src1_line_size;
-                    cpy_src2_data += cpy_src2_line_size;
-                    cpy_dst_data += cpy_dst_line_size;
-                }
-            } else {
-                // chroma
-                for (line = 0; line < cpy_src_h; line++) {
-                    for (pixel = 0; pixel < cpy_line_width; pixel++) {
-                        // as above
-                        // because U and V are based around 128 we have to subtract 128 from the components.
-                        // 32896 is the integer representation of 128.5 << 8
-                        cpy_dst_data[pixel] = (((cpy_src1_data[pixel] - 128) * src1_factor) + ((cpy_src2_data[pixel] - 128) * src2_factor) + 32896) >> 8;
-                    }
-                    cpy_src1_data += cpy_src1_line_size;
-                    cpy_src2_data += cpy_src2_line_size;
-                    cpy_dst_data += cpy_dst_line_size;
-                }
-            }
-        }
+        ctx->internal->execute(ctx, filter_slice8, &td, NULL, FFMIN(outlink->h, ff_filter_get_nb_threads(ctx)));
+
         return 1;
     }
     return 0;
@@ -738,4 +785,5 @@ AVFilter ff_vf_framerate = {
     .query_formats = query_formats,
     .inputs        = framerate_inputs,
     .outputs       = framerate_outputs,
+    .flags         = AVFILTER_FLAG_SLICE_THREADS,
 };