From patchwork Mon Jul 12 05:14:01 2021
X-Patchwork-Submitter: Lingjiang Fang
X-Patchwork-Id: 28894
From: Lingjiang Fang
To: ffmpeg-devel@ffmpeg.org
Cc: Lingjiang Fang
Date: Mon, 12 Jul 2021 13:14:01 +0800
Subject: [FFmpeg-devel] [PATCH V4] lavfi/vf_ocr: add subregion support

Address review comments from Steven Liu.
---
 doc/filters.texi     |  8 ++++++++
 libavfilter/vf_ocr.c | 45 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/doc/filters.texi b/doc/filters.texi
index d991c06628..f41ba0ce46 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -15457,6 +15457,14 @@ Set character whitelist.
 
 @item blacklist
 Set character blacklist.
+
+@item x, y
+Set the top-left corner of the subregion, in pixels. Default is (0,0).
+
+@item w, h
+Set the width and height of the subregion, in pixels. By default, the
+subregion extends from the top-left corner to the bottom-right of the frame.
+
 @end table
 
 The filter exports recognized text as the frame metadata @code{lavfi.ocr.text}.
diff --git a/libavfilter/vf_ocr.c b/libavfilter/vf_ocr.c
index 6de474025a..55f04b6592 100644
--- a/libavfilter/vf_ocr.c
+++ b/libavfilter/vf_ocr.c
@@ -33,6 +33,8 @@ typedef struct OCRContext {
     char *language;
     char *whitelist;
     char *blacklist;
+    int x, y, x_in, y_in;
+    int w, h, w_in, h_in;
 
     TessBaseAPI *tess;
 } OCRContext;
@@ -45,6 +47,10 @@ static const AVOption ocr_options[] = {
     { "language", "set language", OFFSET(language), AV_OPT_TYPE_STRING, {.str="eng"}, 0, 0, FLAGS },
     { "whitelist", "set character whitelist", OFFSET(whitelist), AV_OPT_TYPE_STRING, {.str="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.:;,-+_!?\"'[]{}()<>|/\\=*&%$#@!~ "}, 0, 0, FLAGS },
     { "blacklist", "set character blacklist", OFFSET(blacklist), AV_OPT_TYPE_STRING, {.str=""}, 0, 0, FLAGS },
+    { "x", "x coordinate of the subregion's top-left corner", OFFSET(x), AV_OPT_TYPE_INT, {.i64=0}, 0, INT_MAX, FLAGS },
+    { "y", "y coordinate of the subregion's top-left corner", OFFSET(y), AV_OPT_TYPE_INT, {.i64=0}, 0, INT_MAX, FLAGS },
+    { "w", "width of the subregion", OFFSET(w), AV_OPT_TYPE_INT, {.i64=0}, 0, INT_MAX, FLAGS },
+    { "h", "height of the subregion", OFFSET(h), AV_OPT_TYPE_INT, {.i64=0}, 0, INT_MAX, FLAGS },
     { NULL }
 };
 
@@ -93,6 +99,41 @@ static int query_formats(AVFilterContext *ctx)
     return ff_set_common_formats(ctx, fmts_list);
 }
 
+static void check_fix(int *x, int *y, int *w, int *h, int pic_w, int pic_h)
+{
+    // 0 <= x < pic_w
+    if (*x >= pic_w)
+        *x = 0;
+    // 0 <= y < pic_h
+    if (*y >= pic_h)
+        *y = 0;
+
+    if (*w == 0 || *w > pic_w - *x)
+        *w = pic_w - *x;
+    if (*h == 0 || *h > pic_h - *y)
+        *h = pic_h - *y;
+}
+
+static int config_input(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx = inlink->dst;
+    OCRContext *s = ctx->priv;
+
+    s->x_in = s->x;
+    s->y_in = s->y;
+    s->w_in = s->w;
+    s->h_in = s->h;
+    check_fix(&s->x_in, &s->y_in, &s->w_in, &s->h_in, inlink->w, inlink->h);
+    if ( s->x_in != s->x || s->y_in != s->y ||
+        (s->w != 0 && s->w_in != s->w) || (s->h != 0 && s->h_in != s->h)) {
+        av_log(s, AV_LOG_WARNING, "config error, subregion changed to "
+               "x=%d, y=%d, w=%d, h=%d\n",
+               s->x_in, s->y_in, s->w_in, s->h_in);
+    }
+
+    return 0;
+}
+
 static int filter_frame(AVFilterLink *inlink, AVFrame *in)
 {
     AVDictionary **metadata = &in->metadata;
@@ -102,8 +143,9 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
     char *result;
     int *confs;
 
+    // TODO(vacing): support expression
     result = TessBaseAPIRect(s->tess, in->data[0], 1,
-                             in->linesize[0], 0, 0, in->width, in->height);
+                             in->linesize[0], s->x_in, s->y_in, s->w_in, s->h_in);
     confs = TessBaseAPIAllWordConfidences(s->tess);
     av_dict_set(metadata, "lavfi.ocr.text", result, 0);
     for (int i = 0; confs[i] != -1; i++) {
@@ -134,6 +176,7 @@ static const AVFilterPad ocr_inputs[] = {
         .name = "default",
         .type = AVMEDIA_TYPE_VIDEO,
         .filter_frame = filter_frame,
+        .config_props = config_input,
     },
     { NULL }
 };