From patchwork Thu Oct 28 19:09:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paul B Mahol X-Patchwork-Id: 31242 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a5e:a610:0:0:0:0:0 with SMTP id q16csp851768ioi; Thu, 28 Oct 2021 12:09:32 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzVxntJEdw3Fr7WYh+0K+KsVSEfB9MAdqzuntLwFUIX0B+k1iqrE7kehVks0ZPpv2XaK7S4 X-Received: by 2002:a17:907:608e:: with SMTP id ht14mr7821803ejc.391.1635448172663; Thu, 28 Oct 2021 12:09:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635448172; cv=none; d=google.com; s=arc-20160816; b=N6wO4hlErQHN2+DIo71C88tJLudjt4JLEhpSrvrGT6aSJtCozDXjdeuWeuvPtjyHeK QNLqvmDaZZeSnEjdr3R8RUKetl1+7sau6Oq+hmH81y9IADVwO7S8F5KxoYV5pCoB2O9Y ku2XW7UQj5Sbk3KZ4uKeqV9w0KUl8ic0bP+5gXlKOxuGwrYAa4aRzWMju537X578KhyL LuZ2kywdONUiLlSiBICnNs8GlFQ5m77avHtwdpCR9KcxxcgskdAIx6AYhJ0amXsBewvc 0GVNjW4kKD9chj3p1jlN6qxm5MfNQOdekw22Oy7Rds+jP4MZgjXOe8/q6OCIiolyKxg1 wTiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:message-id:date:to:from :dkim-signature:delivered-to; bh=qzbj2H4+7GC8ZNsnB0E4k/Y78WBMqIvF3AvW/xp6HaE=; b=YSK0ANeYXyApgX0Xv49hDAEyrf+NMiMUMoGzEpBzAr7TvDQujm2udtAe1v2ymqtkub HGX59IfSOOEkoaDBYfOFP299KD39UvQheuWSxsBeOOEfV7qeRo/HTQpcbBJl8iCSUNRS WsqaEr7Wg0kurNChYD7Rpot4cLGoVkbwlUvKbG2cfY5rsgf4AaiOajLdyEXdloikuiYv WiKuSdAYjqD92GgaEF7iLRqd9atrHB63W0Woctr08HcwwZ7BKRuRHT0tjSvdsv5fnn/c mClpt+Ssa2bdl9wRw2nNBqaTvuMfgMe7vmvdDy5fS2IboSBfiTAX+vPFAzAaakeQluZH 4UFw== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=XudH4SA3; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id r14si5245930edp.43.2021.10.28.12.09.24; Thu, 28 Oct 2021 12:09:32 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20210112 header.b=XudH4SA3; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 944F668A9C0; Thu, 28 Oct 2021 22:09:19 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-ed1-f44.google.com (mail-ed1-f44.google.com [209.85.208.44]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id A33F668A926 for ; Thu, 28 Oct 2021 22:09:13 +0300 (EEST) Received: by mail-ed1-f44.google.com with SMTP id g8so29221600edb.2 for ; Thu, 28 Oct 2021 12:09:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:subject:date:message-id:mime-version :content-transfer-encoding; bh=pAL/fxZa3WwZk0dfucVCU+H3NmsmhC6Dzwetm/A8HT8=; b=XudH4SA3zveMwym2y1as7L8mMckibG9ZWt22aCBDmrn1fhW+0jsmyWraMyqR3jARyg /WhAo4llkvGfmR7VZCvm1ReW7gLsu5+JBAc0pMifbA0UTU9e1kMirc/0Ofcf+Fg5ssyI 8fY2w373zp+Uuz+epiNceKgdrq5tr4fmZPZabQ6fvTuVsk7xqIDKmtkpfWrcHjVcDY6k fAzfsOy87Iq4Uo1eO/M/4dSa9DauUkXL1VYWfkEYQ6hJtMq/tLV0MNcniue0VKhBCuTQ X6FHxvz1fDQzGSQi+0vwtjmCWT3/nWfADAR88L/Vak6aByRV2yo2DbPZ4bfS4LNuWozv 2Wxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:mime-version :content-transfer-encoding; bh=pAL/fxZa3WwZk0dfucVCU+H3NmsmhC6Dzwetm/A8HT8=; b=CBKqmfektpDL/jx+1dsu2rkKBwEVL7k3ebMX9YuhaqM72ERiienRuA3i2htIl/BEW6 dGwwozlIAFQBZYZW27XzuiBLLonQl76khIQCC0stCRi6j6PglyGvzeCL3fwXu+ntuT0e tGiYL1olgDMMf9vs7FXcV6SuzdYUbOjtKQ2Qt9/gz5yka9tbOpSLY69lGMSo3ld877Xp 9+PfBctC3Pd+xsKpn+19IGlRnxlaj/hoP5OQDCTKK/nqCUyVBVi76cK7eFhiK3yRE/j9 OD6PteDsbe1R9yJuFb7AH3AcqDElNY0LNLfXVFia1AqP6Q0TijPlFwBLzpWPxd2bMp4M B8lw== X-Gm-Message-State: AOAM532SwmL2eYV7pKO7oFp8VkUy2DXf6H+YVPS0N1u1Axq8rbuGjSdo p8JHFEm9LrQC5XrKP+LUW+MvYlcv0bs= X-Received: by 2002:a17:907:868b:: with SMTP id qa11mr7824194ejc.383.1635448152827; Thu, 28 Oct 2021 12:09:12 -0700 (PDT) Received: from localhost.localdomain ([212.15.177.28]) by smtp.gmail.com with ESMTPSA id m10sm2180327edi.72.2021.10.28.12.09.11 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Oct 2021 12:09:12 -0700 (PDT) From: Paul B Mahol To: ffmpeg-devel@ffmpeg.org Date: Thu, 28 Oct 2021 21:09:15 +0200 Message-Id: <20211028190915.1049335-1-onemda@gmail.com> X-Mailer: git-send-email 2.33.0 MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH] avfilter: add wpsnr video filter X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: bM1UquV4JFNO Signed-off-by: Paul B Mahol --- libavfilter/Makefile | 1 + libavfilter/allfilters.c | 1 + libavfilter/vf_wpsnr.c | 894 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 896 insertions(+) create mode 100644 libavfilter/vf_wpsnr.c diff --git a/libavfilter/Makefile b/libavfilter/Makefile index 4e6d8bdeea..6ac6e3b764 100644 --- a/libavfilter/Makefile +++ b/libavfilter/Makefile @@ -503,6 +503,7 @@ OBJS-$(CONFIG_VSTACK_FILTER) += vf_stack.o framesync.o OBJS-$(CONFIG_W3FDIF_FILTER) += vf_w3fdif.o OBJS-$(CONFIG_WAVEFORM_FILTER) += vf_waveform.o OBJS-$(CONFIG_WEAVE_FILTER) += vf_weave.o +OBJS-$(CONFIG_WPSNR_FILTER) += vf_wpsnr.o framesync.o OBJS-$(CONFIG_XBR_FILTER) += vf_xbr.o OBJS-$(CONFIG_XCORRELATE_FILTER) += vf_convolve.o framesync.o OBJS-$(CONFIG_XFADE_FILTER) += vf_xfade.o diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c index 04f1925c14..f5c539199c 100644 --- a/libavfilter/allfilters.c +++ b/libavfilter/allfilters.c @@ -479,6 +479,7 @@ extern const AVFilter ff_vf_vstack; extern const AVFilter ff_vf_w3fdif; extern const AVFilter ff_vf_waveform; extern const AVFilter ff_vf_weave; +extern const AVFilter ff_vf_wpsnr; extern const AVFilter ff_vf_xbr; extern const AVFilter ff_vf_xcorrelate; extern const AVFilter ff_vf_xfade; diff --git a/libavfilter/vf_wpsnr.c b/libavfilter/vf_wpsnr.c new file mode 100644 index 0000000000..bc4e88e183 --- /dev/null +++ b/libavfilter/vf_wpsnr.c @@ -0,0 +1,894 @@ +/* + * Copyright (c) 2021 Paul B Mahol + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/** + * @file + * Calculate the WPSNR between two input videos. + */ + +#include "libavutil/avstring.h" +#include "libavutil/intreadwrite.h" +#include "libavutil/opt.h" +#include "libavutil/pixdesc.h" +#include "avfilter.h" +#include "drawutils.h" +#include "formats.h" +#include "framesync.h" +#include "internal.h" +#include "video.h" + +#define WIDTH 3840. +#define HEIGHT 2160. + +typedef struct WPSNRContext { + const AVClass *class; + FFFrameSync fs; + double wmse, min_wmse, max_wmse, wmse_comp[4]; + uint64_t nb_frames; + int mode; + int sizeN; + int sizeM; + double apic; + double amin; + int max[4], average_max; + char comps[4]; + int nb_components; + int nb_threads; + int planewidth[4]; + int planeheight[4]; + double planeweight[4]; + double **score; + + uint16_t *tmp; + uint64_t *sat; + double *weights; + + void (*compute_hx)(const uint8_t *ssrc, + int linesize, + int w, int h, + uint16_t *dstp, + int dst_linesize); + + void (*compute_sat)(const uint16_t *ssrc, + int linesize, + int w, int h, + uint64_t *dstp, + int dst_linesize); + + int (*compute_weights)(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs); + + int (*compute_wmse)(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs); +} WPSNRContext; + +#define OFFSET(x) offsetof(WPSNRContext, x) +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM + +static const AVOption wpsnr_options[] = { + { "mode", "set the mode", OFFSET(mode), AV_OPT_TYPE_INT, {.i64=0}, 0, 1, FLAGS, "mode" }, + { "block", "block based", 0, AV_OPT_TYPE_CONST, {.i64=0}, 0, 1, FLAGS, "mode" }, + { "sample", "sample based", 0, AV_OPT_TYPE_CONST, {.i64=1}, 0, 1, FLAGS, "mode" }, + { NULL } +}; + +FRAMESYNC_DEFINE_CLASS(wpsnr, WPSNRContext, fs); + +#define COMPUTE_HX(type, stype, depth) \ +static void compute_hx##depth(const uint8_t *ssrc, \ + int linesize, \ + int w, int h, \ + uint16_t *dstp, \ + int dst_linesize) \ +{ \ + const type *src = (const type *)ssrc; \ + stype *dst = (stype *)dstp; \ + \ + linesize /= (depth / 8); \ + \ + src += linesize; \ + dst += dst_linesize; \ + for (int y = 1; y < h - 1; y++) { \ + for (int x = 1; x < w - 1; x++) { \ + int v = 12 * src[x] - \ + 2 * (src[x-1] + src[x+1] + \ + src[x + linesize] + \ + src[x - linesize]) - \ + 1 * (src[x - 1 - linesize] + \ + src[x + 1 - linesize] + \ + src[x - 1 + linesize] + \ + src[x + 1 + linesize]); \ + dst[x] = FFABS(v); \ + } \ + \ + src += linesize; \ + dst += dst_linesize; \ + } \ +} + +COMPUTE_HX(uint8_t, uint16_t, 8) +COMPUTE_HX(uint16_t, uint16_t, 16) + +#define COMPUTE_SAT(type, stype, depth) \ +static void compute_sat##depth(const uint16_t *ssrc, \ + int linesize, \ + int w, int h, \ + uint64_t *dstp, \ + int dst_linesize) \ +{ \ + const type *src = (const type *)ssrc; \ + stype *dst = (stype *)dstp; \ + \ + dst += dst_linesize; \ + \ + for (int y = 0; y < h; y++) { \ + stype sum = 0; \ + \ + for (int x = 1; x < w; x++) { \ + sum += src[x - 1]; \ + dst[x] = sum + dst[x - dst_linesize]; \ + } \ + \ + src += linesize; \ + dst += dst_linesize; \ + } \ +} + +COMPUTE_SAT(uint16_t, uint64_t, 16) + +static inline unsigned pow_2(int base) +{ + return base*base; +} + +static inline double get_wpsnr(double wmse, uint64_t nb_frames, int max) +{ + return 10.0 * log10(pow_2(max) / (wmse / nb_frames)); +} + +typedef struct ThreadData { + const uint8_t *main_data[4]; + const uint8_t *ref_data[4]; + int main_linesize[4]; + int ref_linesize[4]; + int planewidth[4]; + int planeheight[4]; + double **score; + int nb_components; +} ThreadData; + +static double get_hxs(const uint64_t *src, + int x, int y, + int linesize, + int t, int b, + int l, int r) +{ + const uint64_t tl = src[(y - t) * linesize + x - l]; + const uint64_t tr = src[(y - t) * linesize + x + r]; + const uint64_t bl = src[(y + b) * linesize + x - l]; + const uint64_t br = src[(y + b) * linesize + x + r]; + const uint64_t sum = br + tl - bl - tr; + + return fabs(sum * 0.25); +} + +static double get_hx(const uint8_t *src, int linesize, int w, int h) +{ + int64_t sum = 0; + + for (int y = 0; y < h; y++) { + for (int x = 0; x < w; x++) { + sum += 12 * src[x] - + 2 * (src[x-1] + src[x+1] + + src[x + linesize] + + src[x - linesize]) - + 1 * (src[x - 1 - linesize] + + src[x + 1 - linesize] + + src[x - 1 + linesize] + + src[x + 1 + linesize]); + } + + src += linesize; + } + + return fabs(sum * 0.25); +} + +static double get_hx16(const uint8_t *ssrc, int linesize, int w, int h) +{ + const uint16_t *src = (const uint16_t *)ssrc; + int64_t sum = 0; + + linesize /= 2; + + for (int y = 0; y < h; y++) { + for (int x = 0; x < w; x++) { + sum += 12 * src[x] - + 2 * (src[x-1] + src[x+1] + + src[x + linesize] + + src[x - linesize]) - + 1 * (src[x - 1 - linesize] + + src[x + 1 - linesize] + + src[x - 1 + linesize] + + src[x + 1 + linesize]); + } + + src += linesize; + } + + return fabs(sum * 0.25); +} + +static double get_sd(const uint8_t *ref, int ref_linesize, + const uint8_t *main, int main_linesize, + int w, int h) +{ + int64_t sum = 0; + + for (int y = 0; y < h; y++) { + for (int x = 0; x < w; x++) + sum += pow_2(ref[x] - main[x]); + ref += ref_linesize; + main += main_linesize; + } + + return sum; +} + +static double get_sd16(const uint8_t *rref, int ref_linesize, + const uint8_t *mmain, int main_linesize, + int w, int h) +{ + const uint16_t *ref = (const uint16_t *)rref; + const uint16_t *main = (const uint16_t *)mmain; + int64_t sum = 0; + + ref_linesize /= 2; + main_linesize /= 2; + + for (int y = 0; y < h; y++) { + for (int x = 0; x < w; x++) + sum += pow_2(ref[x] - main[x]); + ref += ref_linesize; + main += main_linesize; + } + + return sum; +} + +static +int compute_block_wmse16(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs) +{ + WPSNRContext *s = ctx->priv; + ThreadData *td = arg; + double *score = td->score[jobnr]; + const int sizeN = s->sizeN; + + for (int c = 0; c < s->nb_components; c++) { + const int planew = (td->planewidth[0] - 2) / sizeN; + const int blockw = (sizeN * td->planewidth[c]) / td->planewidth[0]; + const int blockh = (sizeN * td->planeheight[c]) / td->planeheight[0]; + const int slice_start = ((td->planeheight[c] / blockh) * jobnr) / nb_jobs; + const int slice_end = ((td->planeheight[c] / blockh) * (jobnr+1)) / nb_jobs; + const int ref_linesize = td->ref_linesize[c]; + const int main_linesize = td->main_linesize[c]; + const uint8_t *main_line = td->main_data[c] + main_linesize * slice_start * blockw; + const uint8_t *ref_line = td->ref_data[c] + ref_linesize * slice_start * blockw; + const double *weights = s->weights + slice_start * planew; + double m = 0; + + for (int i = slice_start; i < slice_end; i++) { + for (int j = 0; j < planew; j++) { + m += weights[j] * get_sd16(ref_line + 2 * j * blockw, ref_linesize, + main_line + 2 * j * blockw, main_linesize, + blockw, blockh); + } + weights += planew; + ref_line += ref_linesize * blockw; + main_line += main_linesize * blockw; + } + score[c] = m; + } + + return 0; +} + +static +int compute_sample_wmse16(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs) +{ + WPSNRContext *s = ctx->priv; + ThreadData *td = arg; + double *score = td->score[jobnr]; + for (int c = 0; c < s->nb_components; c++) { + const int wstepw = td->planewidth[0] / td->planewidth[c]; + const int wsteph = td->planeheight[0] / td->planeheight[c]; + const int blockw = td->planewidth[c]; + const int blockh = td->planeheight[c]; + const int slice_start = (blockh * jobnr) / nb_jobs; + const int slice_end = (blockh * (jobnr+1)) / nb_jobs; + const int ref_linesize = td->ref_linesize[c]; + const int main_linesize = td->main_linesize[c]; + const uint8_t *main_line = td->main_data[c] + main_linesize * slice_start; + const uint8_t *ref_line = td->ref_data[c] + ref_linesize * slice_start; + const double *weights = s->weights + slice_start * blockw; + double m = 0.; + + for (int i = slice_start; i < slice_end; i++) { + for (int j = 0; j < blockw; j++) + m += weights[j * wstepw] * pow_2(AV_RN16(ref_line + 2*j) - AV_RN16(main_line + 2*j)); + weights += blockw * wsteph; + ref_line += ref_linesize; + main_line += main_linesize; + } + score[c] = m; + } + + return 0; +} + +static +int compute_block_weights8(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs) +{ + WPSNRContext *s = ctx->priv; + const int sizeN = s->sizeN; + const double amin = s->amin; + const double sizeN2 = sizeN * sizeN; + const double wnum = sqrt(s->apic) * sizeN2; + ThreadData *td = arg; + const int blockw = (td->planewidth[0] - 2) / sizeN; + const int blockh = (td->planeheight[0] - 2) / sizeN; + const int slice_start = (blockh * jobnr) / nb_jobs; + const int slice_end = (blockh * (jobnr+1)) / nb_jobs; + const int ref_linesize = td->ref_linesize[0]; + const int main_linesize = td->main_linesize[0]; + const uint8_t *main_line = td->main_data[0] + main_linesize * slice_start * sizeN + 1 + main_linesize; + const uint8_t *ref_line = td->ref_data[0] + ref_linesize * slice_start * sizeN + 1 + ref_linesize; + double *weights = s->weights + slice_start * blockw; + + for (int i = slice_start; i < slice_end; i++) { + for (int j = 0; j < blockw; j++) { + const double ak = fmax(sizeN2 * amin, get_hx(ref_line + j * sizeN, ref_linesize, sizeN, sizeN)); + const double wk = wnum / ak; + + weights[j] = wk; + } + weights += blockw; + ref_line += ref_linesize * sizeN; + main_line += main_linesize * sizeN; + } + + return 0; +} + +static +int compute_sample_weights8(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs) +{ + WPSNRContext *s = ctx->priv; + const int sizeM = s->sizeM; + const double apic = s->apic; + const double sizeM2 = sizeM * sizeM; + const double wnum = sqrt(apic) * sizeM2; + ThreadData *td = arg; + const uint64_t *sat = s->sat; + const int blockw = td->planewidth[0]; + const int blockh = td->planeheight[0]; + const int slice_start = (blockh * jobnr) / nb_jobs; + const int slice_end = (blockh * (jobnr+1)) / nb_jobs; + const int ref_linesize = td->ref_linesize[0]; + const int main_linesize = td->main_linesize[0]; + const uint8_t *main_line = td->main_data[0] + main_linesize * slice_start; + const uint8_t *ref_line = td->ref_data[0] + ref_linesize * slice_start; + double *weights = s->weights + slice_start * blockw; + + for (int i = slice_start; i < slice_end; i++) { + const int top = FFMAX(i - sizeM, 0); + const int bottom = FFMIN(i + sizeM + 1, blockh - i); + for (int j = 0; j < blockw; j++) { + const int left = FFMAX(j - sizeM, 0); + const int right = FFMIN(j + sizeM + 1, blockw - j); + const double ak = fmax(0.25, get_hxs(sat, j, i, + blockw + 1, + top, bottom, + left, right)); + weights[j] = wnum / ak; + } + weights += blockw; + ref_line += ref_linesize; + main_line += main_linesize; + } + + return 0; +} + +static +int compute_sample_weights16(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs) +{ + WPSNRContext *s = ctx->priv; + const int sizeM = s->sizeM; + const double sizeM2 = sizeM * sizeM; + const double wnum = sqrt(s->apic) * sizeM2; + ThreadData *td = arg; + const uint64_t *sat = s->sat; + const int blockw = td->planewidth[0]; + const int blockh = td->planeheight[0]; + const int slice_start = (blockh * jobnr) / nb_jobs; + const int slice_end = (blockh * (jobnr+1)) / nb_jobs; + const int ref_linesize = td->ref_linesize[0]; + const int main_linesize = td->main_linesize[0]; + const uint8_t *main_line = td->main_data[0] + main_linesize * slice_start; + const uint8_t *ref_line = td->ref_data[0] + ref_linesize * slice_start; + double *weights = s->weights + slice_start * blockw; + + for (int i = slice_start; i < slice_end; i++) { + const int top = FFMAX(i - sizeM, 0); + const int bottom = FFMIN(i + sizeM + 1, blockh - i); + for (int j = 0; j < blockw; j++) { + const int left = FFMAX(j - sizeM, 0); + const int right = FFMIN(j + sizeM + 1, blockw - j); + const double ak = fmax(0.25, get_hxs(sat, j, i, + blockw + 1, + top, bottom, + left, right)); + weights[j] = wnum / ak; + } + weights += blockw; + ref_line += ref_linesize; + main_line += main_linesize; + } + + return 0; +} + +static +int compute_block_weights16(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs) +{ + WPSNRContext *s = ctx->priv; + const int sizeN = s->sizeN; + const double amin = s->amin; + const double sizeN2 = sizeN * sizeN; + const double wnum = sqrt(s->apic) * sizeN2; + ThreadData *td = arg; + const int blockw = (td->planewidth[0] - 2) / sizeN; + const int blockh = (td->planeheight[0] - 2) / sizeN; + const int slice_start = (blockh * jobnr) / nb_jobs; + const int slice_end = (blockh * (jobnr+1)) / nb_jobs; + const int ref_linesize = td->ref_linesize[0]; + const int main_linesize = td->main_linesize[0]; + const uint8_t *main_line = td->main_data[0] + main_linesize * slice_start * sizeN + 1 + main_linesize; + const uint8_t *ref_line = td->ref_data[0] + ref_linesize * slice_start * sizeN + 1 + ref_linesize; + double *weights = s->weights + slice_start * blockw; + + for (int i = slice_start; i < slice_end; i++) { + for (int j = 0; j < blockw; j++) { + const double ak = fmax(sizeN2 * amin, get_hx16(ref_line + 2 * j * sizeN, ref_linesize, sizeN, sizeN)); + const double wk = wnum / ak; + + weights[j] = wk; + } + weights += blockw; + ref_line += ref_linesize * sizeN; + main_line += main_linesize * sizeN; + } + + return 0; +} + +static +int compute_block_wmse8(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs) +{ + WPSNRContext *s = ctx->priv; + ThreadData *td = arg; + double *score = td->score[jobnr]; + const int sizeN = s->sizeN; + + for (int c = 0; c < s->nb_components; c++) { + const int planew = (td->planewidth[0] - 2) / sizeN; + const int blockw = (sizeN * td->planewidth[c]) / td->planewidth[0]; + const int blockh = (sizeN * td->planeheight[c]) / td->planeheight[0]; + const int slice_start = ((td->planeheight[c] / blockh) * jobnr) / nb_jobs; + const int slice_end = ((td->planeheight[c] / blockh) * (jobnr+1)) / nb_jobs; + const int ref_linesize = td->ref_linesize[c]; + const int main_linesize = td->main_linesize[c]; + const uint8_t *main_line = td->main_data[c] + main_linesize * slice_start * blockw; + const uint8_t *ref_line = td->ref_data[c] + ref_linesize * slice_start * blockw; + const double *weights = s->weights + slice_start * planew; + double m = 0; + + for (int i = slice_start; i < slice_end; i++) { + for (int j = 0; j < planew; j++) { + m += weights[j] * get_sd(ref_line + j * blockw, ref_linesize, + main_line + j * blockw, main_linesize, + blockw, blockh); + } + weights += planew; + ref_line += ref_linesize * blockw; + main_line += main_linesize * blockw; + } + score[c] = m; + } + + return 0; +} + +static +int compute_sample_wmse8(AVFilterContext *ctx, void *arg, + int jobnr, int nb_jobs) +{ + WPSNRContext *s = ctx->priv; + ThreadData *td = arg; + double *score = td->score[jobnr]; + + for (int c = 0; c < s->nb_components; c++) { + const int wstepw = td->planewidth[0] / td->planewidth[c]; + const int wsteph = td->planeheight[0] / td->planeheight[c]; + const int blockw = td->planewidth[c]; + const int blockh = td->planeheight[c]; + const int slice_start = (blockh * jobnr) / nb_jobs; + const int slice_end = (blockh * (jobnr+1)) / nb_jobs; + const int ref_linesize = td->ref_linesize[c]; + const int main_linesize = td->main_linesize[c]; + const uint8_t *main_line = td->main_data[c] + main_linesize * slice_start; + const uint8_t *ref_line = td->ref_data[c] + ref_linesize * slice_start; + const double *weights = s->weights + slice_start * blockw; + double m = 0.; + + for (int i = slice_start; i < slice_end; i++) { + for (int j = 0; j < blockw; j++) + m += weights[j * wstepw] * pow_2(ref_line[j] - main_line[j]); + weights += blockw * wsteph; + ref_line += ref_linesize; + main_line += main_linesize; + } + score[c] = m; + } + + return 0; +} + +static void set_meta(AVDictionary **metadata, const char *key, char comp, float d) +{ + char value[128]; + snprintf(value, sizeof(value), "%f", d); + if (comp) { + char key2[128]; + snprintf(key2, sizeof(key2), "%s%c", key, comp); + av_dict_set(metadata, key2, value, 0); + } else { + av_dict_set(metadata, key, value, 0); + } +} + +static const char *const wpsnr_keys[][2] = +{ + { + "lavfi.wpsnr.block.wmse.", + "lavfi.wpsnr.sample.wmse.", + }, + { + "lavfi.wpsnr.block.wpsnr.", + "lavfi.wpsnr.sample.wpsnr.", + }, + { + "lavfi.wpsnr.block.wmse_avg", + "lavfi.wpsnr.sample.wmse_avg", + }, + { + "lavfi.wpsnr.block.wpsnr_avg", + "lavfi.wpsnr.sample.wpsnr_avg", + }, +}; + +static int do_wpsnr(FFFrameSync *fs) +{ + AVFilterContext *ctx = fs->parent; + WPSNRContext *s = ctx->priv; + AVFrame *master, *ref; + double comp_wmse[4] = { 0 }, wmse = 0.; + uint64_t comp_sum[4] = { 0 }; + AVDictionary **metadata; + ThreadData td; + int ret; + + ret = ff_framesync_dualinput_get(fs, &master, &ref); + if (ret < 0) + return ret; + if (ctx->is_disabled || !ref) + return ff_filter_frame(ctx->outputs[0], master); + metadata = &master->metadata; + + td.nb_components = s->nb_components; + td.score = s->score; + for (int c = 0; c < s->nb_components; c++) { + td.main_data[c] = master->data[c]; + td.ref_data[c] = ref->data[c]; + td.main_linesize[c] = master->linesize[c]; + td.ref_linesize[c] = ref->linesize[c]; + td.planewidth[c] = s->planewidth[c]; + td.planeheight[c] = s->planeheight[c]; + } + + if (s->mode) { + s->compute_hx(ref->data[0], ref->linesize[0], + s->planewidth[0], s->planeheight[0], + s->tmp, s->planewidth[0]); + s->compute_sat(s->tmp, s->planewidth[0], s->planewidth[0], + s->planeheight[0], s->sat, s->planewidth[0]); + ff_filter_execute(ctx, s->compute_weights, &td, NULL, + FFMIN(s->planeheight[0], s->nb_threads)); + } else { + ff_filter_execute(ctx, s->compute_weights, &td, NULL, + FFMIN(s->planeheight[0] / s->sizeN, s->nb_threads)); + } + + ff_filter_execute(ctx, s->compute_wmse, &td, NULL, + FFMIN(s->planeheight[1], s->nb_threads)); + + for (int j = 0; j < s->nb_threads; j++) { + for (int c = 0; c < s->nb_components; c++) + comp_sum[c] += s->score[j][c]; + } + + for (int c = 0; c < s->nb_components; c++) + comp_wmse[c] = comp_sum[c] / ((double)(s->planewidth[c] - 2) * (s->planeheight[c] - 2)); + + for (int c = 0; c < s->nb_components; c++) + wmse += comp_wmse[c] * s->planeweight[c]; + + s->min_wmse = FFMIN(s->min_wmse, wmse); + s->max_wmse = FFMAX(s->max_wmse, wmse); + + s->wmse += wmse; + + for (int j = 0; j < s->nb_components; j++) + s->wmse_comp[j] += comp_wmse[j]; + s->nb_frames++; + + for (int j = 0; j < s->nb_components; j++) { + set_meta(metadata, wpsnr_keys[0][s->mode], s->comps[j], comp_wmse[j]); + set_meta(metadata, wpsnr_keys[1][s->mode], s->comps[j], get_wpsnr(comp_wmse[j], 1, s->max[j])); + } + set_meta(metadata, wpsnr_keys[2][s->mode], 0, wmse); + set_meta(metadata, wpsnr_keys[3][s->mode], 0, get_wpsnr(wmse, 1, s->average_max)); + + return ff_filter_frame(ctx->outputs[0], master); +} + +static av_cold int init(AVFilterContext *ctx) +{ + WPSNRContext *s = ctx->priv; + + s->min_wmse = +INFINITY; + s->max_wmse = -INFINITY; + + s->fs.on_event = do_wpsnr; + return 0; +} + +static const enum AVPixelFormat pix_fmts[] = { + AV_PIX_FMT_GRAY8, AV_PIX_FMT_GRAY9, AV_PIX_FMT_GRAY10, AV_PIX_FMT_GRAY12, AV_PIX_FMT_GRAY14, AV_PIX_FMT_GRAY16, +#define PF_NOALPHA(suf) AV_PIX_FMT_YUV420##suf, AV_PIX_FMT_YUV422##suf, AV_PIX_FMT_YUV444##suf +#define PF_ALPHA(suf) AV_PIX_FMT_YUVA420##suf, AV_PIX_FMT_YUVA422##suf, AV_PIX_FMT_YUVA444##suf +#define PF(suf) PF_NOALPHA(suf), PF_ALPHA(suf) + PF(P), PF(P9), PF(P10), PF_NOALPHA(P12), PF_NOALPHA(P14), PF(P16), + AV_PIX_FMT_YUV440P, AV_PIX_FMT_YUV411P, AV_PIX_FMT_YUV410P, + AV_PIX_FMT_YUVJ411P, AV_PIX_FMT_YUVJ420P, AV_PIX_FMT_YUVJ422P, + AV_PIX_FMT_YUVJ440P, AV_PIX_FMT_YUVJ444P, + AV_PIX_FMT_NONE +}; + +static int get_sizeM(int w, int h) +{ + return round(14 * sqrt(w * h / (WIDTH * HEIGHT))); +} + +static int get_sizeN(int w, int h) +{ + return ceil(128 * sqrt(w * h / (WIDTH * HEIGHT))); +} + +static double get_apic(int w, int h, int depth) +{ + return (1 << depth) * sqrt((WIDTH * HEIGHT) / (w * h)); +} + +static double get_amin(int depth) +{ + return 1 << (depth - 6); +} + +static int config_input_ref(AVFilterLink *inlink) +{ + const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(inlink->format); + AVFilterContext *ctx = inlink->dst; + WPSNRContext *s = ctx->priv; + double average_max; + unsigned sum; + + s->sizeN = get_sizeN(inlink->w, inlink->h); + s->sizeM = get_sizeM(inlink->w, inlink->h); + s->nb_threads = ff_filter_get_nb_threads(ctx); + s->nb_components = desc->nb_components; + if (ctx->inputs[0]->w != ctx->inputs[1]->w || + ctx->inputs[0]->h != ctx->inputs[1]->h) { + av_log(ctx, AV_LOG_ERROR, "Width and height of input videos must be same.\n"); + return AVERROR(EINVAL); + } + + s->max[0] = (1 << desc->comp[0].depth) - 1; + s->max[1] = (1 << desc->comp[1].depth) - 1; + s->max[2] = (1 << desc->comp[2].depth) - 1; + s->max[3] = (1 << desc->comp[3].depth) - 1; + + s->apic = get_apic(inlink->w, inlink->h, desc->comp[0].depth); + s->amin = get_amin(desc->comp[0].depth); + + s->compute_hx = desc->comp[0].depth <= 8 ? compute_hx8 : compute_hx16; + s->compute_sat = compute_sat16; + s->compute_weights = desc->comp[0].depth <= 8 ? compute_block_weights8 : compute_block_weights16; + s->compute_wmse = desc->comp[0].depth <= 8 ? compute_block_wmse8 : compute_block_wmse16; + if (s->mode) { + s->compute_weights = desc->comp[0].depth <= 8 ? compute_sample_weights8 : compute_sample_weights16; + s->compute_wmse = desc->comp[0].depth <= 8 ? compute_sample_wmse8 : compute_sample_wmse16; + } + + s->comps[0] = 'y' ; + s->comps[1] = 'u' ; + s->comps[2] = 'v' ; + s->comps[3] = 'a'; + + s->planeheight[1] = s->planeheight[2] = AV_CEIL_RSHIFT(inlink->h, desc->log2_chroma_h); + s->planeheight[0] = s->planeheight[3] = inlink->h; + s->planewidth[1] = s->planewidth[2] = AV_CEIL_RSHIFT(inlink->w, desc->log2_chroma_w); + s->planewidth[0] = s->planewidth[3] = inlink->w; + sum = 0; + for (int j = 0; j < s->nb_components; j++) + sum += s->planeheight[j] * s->planewidth[j]; + average_max = 0; + for (int j = 0; j < s->nb_components; j++) { + s->planeweight[j] = (double) s->planeheight[j] * s->planewidth[j] / sum; + average_max += s->max[j] * s->planeweight[j]; + } + s->average_max = lrint(average_max); + + s->score = av_calloc(s->nb_threads, sizeof(*s->score)); + if (!s->score) + return AVERROR(ENOMEM); + + for (int t = 0; t < s->nb_threads; t++) { + s->score[t] = av_calloc(s->nb_components, sizeof(*s->score[0])); + if (!s->score[t]) + return AVERROR(ENOMEM); + } + + s->tmp = av_calloc(inlink->w * inlink->h, sizeof(s->tmp)); + s->sat = av_calloc((inlink->w + 1) * (inlink->h + 1), sizeof(s->sat)); + s->weights = av_calloc(inlink->w * inlink->h, sizeof(s->weights)); + if (!s->sat || !s->tmp || !s->weights) + return AVERROR(ENOMEM); + + return 0; +} + +static int config_output(AVFilterLink *outlink) +{ + AVFilterContext *ctx = outlink->src; + WPSNRContext *s = ctx->priv; + AVFilterLink *mainlink = ctx->inputs[0]; + int ret; + + ret = ff_framesync_init_dualinput(&s->fs, ctx); + if (ret < 0) + return ret; + outlink->w = mainlink->w; + outlink->h = mainlink->h; + outlink->time_base = mainlink->time_base; + outlink->sample_aspect_ratio = mainlink->sample_aspect_ratio; + outlink->frame_rate = mainlink->frame_rate; + if ((ret = ff_framesync_configure(&s->fs)) < 0) + return ret; + + outlink->time_base = s->fs.time_base; + + if (av_cmp_q(mainlink->time_base, outlink->time_base) || + av_cmp_q(ctx->inputs[1]->time_base, outlink->time_base)) + av_log(ctx, AV_LOG_WARNING, "not matching timebases found between first input: %d/%d and second input %d/%d, results may be incorrect!\n", + mainlink->time_base.num, mainlink->time_base.den, + ctx->inputs[1]->time_base.num, ctx->inputs[1]->time_base.den); + + return 0; +} + +static int activate(AVFilterContext *ctx) +{ + WPSNRContext *s = ctx->priv; + return ff_framesync_activate(&s->fs); +} + +static av_cold void uninit(AVFilterContext *ctx) +{ + WPSNRContext *s = ctx->priv; + + if (s->nb_frames > 0) { + int j; + char buf[256]; + + buf[0] = 0; + for (j = 0; j < s->nb_components; j++) { + av_strlcatf(buf, sizeof(buf), " %c:%f", s->comps[j], + get_wpsnr(s->wmse_comp[j], s->nb_frames, s->max[j])); + } + av_log(ctx, AV_LOG_INFO, "WPSNR%c%s average:%f min:%f max:%f\n", + s->mode ? 's' : 'b', + buf, + get_wpsnr(s->wmse, s->nb_frames, s->average_max), + get_wpsnr(s->max_wmse, 1, s->average_max), + get_wpsnr(s->min_wmse, 1, s->average_max)); + } + + ff_framesync_uninit(&s->fs); + for (int t = 0; t < s->nb_threads && s->score; t++) + av_freep(&s->score[t]); + av_freep(&s->score); + + av_freep(&s->weights); + av_freep(&s->sat); + av_freep(&s->tmp); +} + +static const AVFilterPad wpsnr_inputs[] = { + { + .name = "main", + .type = AVMEDIA_TYPE_VIDEO, + },{ + .name = "reference", + .type = AVMEDIA_TYPE_VIDEO, + .config_props = config_input_ref, + }, +}; + +static const AVFilterPad wpsnr_outputs[] = { + { + .name = "default", + .type = AVMEDIA_TYPE_VIDEO, + .config_props = config_output, + }, +}; + +const AVFilter ff_vf_wpsnr = { + .name = "wpsnr", + .description = NULL_IF_CONFIG_SMALL("Calculate the Perceptual Weighted PSNR between two video streams."), + .preinit = wpsnr_framesync_preinit, + .init = init, + .uninit = uninit, + .activate = activate, + .priv_size = sizeof(WPSNRContext), + .priv_class = &wpsnr_class, + FILTER_INPUTS(wpsnr_inputs), + FILTER_OUTPUTS(wpsnr_outputs), + FILTER_PIXFMTS_ARRAY(pix_fmts), + .flags = AVFILTER_FLAG_SUPPORT_TIMELINE_INTERNAL | AVFILTER_FLAG_SLICE_THREADS, +};