From patchwork Sun Jul 18 14:08:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul B Mahol <onemda@gmail.com>
X-Patchwork-Id: 28967
From: Paul B Mahol <onemda@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 18 Jul 2021 16:08:59 +0200
Message-Id: <20210718140859.6024-1-onemda@gmail.com>
X-Mailer: git-send-email 2.17.1
Subject: [FFmpeg-devel] [WIP] [RFC] [PATCH] avfilter: add audio
 psychoacoustic clipper

Signed-off-by: Paul B Mahol <onemda@gmail.com>
---
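Notes: a minimal standalone sketch of the sizing rules in config_input() and generate_spread_table(), useful for checking buffer sizes by hand. pick_fft_size(), pick_num_psy_bins() and count_spread_rows() are hypothetical helpers that only restate the patch's expressions; they are not part of the filter.

```c
#include <assert.h>

/* Block size chosen from the sample rate, as in config_input(). */
static int pick_fft_size(int sample_rate)
{
    return sample_rate > 100000 ? 1024 : sample_rate > 50000 ? 512 : 256;
}

/* Number of bins that get the O(n^2) psychoacoustic spreading. */
static int pick_num_psy_bins(int sample_rate, int fft_size)
{
    if (sample_rate <= 50000)
        return fft_size / 2;
    else if (sample_rate <= 100000)
        return fft_size / 4;
    return fft_size / 8;
}

/* Rows produced by the bin schedule of generate_spread_table():
 * bins 0 and 1 get a row each, then two rows per octave
 * (b and b + b / 2 for every power of two b >= 2). */
static int count_spread_rows(int num_psy_bins)
{
    int rows = 0, bin = 0, increment = 1;

    assert(num_psy_bins > 0);
    while (bin < num_psy_bins) {
        if (bin <= 1) {
            bin++;
        } else {
            if ((bin & (bin - 1)) == 0)
                increment = bin / 2;   /* power of two: start a new octave */
            bin += increment;
        }
        rows++;
    }
    return rows;
}
```

With these rules 44.1 kHz, 96 kHz and 192 kHz input select fft_size 256, 512 and 1024 (overlap is fft_size / 4), but always 128 psychoacoustic bins, i.e. bins up to roughly 22 to 24 kHz. count_spread_rows(128) is 14, the same as av_log2(128) * 2 used to size spread_table.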
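 doc/filters.texi          |  31 +++
 libavfilter/Makefile      |   1 +
 libavfilter/af_apsyclip.c | 645 ++++++++++++++++++++++++++++++++++++++
 libavfilter/allfilters.c  |   1 +
 4 files changed, 678 insertions(+)
 create mode 100644 libavfilter/af_apsyclip.c

diff --git a/doc/filters.texi b/doc/filters.texi
index 232d81ae3e..ee0e841049 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -2281,6 +2281,37 @@ Default value is 1.0.
 
 This filter supports the all above options as @ref{commands}.
 
+@section apsyclip
+Apply psychoacoustic clipper to input audio stream.
+
+The filter accepts the following options:
+
+@table @option
+@item level_in
+Set input gain. By default it is 1. Range is [0.015625 - 64].
+
+@item level_out
+Set output gain. By default it is 1. Range is [0.015625 - 64].
+
+@item clip_level
+Set the clipping start value. Default value is 1 (0 dBFS). Range is [0.015625 - 1].
+
+@item diff_only
+Output only the difference between the clipped and the unclipped signal,
+useful to hear the introduced distortion. Disabled by default.
+
+@item adaptive
+Set strength of the adaptive distortion applied. Default value is 0.5.
+Allowed range is from 0 to 1.
+
+@item iterations
+Set number of iterations of the psychoacoustic clipper.
+Allowed range is from 0 to 20. Default value is 10.
+
+@item auto_level
+Toggle automatic output level adjustment. Enabled by default.
+@end table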
+
 @section apulsator
 
 Audio pulsator is something between an autopanner and a tremolo.
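The adaptive option documented above controls how far the masking threshold is relaxed between clipping iterations. A minimal sketch restating the mask-shift heuristic from feed(); next_mask_shift() is a hypothetical name, while the constants (1.122 for +1 dB, the last third of iterations as boost rounds) are copied from the patch.

```c
#include <assert.h>

/* Hypothetical restatement of the mask-shift heuristic in feed().
 * orig_peak and peak are peak / clip_level ratios, iter is the current
 * round (0-based) and adaptive is the option value in [0, 1]. */
static float next_mask_shift(float orig_peak, float peak, int iter,
                             int iterations, float adaptive)
{
    float shift = 1.122f;                            /* +1 dB baseline */
    int boost_start = iterations - iterations / 3;   /* last third of rounds */

    assert(adaptive >= 0.f && adaptive <= 1.f);

    if (orig_peak > 1.f && peak > 1.f) {
        float achieved = orig_peak - peak;

        if (iter + 1 < boost_start && achieved > 0.f) {
            /* shift by the ratio still needed, but never beyond the peak */
            float ratio = (orig_peak - 1.f) / achieved;
            if (ratio > peak)
                ratio = peak;
            if (ratio > shift)
                shift = ratio;
        } else if (peak > shift) {
            /* peak grew or late round: fall back to the peak itself */
            shift = peak;
        }
    }
    /* adaptive = 0 keeps the mask fixed, 1 applies the full shift */
    return 1.f + (shift - 1.f) * adaptive;
}
```

With adaptive=0 the mask curve is never relaxed between iterations; with adaptive=1 the full heuristic shift is applied, e.g. orig_peak 2.0 and peak 1.5 in an early round give a shift of 1.5, scaled to 1.25 at the default adaptive=0.5.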
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 62ee3d7b67..1c943cbbc4 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -72,6 +72,7 @@ OBJS-$(CONFIG_APAD_FILTER)                   += af_apad.o
 OBJS-$(CONFIG_APERMS_FILTER)                 += f_perms.o
 OBJS-$(CONFIG_APHASER_FILTER)                += af_aphaser.o generate_wave_table.o
 OBJS-$(CONFIG_APHASESHIFT_FILTER)            += af_afreqshift.o
+OBJS-$(CONFIG_APSYCLIP_FILTER)               += af_apsyclip.o
 OBJS-$(CONFIG_APULSATOR_FILTER)              += af_apulsator.o
 OBJS-$(CONFIG_AREALTIME_FILTER)              += f_realtime.o
 OBJS-$(CONFIG_ARESAMPLE_FILTER)              += af_aresample.o
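The file added next derives a per-bin masking margin in set_margin_curve(): a piecewise-linear curve in dB over frequency, later converted to linear amplitude. A minimal sketch of the dB interpolation step, assuming a continuous frequency in Hz rather than the patch's integer bin frequency (j * sample_rate / fft_size); margin_points and margin_db() are hypothetical names, and the table values are copied from config_input().

```c
#include <assert.h>

/* Frequency (Hz) / margin (dB) pairs copied from config_input(). */
static const int margin_points[][2] = {
    {0, 14}, {125, 14}, {250, 16}, {500, 18}, {1000, 20},
    {2000, 20}, {4000, 20}, {8000, 15}, {16000, 5}, {20000, -10},
};
static const int num_margin_points =
    (int)(sizeof(margin_points) / sizeof(margin_points[0]));

/* Margin in dB at frequency hz: linear interpolation between neighbouring
 * points, held at the last value from 20 kHz upwards. The division is done
 * in float so fractional dB values are kept. */
static float margin_db(float hz)
{
    assert(hz >= 0.f);
    for (int i = 0; i < num_margin_points - 1; i++) {
        if (hz < margin_points[i + 1][0]) {
            float t = (hz - margin_points[i][0]) /
                      (float)(margin_points[i + 1][0] - margin_points[i][0]);
            return margin_points[i][1] + t * (margin_points[i + 1][1] - margin_points[i][1]);
        }
    }
    return (float)margin_points[num_margin_points - 1][1];
}
```

config_input() then stores powf(10, dB / 20) per bin, so the 20 dB plateau between 1 and 4 kHz becomes a factor of 10 in margin_curve[], which calculate_mask_curve() divides out of the spread spectrum.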
diff --git a/libavfilter/af_apsyclip.c b/libavfilter/af_apsyclip.c
new file mode 100644
index 0000000000..29ea0f2b26
--- /dev/null
+++ b/libavfilter/af_apsyclip.c
@@ -0,0 +1,645 @@
+/*
+ * Copyright (c) 2014 - 2021 Jason Jang
+ * Copyright (c) 2021 Paul B Mahol
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/opt.h"
+#include "libavutil/tx.h"
+#include "audio.h"
+#include "avfilter.h"
+#include "filters.h"
+#include "internal.h"
+
+typedef struct AudioPsyClipContext {
+    const AVClass *class;
+
+    double level_in;
+    double level_out;
+    double clip_level;
+    double adaptive_distortion;
+    int auto_level;
+    int diff_only;
+    int iterations;
+    char *protections_str;
+    double *protections;
+
+    int num_psy_bins;
+    int fft_size;
+    int overlap;
+
+    int spread_table_rows;
+    int *spread_table_index;
+    int (*spread_table_range)[2];
+    float *window, *inv_window, *spread_table, *margin_curve;
+
+    AVFrame *in_buffer;
+    AVFrame *out_buffer;
+    AVFrame *in_frame;
+    AVFrame *out_dist_frame;
+    AVFrame *windowed_frame;
+    AVFrame *clipping_delta;
+    AVFrame *spectrum_buf;
+    AVFrame *mask_curve;
+
+    AVTXContext *tx_ctx;
+    av_tx_fn tx_fn;
+    AVTXContext *itx_ctx;
+    av_tx_fn itx_fn;
+} AudioPsyClipContext;
+
+#define OFFSET(x) offsetof(AudioPsyClipContext, x)
+#define A AV_OPT_FLAG_AUDIO_PARAM
+#define F AV_OPT_FLAG_FILTERING_PARAM
+
+static const AVOption apsyclip_options[] = {
+    { "level_in",  "set input level",  OFFSET(level_in),     AV_OPT_TYPE_DOUBLE, {.dbl=1},.015625,   64, A|F },
+    { "level_out", "set output level", OFFSET(level_out),    AV_OPT_TYPE_DOUBLE, {.dbl=1},.015625,   64, A|F },
+    { "clip_level", "set clip level", OFFSET(clip_level),    AV_OPT_TYPE_DOUBLE, {.dbl=1},.015625,    1, A|F },
+    { "diff_only", "output only the clipping difference", OFFSET(diff_only), AV_OPT_TYPE_BOOL,   {.i64=0},      0,    1, A|F },
+    { "adaptive", "set adaptive distortion", OFFSET(adaptive_distortion), AV_OPT_TYPE_DOUBLE, {.dbl=0.5}, 0, 1, A|F },
+    { "iterations", "set iterations", OFFSET(iterations), AV_OPT_TYPE_INT, {.i64=10}, 0, 20, A|F },
+    { "auto_level", "set auto level", OFFSET(auto_level), AV_OPT_TYPE_BOOL,{.i64=1},  0,  1, A|F },
+    {NULL}
+};
+
+AVFILTER_DEFINE_CLASS(apsyclip);
+
+static int query_formats(AVFilterContext *ctx)
+{
+    AVFilterFormats *formats;
+    AVFilterChannelLayouts *layouts;
+    static const enum AVSampleFormat sample_fmts[] = {
+        AV_SAMPLE_FMT_FLTP,
+        AV_SAMPLE_FMT_NONE
+    };
+    int ret;
+
+    layouts = ff_all_channel_counts();
+    if (!layouts)
+        return AVERROR(ENOMEM);
+    ret = ff_set_common_channel_layouts(ctx, layouts);
+    if (ret < 0)
+        return ret;
+
+    formats = ff_make_format_list(sample_fmts);
+    if (!formats)
+        return AVERROR(ENOMEM);
+    ret = ff_set_common_formats(ctx, formats);
+    if (ret < 0)
+        return ret;
+
+    formats = ff_all_samplerates();
+    if (!formats)
+        return AVERROR(ENOMEM);
+    return ff_set_common_samplerates(ctx, formats);
+}
+
+static void generate_hann_window(float *window, float *inv_window, int size)
+{
+    for (int i = 0; i < size; i++) {
+        float value = 0.5f * (1.f - cosf(2.f * M_PI * i / size));
+
+        window[i] = value;
+        // 1/window to calculate unwindowed peak.
+        inv_window[i] = value > 0.01f ? 1.f / value : 0.f;
+    }
+}
+
+static void set_margin_curve(AudioPsyClipContext *s,
+                             const int (*points)[2], int num_points, int sample_rate)
+{
+    int j = 0;
+
+    s->margin_curve[0] = points[0][1];
+
+    for (int i = 0; i < num_points - 1; i++) {
+        while (j < s->fft_size / 2 + 1 && j * sample_rate / s->fft_size < points[i + 1][0]) {
+            // linearly interpolate between points
+            int binHz = j * sample_rate / s->fft_size;
+            s->margin_curve[j] = points[i][1] + (binHz - points[i][0]) * (points[i + 1][1] - points[i][1]) / (float)(points[i + 1][0] - points[i][0]);
+            j++;
+        }
+    }
+    // handle bins after the last point
+    while (j < s->fft_size / 2 + 1) {
+        s->margin_curve[j] = points[num_points - 1][1];
+        j++;
+    }
+
+    // convert margin curve to linear amplitude scale
+    for (j = 0; j < s->fft_size / 2 + 1; j++)
+        s->margin_curve[j] = powf(10.f, s->margin_curve[j] / 20.f);
+}
+
+static void generate_spread_table(AudioPsyClipContext *s)
+{
+    // Calculate tent-shape function in log-log scale.
+
+    // As an optimization, only consider bins close to "bin"
+    // This reduces the number of multiplications needed in calculate_mask_curve
+    // The masking contribution at faraway bins is negligible
+
+    // Another optimization to save memory and speed up the calculation of the
+    // spread table is to calculate and store only 2 spread functions per
+    // octave, and reuse the same spread function for multiple bins.
+    int table_index = 0;
+    int bin = 0;
+    int increment = 1;
+
+    while (bin < s->num_psy_bins) {
+        float sum = 0;
+        int base_idx = table_index * s->num_psy_bins;
+        int start_bin = bin * 3 / 4;
+        int end_bin = FFMIN(s->num_psy_bins, ((bin + 1) * 4 + 2) / 3);
+        int next_bin;
+
+        for (int j = start_bin; j < end_bin; j++) {
+            // add 0.5 so that j = 0 and bin = 0 don't produce log(0)
+            float rel_idx_log = FFABS(logf((j + 0.5f) / (bin + 0.5f)));
+            float value;
+            if (j >= bin) {
+                // mask up
+                value = expf(-rel_idx_log * 40.f);
+            } else {
+                // mask down
+                value = expf(-rel_idx_log * 80.f);
+            }
+            // the spreading function is centred in the row
+            sum += value;
+            s->spread_table[base_idx + s->num_psy_bins / 2 + j - bin] = value;
+        }
+        // now normalize it
+        for (int j = start_bin; j < end_bin; j++) {
+            s->spread_table[base_idx + s->num_psy_bins / 2 + j - bin] /= sum;
+        }
+
+        s->spread_table_range[table_index][0] = start_bin - bin;
+        s->spread_table_range[table_index][1] = end_bin - bin;
+
+        if (bin <= 1) {
+            next_bin = bin + 1;
+        } else {
+            if ((bin & (bin - 1)) == 0) {
+                // power of 2
+                increment = bin / 2;
+            }
+
+            next_bin = bin + increment;
+        }
+
+        // set bins between "bin" and "next_bin" to use this table_index
+        for (int i = bin; i < next_bin; i++)
+            s->spread_table_index[i] = table_index;
+
+        bin = next_bin;
+        table_index++;
+    }
+}
+
+static int config_input(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx = inlink->dst;
+    AudioPsyClipContext *s = ctx->priv;
+    static const int points[][2] = { {0,14}, {125,14}, {250,16}, {500,18}, {1000,20}, {2000,20}, {4000,20}, {8000,15}, {16000,5}, {20000,-10} };
+    static const int num_points = 10;
+    float scale = 1.f;
+    int ret;
+
+    s->fft_size = inlink->sample_rate > 100000 ? 1024 : inlink->sample_rate > 50000 ? 512 : 256;
+    s->overlap = s->fft_size / 4;
+
+    // The psy masking calculation is O(n^2),
+    // so skip it for frequencies not covered by base sampling rates (e.g. 44.1 kHz)
+    if (inlink->sample_rate <= 50000) {
+        s->num_psy_bins = s->fft_size / 2;
+    } else if (inlink->sample_rate <= 100000) {
+        s->num_psy_bins = s->fft_size / 4;
+    } else {
+        s->num_psy_bins = s->fft_size / 8;
+    }
+
+    s->window = av_calloc(s->fft_size, sizeof(*s->window));
+    s->inv_window = av_calloc(s->fft_size, sizeof(*s->inv_window));
+    if (!s->window || !s->inv_window)
+        return AVERROR(ENOMEM);
+
+    s->in_buffer      = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->out_buffer     = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->in_frame       = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->out_dist_frame = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->windowed_frame = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->clipping_delta = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->spectrum_buf   = ff_get_audio_buffer(inlink, s->fft_size * 2);
+    s->mask_curve     = ff_get_audio_buffer(inlink, s->fft_size / 2 + 1);
+    if (!s->in_buffer || !s->out_buffer || !s->in_frame ||
+        !s->out_dist_frame || !s->windowed_frame ||
+        !s->clipping_delta || !s->spectrum_buf || !s->mask_curve)
+        return AVERROR(ENOMEM);
+
+    generate_hann_window(s->window, s->inv_window, s->fft_size);
+
+    s->margin_curve = av_calloc(s->fft_size / 2 + 1, sizeof(*s->margin_curve));
+    if (!s->margin_curve)
+        return AVERROR(ENOMEM);
+
+    s->spread_table_rows = av_log2(s->num_psy_bins) * 2;
+    s->spread_table = av_calloc(s->spread_table_rows * s->num_psy_bins, sizeof(*s->spread_table));
+    if (!s->spread_table)
+        return AVERROR(ENOMEM);
+
+    s->spread_table_range = av_calloc(s->spread_table_rows, sizeof(*s->spread_table_range));
+    if (!s->spread_table_range)
+        return AVERROR(ENOMEM);
+
+    s->spread_table_index = av_calloc(s->num_psy_bins, sizeof(*s->spread_table_index));
+    if (!s->spread_table_index)
+        return AVERROR(ENOMEM);
+
+    set_margin_curve(s, points, num_points, inlink->sample_rate);
+
+    generate_spread_table(s);
+
+    ret = av_tx_init(&s->tx_ctx, &s->tx_fn, AV_TX_FLOAT_FFT, 0, s->fft_size, &scale, 0);
+    if (ret < 0)
+        return ret;
+
+    ret = av_tx_init(&s->itx_ctx, &s->itx_fn, AV_TX_FLOAT_FFT, 1, s->fft_size, &scale, 0);
+    if (ret < 0)
+        return ret;
+
+    return 0;
+}
+
+static void apply_window(AudioPsyClipContext *s,
+                         const float *in_frame, float *out_frame, const int add_to_out_frame)
+{
+    const float *window = s->window;
+
+    for (int i = 0; i < s->fft_size; i++) {
+        if (add_to_out_frame) {
+            out_frame[i] += in_frame[i] * window[i];
+        } else {
+            out_frame[i] = in_frame[i] * window[i];
+        }
+    }
+}
+
+static void calculate_mask_curve(AudioPsyClipContext *s,
+                                 const float *spectrum, float *mask_curve)
+{
+    for (int i = 0; i < s->fft_size / 2 + 1; i++)
+        mask_curve[i] = 0;
+
+    for (int i = 0; i < s->num_psy_bins; i++) {
+        float magnitude;
+        int table_idx, base_idx, start_bin, end_bin;
+        int range[2];
+
+        if (i == 0) {
+            magnitude = FFABS(spectrum[0]);
+        } else if (i == s->fft_size / 2) {
+            magnitude = FFABS(spectrum[s->fft_size]);
+        } else {
+            // The negative frequencies are redundant for real input and are ignored here,
+            // so each positive-frequency bin holds only half of the total magnitude.
+            // Multiply the magnitude by 2 to account for both the + and - frequencies.
+            magnitude = hypotf(spectrum[2 * i], spectrum[2 * i + 1]) * 2;
+        }
+
+        table_idx = s->spread_table_index[i];
+        range[0] = s->spread_table_range[table_idx][0];
+        range[1] = s->spread_table_range[table_idx][1];
+        base_idx  = table_idx * s->num_psy_bins;
+        start_bin = FFMAX(0, i + range[0]);
+        end_bin   = FFMIN(s->num_psy_bins, i + range[1]);
+
+        for (int j = start_bin; j < end_bin; j++)
+            mask_curve[j] += s->spread_table[base_idx + s->num_psy_bins / 2 + j - i] * magnitude;
+    }
+
+    // for ultrasonic frequencies, skip the O(n^2) spread calculation and just copy the magnitude
+    for (int i = s->num_psy_bins; i < s->fft_size / 2 + 1; i++) {
+        float magnitude;
+        if (i == s->fft_size / 2) {
+            magnitude = FFABS(spectrum[s->fft_size]);
+        } else {
+            // The negative frequencies are redundant for real input and are ignored here,
+            // so each positive-frequency bin holds only half of the total magnitude.
+            // Multiply the magnitude by 2 to account for both the + and - frequencies.
+            magnitude = hypotf(spectrum[2 * i], spectrum[2 * i + 1]) * 2;
+        }
+
+        mask_curve[i] = magnitude;
+    }
+
+    for (int i = 0; i < s->fft_size / 2 + 1; i++)
+        mask_curve[i] = mask_curve[i] / s->margin_curve[i];
+}
+
+static void clip_to_window(AudioPsyClipContext *s,
+                           const float *windowed_frame, float *clipping_delta, float delta_boost)
+{
+    const float *window = s->window;
+
+    for (int i = 0; i < s->fft_size; i++) {
+        float limit = s->clip_level * window[i];
+        float effective_value = windowed_frame[i] + clipping_delta[i];
+
+        if (effective_value > limit) {
+            clipping_delta[i] += (limit - effective_value) * delta_boost;
+        } else if (effective_value < -limit) {
+            clipping_delta[i] += (-limit - effective_value) * delta_boost;
+        }
+    }
+}
+
+static void limit_clip_spectrum(AudioPsyClipContext *s,
+                                float *clip_spectrum, const float *mask_curve)
+{
+    // bin 0
+    float relative_distortion_level = FFABS(clip_spectrum[0]) / mask_curve[0];
+
+    if (relative_distortion_level > 1.f)
+        clip_spectrum[0] /= relative_distortion_level;
+
+    // bin 1..N/2-1; scale the mirrored bin N-i too so the inverse FFT stays real
+    for (int i = 1; i < s->fft_size / 2; i++) {
+        float real = clip_spectrum[i * 2];
+        float imag = clip_spectrum[i * 2 + 1];
+        // Multiply the magnitude by 2 to account for both the + and - frequencies.
+        relative_distortion_level = hypotf(real, imag) * 2 / mask_curve[i];
+        if (relative_distortion_level > 1.f) {
+            clip_spectrum[i * 2] /= relative_distortion_level;
+            clip_spectrum[i * 2 + 1] /= relative_distortion_level;
+            clip_spectrum[s->fft_size * 2 - i * 2] /= relative_distortion_level;
+            clip_spectrum[s->fft_size * 2 - i * 2 + 1] /= relative_distortion_level;
+        }
+    }
+    // bin N/2, whose real part sits at float offset fft_size of the complex spectrum
+    relative_distortion_level = FFABS(clip_spectrum[s->fft_size]) / mask_curve[s->fft_size / 2];
+    if (relative_distortion_level > 1.f)
+        clip_spectrum[s->fft_size] /= relative_distortion_level;
+}
+
+static void r2c(float *buffer, int size)
+{
+    for (int i = size - 1; i >= 0; i--)
+        buffer[2 * i] = buffer[i];
+
+    for (int i = size - 1; i >= 0; i--)
+        buffer[2 * i + 1] = 0.f;
+}
+
+static void c2r(float *buffer, int size)
+{
+    for (int i = 0; i < size; i++)
+        buffer[i] = buffer[2 * i];
+
+    for (int i = 0; i < size; i++)
+        buffer[i + size] = 0.f;
+}
+
+static void feed(AudioPsyClipContext *s,
+                 const float *in_samples, float *out_samples, int diff_only,
+                 float *in_frame, float *out_dist_frame,
+                 float *windowed_frame, float *clipping_delta,
+                 float *spectrum_buf, float *mask_curve)
+{
+    float orig_peak = 0;
+    float peak;
+
+    // shift in/out buffers
+    for (int i = 0; i < s->fft_size - s->overlap; i++) {
+        in_frame[i] = in_frame[i + s->overlap];
+        out_dist_frame[i] = out_dist_frame[i + s->overlap];
+    }
+
+    for (int i = 0; i < s->overlap; i++) {
+        in_frame[i + s->fft_size - s->overlap] = in_samples[i];
+        out_dist_frame[i + s->fft_size - s->overlap] = 0.f;
+    }
+
+    apply_window(s, in_frame, windowed_frame, 0);
+    r2c(windowed_frame, s->fft_size);
+    s->tx_fn(s->tx_ctx, spectrum_buf, windowed_frame, sizeof(AVComplexFloat));
+    c2r(windowed_frame, s->fft_size);
+    calculate_mask_curve(s, spectrum_buf, mask_curve);
+
+    // It would be easier to calculate the peak from the unwindowed input.
+    // This is just for consistency with the clipped peak calculation
+    // because the inv_window zeros out samples on the edge of the window.
+    for (int i = 0; i < s->fft_size; i++)
+        orig_peak = FFMAX(orig_peak, FFABS(windowed_frame[i] * s->inv_window[i]));
+    orig_peak /= s->clip_level;
+    peak = orig_peak;
+
+    // clear clipping_delta
+    for (int i = 0; i < s->fft_size * 2; i++) {
+        clipping_delta[i] = 0.f;
+    }
+
+    // repeat clipping-filtering process a few times to control both the peaks and the spectrum
+    for (int iter = 0; iter < s->iterations; iter++) {
+        float mask_curve_shift = 1.122f; // 1.122 is 1dB
+        // The last 1/3 of rounds have boosted delta to help reach the peak target faster
+        float delta_boost = 1.f;
+        if (iter >= s->iterations - s->iterations / 3) {
+            // boosting the delta when large peaks are still present is dangerous
+            if (peak < 2.f)
+                delta_boost = 2.f;
+        }
+
+        clip_to_window(s, windowed_frame, clipping_delta, delta_boost);
+
+        r2c(clipping_delta, s->fft_size);
+        s->tx_fn(s->tx_ctx, spectrum_buf, clipping_delta, sizeof(AVComplexFloat));
+
+        limit_clip_spectrum(s, spectrum_buf, mask_curve);
+
+        s->itx_fn(s->itx_ctx, clipping_delta, spectrum_buf, sizeof(AVComplexFloat));
+        c2r(clipping_delta, s->fft_size);
+        // av_tx FFTs are unnormalized: undo the fft_size gain of the forward/inverse pair
+        for (int i = 0; i < s->fft_size; i++)
+            clipping_delta[i] /= s->fft_size;
+
+        peak = 0;
+        for (int i = 0; i < s->fft_size; i++)
+            peak = FFMAX(peak, FFABS((windowed_frame[i] + clipping_delta[i]) * s->inv_window[i]));
+        peak /= s->clip_level;
+
+        // Automatically adjust mask_curve as necessary to reach peak target
+        if (orig_peak > 1.f && peak > 1.f) {
+            float diff_achieved = orig_peak - peak;
+            if (iter + 1 < s->iterations - s->iterations / 3 && diff_achieved > 0) {
+                float diff_needed = orig_peak - 1.f;
+                float diff_ratio = diff_needed / diff_achieved;
+                // If a good amount of peak reduction was already achieved,
+                // don't shift the mask_curve by the full peak value
+                // On the other hand, if only a little peak reduction was achieved,
+                // don't shift the mask_curve by the enormous diff_ratio.
+                diff_ratio = FFMIN(diff_ratio, peak);
+                mask_curve_shift = FFMAX(mask_curve_shift, diff_ratio);
+            } else {
+                // If the peak got higher than the input or we are in the last 1/3 rounds,
+                // go back to the heavy-handed peak heuristic.
+                mask_curve_shift = FFMAX(mask_curve_shift, peak);
+            }
+        }
+
+        mask_curve_shift = 1.f + (mask_curve_shift - 1.f) * s->adaptive_distortion;
+
+        // Be less strict in the next iteration.
+        // This helps with peak control.
+        for (int i = 0; i < s->fft_size / 2 + 1; i++)
+            mask_curve[i] *= mask_curve_shift;
+    }
+
+    // do overlap & add
+    apply_window(s, clipping_delta, out_dist_frame, 1);
+
+    for (int i = 0; i < s->overlap; i++) {
+        // 4x overlap with a squared Hann window sums to 1.5, so divide it out
+        out_samples[i] = out_dist_frame[i] / 1.5f;
+        out_samples[i] += diff_only ? 0.f : in_frame[i];
+        out_samples[i] *= s->level_out;
+    }
+}
+
+static int filter_frame(AVFilterLink *inlink, AVFrame *in)
+{
+    AVFilterContext *ctx = inlink->dst;
+    AVFilterLink *outlink = ctx->outputs[0];
+    AudioPsyClipContext *s = ctx->priv;
+    AVFrame *out;
+    int ret;
+
+    out = ff_get_audio_buffer(outlink, s->overlap);
+    if (!out) {
+        ret = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    out->pts = in->pts;
+
+    for (int ch = 0; ch < inlink->channels; ch++) {
+        const float *src = (const float *)in->extended_data[ch];
+        float *in_buffer = (float *)s->in_buffer->extended_data[ch];
+        float *dst = (float *)out->extended_data[ch];
+
+        for (int n = 0; n < s->overlap; n++) {
+            in_buffer[n] = src[n] * s->level_in;
+        }
+
+        feed(s, in_buffer, dst, s->diff_only,
+             (float *)(s->in_frame->extended_data[ch]),
+             (float *)(s->out_dist_frame->extended_data[ch]),
+             (float *)(s->windowed_frame->extended_data[ch]),
+             (float *)(s->clipping_delta->extended_data[ch]),
+             (float *)(s->spectrum_buf->extended_data[ch]),
+             (float *)(s->mask_curve->extended_data[ch]));
+    }
+
+    ret = ff_filter_frame(outlink, out);
+    if (ret < 0)
+        goto fail;
+
+fail:
+    av_frame_free(&in);
+    return ret < 0 ? ret : 0;
+}
+
+static int activate(AVFilterContext *ctx)
+{
+    AVFilterLink *inlink = ctx->inputs[0];
+    AVFilterLink *outlink = ctx->outputs[0];
+    AudioPsyClipContext *s = ctx->priv;
+    AVFrame *in = NULL;
+    int ret = 0, status;
+    int64_t pts;
+
+    FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
+
+    ret = ff_inlink_consume_samples(inlink, s->overlap, s->overlap, &in);
+    if (ret < 0)
+        return ret;
+
+    if (ret > 0) {
+        return filter_frame(inlink, in);
+    } else if (ff_inlink_acknowledge_status(inlink, &status, &pts)) {
+        ff_outlink_set_status(outlink, status, pts);
+        return 0;
+    } else {
+        if (ff_outlink_frame_wanted(outlink))
+            ff_inlink_request_frame(inlink);
+        return 0;
+    }
+}
+
+static av_cold void uninit(AVFilterContext *ctx)
+{
+    AudioPsyClipContext *s = ctx->priv;
+
+    av_freep(&s->window);
+    av_freep(&s->inv_window);
+    av_freep(&s->spread_table);
+    av_freep(&s->spread_table_range);
+    av_freep(&s->spread_table_index);
+    av_freep(&s->margin_curve);
+
+    av_frame_free(&s->in_buffer);
+    av_frame_free(&s->out_buffer);
+    av_frame_free(&s->in_frame);
+    av_frame_free(&s->out_dist_frame);
+    av_frame_free(&s->windowed_frame);
+    av_frame_free(&s->clipping_delta);
+    av_frame_free(&s->spectrum_buf);
+    av_frame_free(&s->mask_curve);
+
+    av_tx_uninit(&s->tx_ctx);
+    av_tx_uninit(&s->itx_ctx);
+}
+
+static const AVFilterPad inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_AUDIO,
+        .config_props = config_input,
+    },
+    { NULL }
+};
+
+static const AVFilterPad outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_AUDIO,
+    },
+    { NULL }
+};
+
+const AVFilter ff_af_apsyclip = {
+    .name          = "apsyclip",
+    .description   = NULL_IF_CONFIG_SMALL("Audio Psychoacoustic Clipper."),
+    .query_formats = query_formats,
+    .priv_size     = sizeof(AudioPsyClipContext),
+    .priv_class    = &apsyclip_class,
+    .uninit        = uninit,
+    .inputs        = inputs,
+    .outputs       = outputs,
+    .activate      = activate,
+};
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index c6afef835f..316e2b1dda 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -65,6 +65,7 @@ extern const AVFilter ff_af_apad;
 extern const AVFilter ff_af_aperms;
 extern const AVFilter ff_af_aphaser;
 extern const AVFilter ff_af_aphaseshift;
+extern const AVFilter ff_af_apsyclip;
 extern const AVFilter ff_af_apulsator;
 extern const AVFilter ff_af_arealtime;
 extern const AVFilter ff_af_aresample;
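For real input, the full complex spectrum produced by the AV_TX_FLOAT_FFT transform is conjugate-symmetric: bin N - k is the complex conjugate of bin k, and the Nyquist bin N/2 is real, with its real part at float offset fft_size of the interleaved buffer. If a gain a is applied to bin k only, the inverse transform is no longer real, and keeping only its real part (as c2r() does) yields an effective gain of (1 + a) / 2 instead of a. A minimal sketch with a naive 4-point DFT, whose twiddles are exactly 1, -i, -1 and i; cplx, dft4() and scale_bin() are hypothetical and not FFmpeg API.

```c
#include <assert.h>

#define DFT_N 4

typedef struct { float re, im; } cplx;

/* e^{-2*pi*i*m/4} is exactly 1, -i, -1 or i, so no libm is needed. */
static cplx twiddle(int m, int inverse)
{
    static const cplx w[DFT_N] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
    cplx t = w[m % DFT_N];

    if (inverse)
        t.im = -t.im;          /* conjugate twiddle for the inverse */
    return t;
}

/* Naive 4-point DFT; the inverse is normalized by 1/N, like the
 * division by fft_size in feed(). */
static void dft4(const cplx *in, cplx *out, int inverse)
{
    for (int k = 0; k < DFT_N; k++) {
        cplx acc = {0, 0};
        for (int n = 0; n < DFT_N; n++) {
            cplx t = twiddle(k * n, inverse);
            acc.re += in[n].re * t.re - in[n].im * t.im;
            acc.im += in[n].re * t.im + in[n].im * t.re;
        }
        if (inverse) {
            acc.re /= DFT_N;
            acc.im /= DFT_N;
        }
        out[k] = acc;
    }
}

/* Scale bin k of a full complex spectrum; with mirror set, also scale
 * bin N - k so the spectrum stays conjugate-symmetric. */
static void scale_bin(cplx *spec, int k, float factor, int mirror)
{
    assert(k > 0 && k < DFT_N / 2);
    spec[k].re *= factor;
    spec[k].im *= factor;
    if (mirror) {
        spec[DFT_N - k].re *= factor;
        spec[DFT_N - k].im *= factor;
    }
}
```

For x = {1, 2, 3, 4} and a = 0 on bin 1, mirroring yields the real signal {2, 3, 2, 3}, while scaling bin 1 alone leaves an imaginary residue of 0.5 and a real part of 2.5 at n = 1, halfway between the input value 2 and the intended 3.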