From patchwork Sat May 14 09:44:24 2022
X-Patchwork-Submitter: Paul B Mahol
X-Patchwork-Id: 35765
From: Paul B Mahol
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 14 May 2022 11:44:24 +0200
Message-Id: <20220514094424.203626-1-onemda@gmail.com>
X-Mailer: git-send-email 2.35.3
Subject: [FFmpeg-devel] [PATCH] avfilter/af_afir: add support for double sample format
Signed-off-by: Paul B Mahol
--- doc/filters.texi | 16 ++ libavfilter/af_afir.c | 511 +++++++----------------------------- libavfilter/af_afir.h | 99 +++++++ libavfilter/af_afirdsp.h | 20 ++ libavfilter/afir_template.c | 392 +++++++++++++++++++++++++++ 5 files changed, 623 insertions(+), 415 deletions(-) create mode 100644 libavfilter/af_afir.h create mode 100644 libavfilter/afir_template.c diff --git a/doc/filters.texi b/doc/filters.texi index 45ebcccf1c..da63403848 100644 --- a/doc/filters.texi +++ b/doc/filters.texi @@ -1639,6 +1639,22 @@ Allowed range is from @var{1} to @var{32}. Default is @var{1}. Set IR stream which will be used for convolution, starting from @var{0}, should always be lower than supplied value by @code{nbirs} option. Default is @var{0}. This option can be changed at runtime via @ref{commands}. + +@item precision +Set which precision to use when processing samples. + +@table @option +@item auto +Auto pick internal sample format depending on other filters. + +@item float +Always use single-floating point precision sample format. + +@item double +Always use double-floating point precision sample format. +@end table + +Default value is auto. @end table @subsection Examples diff --git a/libavfilter/af_afir.c b/libavfilter/af_afir.c index 301553575f..e1fe7d6a64 100644 --- a/libavfilter/af_afir.c +++ b/libavfilter/af_afir.c @@ -42,208 +42,78 @@ #include "filters.h" #include "formats.h" #include "internal.h" +#include "af_afir.h" #include "af_afirdsp.h" -typedef struct AudioFIRSegment { - int nb_partitions; - int part_size; - int block_size; - int fft_length; - int coeff_size; - int input_size; - int input_offset; - - int *output_offset; - int *part_index; - - AVFrame *sumin; - AVFrame *sumout; - AVFrame *blockin; - AVFrame *blockout; - AVFrame *buffer; - AVFrame *coeff; - AVFrame *input; - AVFrame *output; - - AVTXContext **tx, **itx; - av_tx_fn tx_fn, itx_fn; -} AudioFIRSegment; - -typedef struct AudioFIRContext { - const AVClass *class; - - float wet_gain; - float dry_gain; - float length; - int gtype; - float ir_gain; - int ir_format; - float max_ir_len; - int response; - int w, h; - AVRational frame_rate; - int ir_channel; - int minp; - int maxp; - int nb_irs; - int selir; - - float gain; - - int eof_coeffs[32]; - int have_coeffs; - int nb_taps; - int nb_channels; - int nb_coef_channels; - int one2many; - - AudioFIRSegment seg[1024]; - int nb_segments; - - AVFrame *in; - AVFrame *ir[32]; - AVFrame *video; - int min_part_size; - int64_t pts; +static void drawtext(AVFrame *pic, int x, int y, const char *txt, uint32_t color) +{ + const uint8_t *font; + int font_height; + int i; - AudioFIRDSPContext afirdsp; - AVFloatDSPContext *fdsp; -} AudioFIRContext; + font = avpriv_cga_font, font_height = 8; -static void direct(const float *in, const AVComplexFloat *ir, int len, float *out) -{ - for (int n = 0; n < len; n++) - for (int m = 0; m <= n; m++) - out[n] += ir[m].re * in[n - m]; -} + for (i = 0; txt[i]; i++) { + int char_y, mask; -static void fir_fadd(AudioFIRContext *s, float *dst, const float *src, int nb_samples) -{ - if ((nb_samples & 15) == 0 && nb_samples >= 16) { - s->fdsp->vector_fmac_scalar(dst, src, 1.f, nb_samples); - } else { - for (int n = 0; n < nb_samples; n++) - dst[n] += src[n]; + uint8_t *p = pic->data[0] + y * pic->linesize[0] + (x + i * 8) * 4; + for (char_y = 0; char_y < font_height; char_y++) { + for (mask = 0x80; mask; mask >>= 1) { + if (font[txt[i] * font_height + char_y] & mask) + AV_WL32(p, color); + p += 4; + } + p += pic->linesize[0] - 8 * 4; + } } } -static int 
fir_quantum(AVFilterContext *ctx, AVFrame *out, int ch, int offset) +static void draw_line(AVFrame *out, int x0, int y0, int x1, int y1, uint32_t color) { - AudioFIRContext *s = ctx->priv; - const float *in = (const float *)s->in->extended_data[ch] + offset; - float *blockin, *blockout, *buf, *ptr = (float *)out->extended_data[ch] + offset; - const int nb_samples = FFMIN(s->min_part_size, out->nb_samples - offset); - int n, i, j; - - for (int segment = 0; segment < s->nb_segments; segment++) { - AudioFIRSegment *seg = &s->seg[segment]; - float *src = (float *)seg->input->extended_data[ch]; - float *dst = (float *)seg->output->extended_data[ch]; - float *sumin = (float *)seg->sumin->extended_data[ch]; - float *sumout = (float *)seg->sumout->extended_data[ch]; - - if (s->min_part_size >= 8) { - s->fdsp->vector_fmul_scalar(src + seg->input_offset, in, s->dry_gain, FFALIGN(nb_samples, 4)); - emms_c(); - } else { - for (n = 0; n < nb_samples; n++) - src[seg->input_offset + n] = in[n] * s->dry_gain; - } - - seg->output_offset[ch] += s->min_part_size; - if (seg->output_offset[ch] == seg->part_size) { - seg->output_offset[ch] = 0; - } else { - memmove(src, src + s->min_part_size, (seg->input_size - s->min_part_size) * sizeof(*src)); - - dst += seg->output_offset[ch]; - fir_fadd(s, ptr, dst, nb_samples); - continue; - } - - if (seg->part_size < 8) { - memset(dst, 0, sizeof(*dst) * seg->part_size * seg->nb_partitions); - - j = seg->part_index[ch]; - - for (i = 0; i < seg->nb_partitions; i++) { - const int coffset = j * seg->coeff_size; - const AVComplexFloat *coeff = (const AVComplexFloat *)seg->coeff->extended_data[ch * !s->one2many] + coffset; - - direct(src, coeff, nb_samples, dst); + int dx = FFABS(x1-x0); + int dy = FFABS(y1-y0), sy = y0 < y1 ? 1 : -1; + int err = (dx>dy ? 
dx : -dy) / 2, e2; - if (j == 0) - j = seg->nb_partitions; - j--; - } + for (;;) { + AV_WL32(out->data[0] + y0 * out->linesize[0] + x0 * 4, color); - seg->part_index[ch] = (seg->part_index[ch] + 1) % seg->nb_partitions; + if (x0 == x1 && y0 == y1) + break; - memmove(src, src + s->min_part_size, (seg->input_size - s->min_part_size) * sizeof(*src)); + e2 = err; - for (n = 0; n < nb_samples; n++) { - ptr[n] += dst[n]; - } - continue; + if (e2 >-dx) { + err -= dy; + x0--; } - memset(sumin, 0, sizeof(*sumin) * seg->fft_length); - blockin = (float *)seg->blockin->extended_data[ch] + seg->part_index[ch] * seg->block_size; - blockout = (float *)seg->blockout->extended_data[ch] + seg->part_index[ch] * seg->block_size; - memset(blockin + seg->part_size, 0, sizeof(*blockin) * (seg->fft_length - seg->part_size)); - - memcpy(blockin, src, sizeof(*src) * seg->part_size); - - seg->tx_fn(seg->tx[ch], blockout, blockin, sizeof(float)); - - j = seg->part_index[ch]; - - for (i = 0; i < seg->nb_partitions; i++) { - const int coffset = j * seg->coeff_size; - const float *blockout = (const float *)seg->blockout->extended_data[ch] + i * seg->block_size; - const AVComplexFloat *coeff = (const AVComplexFloat *)seg->coeff->extended_data[ch * !s->one2many] + coffset; - - s->afirdsp.fcmul_add(sumin, blockout, (const float *)coeff, seg->part_size); - - if (j == 0) - j = seg->nb_partitions; - j--; + if (e2 < dy) { + err += dx; + y0 += sy; } - - seg->itx_fn(seg->itx[ch], sumout, sumin, sizeof(float)); - - buf = (float *)seg->buffer->extended_data[ch]; - fir_fadd(s, buf, sumout, seg->part_size); - - memcpy(dst, buf, seg->part_size * sizeof(*dst)); - - buf = (float *)seg->buffer->extended_data[ch]; - memcpy(buf, sumout + seg->part_size, seg->part_size * sizeof(*buf)); - - seg->part_index[ch] = (seg->part_index[ch] + 1) % seg->nb_partitions; - - memmove(src, src + s->min_part_size, (seg->input_size - s->min_part_size) * sizeof(*src)); - - fir_fadd(s, ptr, dst, nb_samples); } +} - if (s->min_part_size >= 8) { - s->fdsp->vector_fmul_scalar(ptr, ptr, s->wet_gain, FFALIGN(nb_samples, 4)); - emms_c(); - } else { - for (n = 0; n < nb_samples; n++) - ptr[n] *= s->wet_gain; - } +#define DEPTH 32 +#include "afir_template.c" - return 0; -} +#undef DEPTH +#define DEPTH 64 +#include "afir_template.c" static int fir_channel(AVFilterContext *ctx, AVFrame *out, int ch) { AudioFIRContext *s = ctx->priv; for (int offset = 0; offset < out->nb_samples; offset += s->min_part_size) { - fir_quantum(ctx, out, ch, offset); + switch (s->format) { + case AV_SAMPLE_FMT_FLTP: + fir_quantum_float(ctx, out, ch, offset); + break; + case AV_SAMPLE_FMT_DBLP: + fir_quantum_double(ctx, out, ch, offset); + break; + } } return 0; @@ -284,144 +154,6 @@ static int fir_frame(AudioFIRContext *s, AVFrame *in, AVFilterLink *outlink) return ff_filter_frame(outlink, out); } -static void drawtext(AVFrame *pic, int x, int y, const char *txt, uint32_t color) -{ - const uint8_t *font; - int font_height; - int i; - - font = avpriv_cga_font, font_height = 8; - - for (i = 0; txt[i]; i++) { - int char_y, mask; - - uint8_t *p = pic->data[0] + y * pic->linesize[0] + (x + i * 8) * 4; - for (char_y = 0; char_y < font_height; char_y++) { - for (mask = 0x80; mask; mask >>= 1) { - if (font[txt[i] * font_height + char_y] & mask) - AV_WL32(p, color); - p += 4; - } - p += pic->linesize[0] - 8 * 4; - } - } -} - -static void draw_line(AVFrame *out, int x0, int y0, int x1, int y1, uint32_t color) -{ - int dx = FFABS(x1-x0); - int dy = FFABS(y1-y0), sy = y0 < y1 ? 
1 : -1; - int err = (dx>dy ? dx : -dy) / 2, e2; - - for (;;) { - AV_WL32(out->data[0] + y0 * out->linesize[0] + x0 * 4, color); - - if (x0 == x1 && y0 == y1) - break; - - e2 = err; - - if (e2 >-dx) { - err -= dy; - x0--; - } - - if (e2 < dy) { - err += dx; - y0 += sy; - } - } -} - -static void draw_response(AVFilterContext *ctx, AVFrame *out) -{ - AudioFIRContext *s = ctx->priv; - float *mag, *phase, *delay, min = FLT_MAX, max = FLT_MIN; - float min_delay = FLT_MAX, max_delay = FLT_MIN; - int prev_ymag = -1, prev_yphase = -1, prev_ydelay = -1; - char text[32]; - int channel, i, x; - - memset(out->data[0], 0, s->h * out->linesize[0]); - - phase = av_malloc_array(s->w, sizeof(*phase)); - mag = av_malloc_array(s->w, sizeof(*mag)); - delay = av_malloc_array(s->w, sizeof(*delay)); - if (!mag || !phase || !delay) - goto end; - - channel = av_clip(s->ir_channel, 0, s->ir[s->selir]->ch_layout.nb_channels - 1); - for (i = 0; i < s->w; i++) { - const float *src = (const float *)s->ir[s->selir]->extended_data[channel]; - double w = i * M_PI / (s->w - 1); - double div, real_num = 0., imag_num = 0., real = 0., imag = 0.; - - for (x = 0; x < s->nb_taps; x++) { - real += cos(-x * w) * src[x]; - imag += sin(-x * w) * src[x]; - real_num += cos(-x * w) * src[x] * x; - imag_num += sin(-x * w) * src[x] * x; - } - - mag[i] = hypot(real, imag); - phase[i] = atan2(imag, real); - div = real * real + imag * imag; - delay[i] = (real_num * real + imag_num * imag) / div; - min = fminf(min, mag[i]); - max = fmaxf(max, mag[i]); - min_delay = fminf(min_delay, delay[i]); - max_delay = fmaxf(max_delay, delay[i]); - } - - for (i = 0; i < s->w; i++) { - int ymag = mag[i] / max * (s->h - 1); - int ydelay = (delay[i] - min_delay) / (max_delay - min_delay) * (s->h - 1); - int yphase = (0.5 * (1. 
+ phase[i] / M_PI)) * (s->h - 1); - - ymag = s->h - 1 - av_clip(ymag, 0, s->h - 1); - yphase = s->h - 1 - av_clip(yphase, 0, s->h - 1); - ydelay = s->h - 1 - av_clip(ydelay, 0, s->h - 1); - - if (prev_ymag < 0) - prev_ymag = ymag; - if (prev_yphase < 0) - prev_yphase = yphase; - if (prev_ydelay < 0) - prev_ydelay = ydelay; - - draw_line(out, i, ymag, FFMAX(i - 1, 0), prev_ymag, 0xFFFF00FF); - draw_line(out, i, yphase, FFMAX(i - 1, 0), prev_yphase, 0xFF00FF00); - draw_line(out, i, ydelay, FFMAX(i - 1, 0), prev_ydelay, 0xFF00FFFF); - - prev_ymag = ymag; - prev_yphase = yphase; - prev_ydelay = ydelay; - } - - if (s->w > 400 && s->h > 100) { - drawtext(out, 2, 2, "Max Magnitude:", 0xDDDDDDDD); - snprintf(text, sizeof(text), "%.2f", max); - drawtext(out, 15 * 8 + 2, 2, text, 0xDDDDDDDD); - - drawtext(out, 2, 12, "Min Magnitude:", 0xDDDDDDDD); - snprintf(text, sizeof(text), "%.2f", min); - drawtext(out, 15 * 8 + 2, 12, text, 0xDDDDDDDD); - - drawtext(out, 2, 22, "Max Delay:", 0xDDDDDDDD); - snprintf(text, sizeof(text), "%.2f", max_delay); - drawtext(out, 11 * 8 + 2, 22, text, 0xDDDDDDDD); - - drawtext(out, 2, 32, "Min Delay:", 0xDDDDDDDD); - snprintf(text, sizeof(text), "%.2f", min_delay); - drawtext(out, 11 * 8 + 2, 32, text, 0xDDDDDDDD); - } - -end: - av_free(delay); - av_free(phase); - av_free(mag); -} - static int init_segment(AVFilterContext *ctx, AudioFIRSegment *seg, int offset, int nb_partitions, int part_size) { @@ -446,9 +178,20 @@ static int init_segment(AVFilterContext *ctx, AudioFIRSegment *seg, return AVERROR(ENOMEM); for (int ch = 0; ch < ctx->inputs[0]->ch_layout.nb_channels && part_size >= 8; ch++) { - float scale = 1.f, iscale = 1.f / part_size; - av_tx_init(&seg->tx[ch], &seg->tx_fn, AV_TX_FLOAT_RDFT, 0, 2 * part_size, &scale, 0); - av_tx_init(&seg->itx[ch], &seg->itx_fn, AV_TX_FLOAT_RDFT, 1, 2 * part_size, &iscale, 0); + double dscale = 1.0, idscale = 1.0 / part_size; + float fscale = 1.f, ifscale = 1.f / part_size; + + switch (s->format) { + case AV_SAMPLE_FMT_FLTP: + av_tx_init(&seg->tx[ch], &seg->tx_fn, AV_TX_FLOAT_RDFT, 0, 2 * part_size, &fscale, 0); + av_tx_init(&seg->itx[ch], &seg->itx_fn, AV_TX_FLOAT_RDFT, 1, 2 * part_size, &ifscale, 0); + break; + case AV_SAMPLE_FMT_DBLP: + av_tx_init(&seg->tx[ch], &seg->tx_fn, AV_TX_DOUBLE_RDFT, 0, 2 * part_size, &dscale, 0); + av_tx_init(&seg->itx[ch], &seg->itx_fn, AV_TX_DOUBLE_RDFT, 1, 2 * part_size, &idscale, 0); + break; + } + if (!seg->tx[ch] || !seg->itx[ch]) return AVERROR(ENOMEM); } @@ -502,8 +245,7 @@ static void uninit_segment(AVFilterContext *ctx, AudioFIRSegment *seg) static int convert_coeffs(AVFilterContext *ctx) { AudioFIRContext *s = ctx->priv; - int ret, i, ch, n, cur_nb_taps; - float power = 0; + int ret, i, cur_nb_taps; if (!s->nb_taps) { int part_size, max_part_size; @@ -546,109 +288,42 @@ static int convert_coeffs(AVFilterContext *ctx) return AVERROR_BUG; } - if (s->response) - draw_response(ctx, s->video); + if (s->response) { + switch (s->format) { + case AV_SAMPLE_FMT_FLTP: + draw_response_float(ctx, s->video); + break; + case AV_SAMPLE_FMT_DBLP: + draw_response_double(ctx, s->video); + break; + } + } s->gain = 1; cur_nb_taps = s->ir[s->selir]->nb_samples; - switch (s->gtype) { - case -1: - /* nothing to do */ + switch (s->format) { + case AV_SAMPLE_FMT_FLTP: + ret = get_power_float(ctx, s, cur_nb_taps); break; - case 0: - for (ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { - float *time = (float *)s->ir[s->selir]->extended_data[!s->one2many * ch]; - - for (i = 0; i < cur_nb_taps; i++) 
- power += FFABS(time[i]); - } - s->gain = ctx->inputs[1 + s->selir]->ch_layout.nb_channels / power; + case AV_SAMPLE_FMT_DBLP: + ret = get_power_double(ctx, s, cur_nb_taps); break; - case 1: - for (ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { - float *time = (float *)s->ir[s->selir]->extended_data[!s->one2many * ch]; - - for (i = 0; i < cur_nb_taps; i++) - power += time[i]; - } - s->gain = ctx->inputs[1 + s->selir]->ch_layout.nb_channels / power; - break; - case 2: - for (ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { - float *time = (float *)s->ir[s->selir]->extended_data[!s->one2many * ch]; - - for (i = 0; i < cur_nb_taps; i++) - power += time[i] * time[i]; - } - s->gain = sqrtf(ch / power); - break; - default: - return AVERROR_BUG; } - s->gain = FFMIN(s->gain * s->ir_gain, 1.f); - av_log(ctx, AV_LOG_DEBUG, "power %f, gain %f\n", power, s->gain); - for (ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { - float *time = (float *)s->ir[s->selir]->extended_data[!s->one2many * ch]; - - s->fdsp->vector_fmul_scalar(time, time, s->gain, FFALIGN(cur_nb_taps, 4)); - } + if (ret < 0) + return ret; av_log(ctx, AV_LOG_DEBUG, "nb_taps: %d\n", cur_nb_taps); av_log(ctx, AV_LOG_DEBUG, "nb_segments: %d\n", s->nb_segments); - for (ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { - float *time = (float *)s->ir[s->selir]->extended_data[!s->one2many * ch]; - int toffset = 0; - - for (i = FFMAX(1, s->length * s->nb_taps); i < s->nb_taps; i++) - time[i] = 0; - - av_log(ctx, AV_LOG_DEBUG, "channel: %d\n", ch); - - for (int segment = 0; segment < s->nb_segments; segment++) { - AudioFIRSegment *seg = &s->seg[segment]; - float *blockin = (float *)seg->blockin->extended_data[ch]; - float *blockout = (float *)seg->blockout->extended_data[ch]; - AVComplexFloat *coeff = (AVComplexFloat *)seg->coeff->extended_data[ch]; - - av_log(ctx, AV_LOG_DEBUG, "segment: %d\n", segment); - - for (i = 0; i < seg->nb_partitions; i++) { - const int coffset = i * seg->coeff_size; - const int remaining = s->nb_taps - toffset; - const int size = remaining >= seg->part_size ? 
seg->part_size : remaining; - - if (size < 8) { - for (n = 0; n < size; n++) - coeff[coffset + n].re = time[toffset + n]; - - toffset += size; - continue; - } - - memset(blockin, 0, sizeof(*blockin) * seg->fft_length); - memcpy(blockin, time + toffset, size * sizeof(*blockin)); - - seg->tx_fn(seg->tx[0], blockout, blockin, sizeof(float)); - - for (n = 0; n < seg->part_size + 1; n++) { - coeff[coffset + n].re = blockout[2 * n]; - coeff[coffset + n].im = blockout[2 * n + 1]; - } - - toffset += size; - } - - av_log(ctx, AV_LOG_DEBUG, "nb_partitions: %d\n", seg->nb_partitions); - av_log(ctx, AV_LOG_DEBUG, "partition size: %d\n", seg->part_size); - av_log(ctx, AV_LOG_DEBUG, "block size: %d\n", seg->block_size); - av_log(ctx, AV_LOG_DEBUG, "fft_length: %d\n", seg->fft_length); - av_log(ctx, AV_LOG_DEBUG, "coeff_size: %d\n", seg->coeff_size); - av_log(ctx, AV_LOG_DEBUG, "input_size: %d\n", seg->input_size); - av_log(ctx, AV_LOG_DEBUG, "input_offset: %d\n", seg->input_offset); - } + switch (s->format) { + case AV_SAMPLE_FMT_FLTP: + convert_channels_float(ctx, s); + break; + case AV_SAMPLE_FMT_DBLP: + convert_channels_double(ctx, s); + break; } s->have_coeffs = 1; @@ -762,9 +437,10 @@ static int activate(AVFilterContext *ctx) static int query_formats(AVFilterContext *ctx) { AudioFIRContext *s = ctx->priv; - static const enum AVSampleFormat sample_fmts[] = { - AV_SAMPLE_FMT_FLTP, - AV_SAMPLE_FMT_NONE + static const enum AVSampleFormat sample_fmts[3][3] = { + { AV_SAMPLE_FMT_FLTP, AV_SAMPLE_FMT_DBLP, AV_SAMPLE_FMT_NONE }, + { AV_SAMPLE_FMT_FLTP, AV_SAMPLE_FMT_NONE }, + { AV_SAMPLE_FMT_DBLP, AV_SAMPLE_FMT_NONE }, }; static const enum AVPixelFormat pix_fmts[] = { AV_PIX_FMT_RGB0, @@ -801,7 +477,7 @@ static int query_formats(AVFilterContext *ctx) } } - if ((ret = ff_set_common_formats_from_list(ctx, sample_fmts)) < 0) + if ((ret = ff_set_common_formats_from_list(ctx, sample_fmts[s->precision])) < 0) return ret; return ff_set_common_all_samplerates(ctx); @@ -827,6 +503,7 @@ FF_ENABLE_DEPRECATION_WARNINGS s->nb_channels = outlink->ch_layout.nb_channels; s->nb_coef_channels = ctx->inputs[1 + s->selir]->ch_layout.nb_channels; + s->format = outlink->format; return 0; } @@ -977,6 +654,10 @@ static const AVOption afir_options[] = { { "maxp", "set max partition size", OFFSET(maxp), AV_OPT_TYPE_INT, {.i64=8192}, 8, 32768, AF }, { "nbirs", "set number of input IRs",OFFSET(nb_irs),AV_OPT_TYPE_INT, {.i64=1}, 1, 32, AF }, { "ir", "select IR", OFFSET(selir), AV_OPT_TYPE_INT, {.i64=0}, 0, 31, AFR }, + { "precision", "set processing precision", OFFSET(precision), AV_OPT_TYPE_INT, {.i64=0}, 0, 2, AF, "precision" }, + { "auto", "set auto processing precision", 0, AV_OPT_TYPE_CONST, {.i64=0}, 0, 0, AF, "precision" }, + { "float", "set single-floating point processing precision", 0, AV_OPT_TYPE_CONST, {.i64=1}, 0, 0, AF, "precision" }, + { "double","set double-floating point processing precision", 0, AV_OPT_TYPE_CONST, {.i64=2}, 0, 0, AF, "precision" }, { NULL } }; diff --git a/libavfilter/af_afir.h b/libavfilter/af_afir.h new file mode 100644 index 0000000000..cf59baf55e --- /dev/null +++ b/libavfilter/af_afir.h @@ -0,0 +1,99 @@ +/* + * Copyright (c) 2017 Paul B Mahol + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef AVFILTER_AF_AFIR_H +#define AVFILTER_AF_AFIR_H + +#include "libavutil/float_dsp.h" +#include "libavutil/frame.h" +#include "libavutil/rational.h" +#include "libavutil/tx.h" +#include "avfilter.h" +#include "af_afirdsp.h" + +typedef struct AudioFIRSegment { + int nb_partitions; + int part_size; + int block_size; + int fft_length; + int coeff_size; + int input_size; + int input_offset; + + int *output_offset; + int *part_index; + + AVFrame *sumin; + AVFrame *sumout; + AVFrame *blockin; + AVFrame *blockout; + AVFrame *buffer; + AVFrame *coeff; + AVFrame *input; + AVFrame *output; + + AVTXContext **tx, **itx; + av_tx_fn tx_fn, itx_fn; +} AudioFIRSegment; + +typedef struct AudioFIRContext { + const AVClass *class; + + float wet_gain; + float dry_gain; + float length; + int gtype; + float ir_gain; + int ir_format; + float max_ir_len; + int response; + int w, h; + AVRational frame_rate; + int ir_channel; + int minp; + int maxp; + int nb_irs; + int selir; + int precision; + int format; + + double gain; + + int eof_coeffs[32]; + int have_coeffs; + int nb_taps; + int nb_channels; + int nb_coef_channels; + int one2many; + + AudioFIRSegment seg[1024]; + int nb_segments; + + AVFrame *in; + AVFrame *ir[32]; + AVFrame *video; + int min_part_size; + int64_t pts; + + AudioFIRDSPContext afirdsp; + AVFloatDSPContext *fdsp; +} AudioFIRContext; + +#endif /* AVFILTER_AF_AFIR_H */ diff --git a/libavfilter/af_afirdsp.h b/libavfilter/af_afirdsp.h index 05182bebb4..bf7d1d6f0f 100644 --- a/libavfilter/af_afirdsp.h +++ b/libavfilter/af_afirdsp.h @@ -29,6 +29,8 @@ typedef struct AudioFIRDSPContext { void (*fcmul_add)(float *sum, const float *t, const float *c, ptrdiff_t len); + void (*dcmul_add)(double *sum, const double *t, const double *c, + ptrdiff_t len); } AudioFIRDSPContext; void ff_afir_init_x86(AudioFIRDSPContext *s); @@ -50,9 +52,27 @@ static void fcmul_add_c(float *sum, const float *t, const float *c, ptrdiff_t le sum[2 * n] += t[2 * n] * c[2 * n]; } +static void dcmul_add_c(double *sum, const double *t, const double *c, ptrdiff_t len) +{ + int n; + + for (n = 0; n < len; n++) { + const double cre = c[2 * n ]; + const double cim = c[2 * n + 1]; + const double tre = t[2 * n ]; + const double tim = t[2 * n + 1]; + + sum[2 * n ] += tre * cre - tim * cim; + sum[2 * n + 1] += tre * cim + tim * cre; + } + + sum[2 * n] += t[2 * n] * c[2 * n]; +} + static av_unused void ff_afir_init(AudioFIRDSPContext *dsp) { dsp->fcmul_add = fcmul_add_c; + dsp->dcmul_add = dcmul_add_c; if (ARCH_X86) ff_afir_init_x86(dsp); diff --git a/libavfilter/afir_template.c b/libavfilter/afir_template.c new file mode 100644 index 0000000000..6cb3eb2203 --- /dev/null +++ b/libavfilter/afir_template.c @@ -0,0 +1,392 @@ +/* + * Copyright (c) 2017 Paul B Mahol + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "avfilter.h" +#include "formats.h" +#include "internal.h" +#include "audio.h" + +#undef ctype +#undef ftype +#undef SQRT +#undef SAMPLE_FORMAT +#if DEPTH == 32 +#define SAMPLE_FORMAT float +#define SQRT sqrtf +#define ctype AVComplexFloat +#define ftype float +#else +#define SAMPLE_FORMAT double +#define SQRT sqrt +#define ctype AVComplexDouble +#define ftype double +#endif + +#define fn3(a,b) a##_##b +#define fn2(a,b) fn3(a,b) +#define fn(a) fn2(a, SAMPLE_FORMAT) + +static void fn(draw_response)(AVFilterContext *ctx, AVFrame *out) +{ + AudioFIRContext *s = ctx->priv; + ftype *mag, *phase, *delay, min = FLT_MAX, max = FLT_MIN; + ftype min_delay = FLT_MAX, max_delay = FLT_MIN; + int prev_ymag = -1, prev_yphase = -1, prev_ydelay = -1; + char text[32]; + int channel, i, x; + + memset(out->data[0], 0, s->h * out->linesize[0]); + + phase = av_malloc_array(s->w, sizeof(*phase)); + mag = av_malloc_array(s->w, sizeof(*mag)); + delay = av_malloc_array(s->w, sizeof(*delay)); + if (!mag || !phase || !delay) + goto end; + + channel = av_clip(s->ir_channel, 0, s->ir[s->selir]->ch_layout.nb_channels - 1); + for (i = 0; i < s->w; i++) { + const ftype *src = (const ftype *)s->ir[s->selir]->extended_data[channel]; + double w = i * M_PI / (s->w - 1); + double div, real_num = 0., imag_num = 0., real = 0., imag = 0.; + + for (x = 0; x < s->nb_taps; x++) { + real += cos(-x * w) * src[x]; + imag += sin(-x * w) * src[x]; + real_num += cos(-x * w) * src[x] * x; + imag_num += sin(-x * w) * src[x] * x; + } + + mag[i] = hypot(real, imag); + phase[i] = atan2(imag, real); + div = real * real + imag * imag; + delay[i] = (real_num * real + imag_num * imag) / div; + min = fminf(min, mag[i]); + max = fmaxf(max, mag[i]); + min_delay = fminf(min_delay, delay[i]); + max_delay = fmaxf(max_delay, delay[i]); + } + + for (i = 0; i < s->w; i++) { + int ymag = mag[i] / max * (s->h - 1); + int ydelay = (delay[i] - min_delay) / (max_delay - min_delay) * (s->h - 1); + int yphase = (0.5 * (1. 
+ phase[i] / M_PI)) * (s->h - 1); + + ymag = s->h - 1 - av_clip(ymag, 0, s->h - 1); + yphase = s->h - 1 - av_clip(yphase, 0, s->h - 1); + ydelay = s->h - 1 - av_clip(ydelay, 0, s->h - 1); + + if (prev_ymag < 0) + prev_ymag = ymag; + if (prev_yphase < 0) + prev_yphase = yphase; + if (prev_ydelay < 0) + prev_ydelay = ydelay; + + draw_line(out, i, ymag, FFMAX(i - 1, 0), prev_ymag, 0xFFFF00FF); + draw_line(out, i, yphase, FFMAX(i - 1, 0), prev_yphase, 0xFF00FF00); + draw_line(out, i, ydelay, FFMAX(i - 1, 0), prev_ydelay, 0xFF00FFFF); + + prev_ymag = ymag; + prev_yphase = yphase; + prev_ydelay = ydelay; + } + + if (s->w > 400 && s->h > 100) { + drawtext(out, 2, 2, "Max Magnitude:", 0xDDDDDDDD); + snprintf(text, sizeof(text), "%.2f", max); + drawtext(out, 15 * 8 + 2, 2, text, 0xDDDDDDDD); + + drawtext(out, 2, 12, "Min Magnitude:", 0xDDDDDDDD); + snprintf(text, sizeof(text), "%.2f", min); + drawtext(out, 15 * 8 + 2, 12, text, 0xDDDDDDDD); + + drawtext(out, 2, 22, "Max Delay:", 0xDDDDDDDD); + snprintf(text, sizeof(text), "%.2f", max_delay); + drawtext(out, 11 * 8 + 2, 22, text, 0xDDDDDDDD); + + drawtext(out, 2, 32, "Min Delay:", 0xDDDDDDDD); + snprintf(text, sizeof(text), "%.2f", min_delay); + drawtext(out, 11 * 8 + 2, 32, text, 0xDDDDDDDD); + } + +end: + av_free(delay); + av_free(phase); + av_free(mag); +} + +static void fn(convert_channels)(AVFilterContext *ctx, AudioFIRContext *s) +{ + for (int ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { + ftype *time = (ftype *)s->ir[s->selir]->extended_data[!s->one2many * ch]; + int toffset = 0; + + for (int i = FFMAX(1, s->length * s->nb_taps); i < s->nb_taps; i++) + time[i] = 0; + + av_log(ctx, AV_LOG_DEBUG, "channel: %d\n", ch); + + for (int segment = 0; segment < s->nb_segments; segment++) { + AudioFIRSegment *seg = &s->seg[segment]; + ftype *blockin = (ftype *)seg->blockin->extended_data[ch]; + ftype *blockout = (ftype *)seg->blockout->extended_data[ch]; + ctype *coeff = (ctype *)seg->coeff->extended_data[ch]; + + av_log(ctx, AV_LOG_DEBUG, "segment: %d\n", segment); + + for (int i = 0; i < seg->nb_partitions; i++) { + const int coffset = i * seg->coeff_size; + const int remaining = s->nb_taps - toffset; + const int size = remaining >= seg->part_size ? 
seg->part_size : remaining; + + if (size < 8) { + for (int n = 0; n < size; n++) + coeff[coffset + n].re = time[toffset + n]; + + toffset += size; + continue; + } + + memset(blockin, 0, sizeof(*blockin) * seg->fft_length); + memcpy(blockin, time + toffset, size * sizeof(*blockin)); + + seg->tx_fn(seg->tx[0], blockout, blockin, sizeof(ftype)); + + for (int n = 0; n < seg->part_size + 1; n++) { + coeff[coffset + n].re = blockout[2 * n]; + coeff[coffset + n].im = blockout[2 * n + 1]; + } + + toffset += size; + } + + av_log(ctx, AV_LOG_DEBUG, "nb_partitions: %d\n", seg->nb_partitions); + av_log(ctx, AV_LOG_DEBUG, "partition size: %d\n", seg->part_size); + av_log(ctx, AV_LOG_DEBUG, "block size: %d\n", seg->block_size); + av_log(ctx, AV_LOG_DEBUG, "fft_length: %d\n", seg->fft_length); + av_log(ctx, AV_LOG_DEBUG, "coeff_size: %d\n", seg->coeff_size); + av_log(ctx, AV_LOG_DEBUG, "input_size: %d\n", seg->input_size); + av_log(ctx, AV_LOG_DEBUG, "input_offset: %d\n", seg->input_offset); + } + } +} + +static int fn(get_power)(AVFilterContext *ctx, AudioFIRContext *s, int cur_nb_taps) +{ + ftype power = 0; + int ch; + + switch (s->gtype) { + case -1: + /* nothing to do */ + break; + case 0: + for (ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { + ftype *time = (ftype *)s->ir[s->selir]->extended_data[!s->one2many * ch]; + + for (int i = 0; i < cur_nb_taps; i++) + power += FFABS(time[i]); + } + s->gain = ctx->inputs[1 + s->selir]->ch_layout.nb_channels / power; + break; + case 1: + for (ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { + ftype *time = (ftype *)s->ir[s->selir]->extended_data[!s->one2many * ch]; + + for (int i = 0; i < cur_nb_taps; i++) + power += time[i]; + } + s->gain = ctx->inputs[1 + s->selir]->ch_layout.nb_channels / power; + break; + case 2: + for (ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { + ftype *time = (ftype *)s->ir[s->selir]->extended_data[!s->one2many * ch]; + + for (int i = 0; i < cur_nb_taps; i++) + power += time[i] * time[i]; + } + s->gain = SQRT(ch / power); + break; + default: + return AVERROR_BUG; + } + + s->gain = FFMIN(s->gain * s->ir_gain, 1.); + + av_log(ctx, AV_LOG_DEBUG, "power %f, gain %f\n", power, s->gain); + + for (int ch = 0; ch < ctx->inputs[1 + s->selir]->ch_layout.nb_channels; ch++) { + ftype *time = (ftype *)s->ir[s->selir]->extended_data[!s->one2many * ch]; + +#if DEPTH == 32 + s->fdsp->vector_fmul_scalar(time, time, s->gain, FFALIGN(cur_nb_taps, 4)); +#else + s->fdsp->vector_dmul_scalar(time, time, s->gain, FFALIGN(cur_nb_taps, 8)); +#endif + } + + return 0; +} + +static void fn(direct)(const ftype *in, const ctype *ir, int len, ftype *out) +{ + for (int n = 0; n < len; n++) + for (int m = 0; m <= n; m++) + out[n] += ir[m].re * in[n - m]; +} + +static void fn(fir_fadd)(AudioFIRContext *s, ftype *dst, const ftype *src, int nb_samples) +{ + if ((nb_samples & 15) == 0 && nb_samples >= 16) { +#if DEPTH == 32 + s->fdsp->vector_fmac_scalar(dst, src, 1.f, nb_samples); +#else + s->fdsp->vector_dmac_scalar(dst, src, 1.0, nb_samples); +#endif + } else { + for (int n = 0; n < nb_samples; n++) + dst[n] += src[n]; + } +} + +static int fn(fir_quantum)(AVFilterContext *ctx, AVFrame *out, int ch, int offset) +{ + AudioFIRContext *s = ctx->priv; + const ftype *in = (const ftype *)s->in->extended_data[ch] + offset; + ftype *blockin, *blockout, *buf, *ptr = (ftype *)out->extended_data[ch] + offset; + const int nb_samples = FFMIN(s->min_part_size, out->nb_samples - offset); + int n, i, j; + + for (int 
segment = 0; segment < s->nb_segments; segment++) { + AudioFIRSegment *seg = &s->seg[segment]; + ftype *src = (ftype *)seg->input->extended_data[ch]; + ftype *dst = (ftype *)seg->output->extended_data[ch]; + ftype *sumin = (ftype *)seg->sumin->extended_data[ch]; + ftype *sumout = (ftype *)seg->sumout->extended_data[ch]; + + if (s->min_part_size >= 8) { +#if DEPTH == 32 + s->fdsp->vector_fmul_scalar(src + seg->input_offset, in, s->dry_gain, FFALIGN(nb_samples, 4)); +#else + s->fdsp->vector_dmul_scalar(src + seg->input_offset, in, s->dry_gain, FFALIGN(nb_samples, 8)); +#endif + emms_c(); + } else { + for (n = 0; n < nb_samples; n++) + src[seg->input_offset + n] = in[n] * s->dry_gain; + } + + seg->output_offset[ch] += s->min_part_size; + if (seg->output_offset[ch] == seg->part_size) { + seg->output_offset[ch] = 0; + } else { + memmove(src, src + s->min_part_size, (seg->input_size - s->min_part_size) * sizeof(*src)); + + dst += seg->output_offset[ch]; + fn(fir_fadd)(s, ptr, dst, nb_samples); + continue; + } + + if (seg->part_size < 8) { + memset(dst, 0, sizeof(*dst) * seg->part_size * seg->nb_partitions); + + j = seg->part_index[ch]; + + for (i = 0; i < seg->nb_partitions; i++) { + const int coffset = j * seg->coeff_size; + const ctype *coeff = (const ctype *)seg->coeff->extended_data[ch * !s->one2many] + coffset; + + fn(direct)(src, coeff, nb_samples, dst); + + if (j == 0) + j = seg->nb_partitions; + j--; + } + + seg->part_index[ch] = (seg->part_index[ch] + 1) % seg->nb_partitions; + + memmove(src, src + s->min_part_size, (seg->input_size - s->min_part_size) * sizeof(*src)); + + for (n = 0; n < nb_samples; n++) { + ptr[n] += dst[n]; + } + continue; + } + + memset(sumin, 0, sizeof(*sumin) * seg->fft_length); + blockin = (ftype *)seg->blockin->extended_data[ch] + seg->part_index[ch] * seg->block_size; + blockout = (ftype *)seg->blockout->extended_data[ch] + seg->part_index[ch] * seg->block_size; + memset(blockin + seg->part_size, 0, sizeof(*blockin) * (seg->fft_length - seg->part_size)); + + memcpy(blockin, src, sizeof(*src) * seg->part_size); + + seg->tx_fn(seg->tx[ch], blockout, blockin, sizeof(ftype)); + + j = seg->part_index[ch]; + + for (i = 0; i < seg->nb_partitions; i++) { + const int coffset = j * seg->coeff_size; + const ftype *blockout = (const ftype *)seg->blockout->extended_data[ch] + i * seg->block_size; + const ctype *coeff = (const ctype *)seg->coeff->extended_data[ch * !s->one2many] + coffset; + +#if DEPTH == 32 + s->afirdsp.fcmul_add(sumin, blockout, (const ftype *)coeff, seg->part_size); +#else + s->afirdsp.dcmul_add(sumin, blockout, (const ftype *)coeff, seg->part_size); +#endif + + if (j == 0) + j = seg->nb_partitions; + j--; + } + + seg->itx_fn(seg->itx[ch], sumout, sumin, sizeof(ftype)); + + buf = (ftype *)seg->buffer->extended_data[ch]; + fn(fir_fadd)(s, buf, sumout, seg->part_size); + + memcpy(dst, buf, seg->part_size * sizeof(*dst)); + + buf = (ftype *)seg->buffer->extended_data[ch]; + memcpy(buf, sumout + seg->part_size, seg->part_size * sizeof(*buf)); + + seg->part_index[ch] = (seg->part_index[ch] + 1) % seg->nb_partitions; + + memmove(src, src + s->min_part_size, (seg->input_size - s->min_part_size) * sizeof(*src)); + + fn(fir_fadd)(s, ptr, dst, nb_samples); + } + + if (s->min_part_size >= 8) { +#if DEPTH == 32 + s->fdsp->vector_fmul_scalar(ptr, ptr, s->wet_gain, FFALIGN(nb_samples, 4)); +#else + s->fdsp->vector_dmul_scalar(ptr, ptr, s->wet_gain, FFALIGN(nb_samples, 8)); +#endif + emms_c(); + } else { + for (n = 0; n < nb_samples; n++) + ptr[n] *= s->wet_gain; + } + 
+ return 0; +} + +
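
Illustrative usage note (not part of the diff above): once the patch is applied, the new "precision" option can be set like any other afir option, for example to force double-precision processing; the file names below are placeholders, the option values (auto, float, double) are the ones introduced by this patch:

  ffmpeg -i input.wav -i ir.wav -lavfi afir=precision=double output.wav

Internally the option selects between the float and double code paths generated from afir_template.c, which is included once with DEPTH 32 and once with DEPTH 64, and dispatched on the negotiated sample format (AV_SAMPLE_FMT_FLTP vs AV_SAMPLE_FMT_DBLP).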