From patchwork Fri Jan 21 08:34:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lynne X-Patchwork-Id: 33742 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a6b:cd86:0:0:0:0:0 with SMTP id d128csp282111iog; Fri, 21 Jan 2022 00:34:34 -0800 (PST) X-Google-Smtp-Source: ABdhPJxHj7yQnNAbTvAFkuI5QgD4//92Om7NzQr09Hf1fUt93PJK7k4w+DBdQJJ+HtohtI5dq0Rc X-Received: by 2002:aa7:cada:: with SMTP id l26mr3414307edt.376.1642754073910; Fri, 21 Jan 2022 00:34:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1642754073; cv=none; d=google.com; s=arc-20160816; b=i4Ua4rp3YwoZLjhtALX9OGgKkYfQnviInw0UhWs4sq798spqaTh1mmJCV1O3ijOws2 PurP/Eg/wtY19aRl+tt8kZE8QcZ/zOxDJnqj+YOMZ6KdixYhrQim1UqQnFP6p7Qg7zVm jlnX6cKAutmkQkTHUWcCbd1fHegCWdpKcy97lvqAeAJbcllQgMhGVn5kkwq/dlPFsyV6 xT7v6TBMcaXhhaaD7cc9i+yoUAERNKduiesxCUQaPNOG96syBqxIfr/3qev2AHn6Sijq /eXr/aRRk770vM17mfDPwdmzTAhm+78U3Ei+rvAsWj2iJPhfspANkvK7CPVx560OOCwI co6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence:subject :mime-version:references:in-reply-to:message-id:to:from:date :dkim-signature:delivered-to; bh=0HeU+ufgG7fDhJPXSQWZB5iL7tHiwGQ2wbGilo1hjdQ=; b=e5DIgakqYU8GhfIdr72keQZuYWHsUCYQkdZJwh81f+DJ9hJkxZ9qyLVaZNLGJvoNZG uvuHhaRImHWTHDVd4a3kuPjwVVrOBTiGZI7h7MWptZgtEkDwGYF1rSkUXMzJZSPRhO5t Z0Th7q35H+GRCOMy6jxaytUDigIFK2YCN6xB87F7GAOOE+8QxButNzVK/kVR9Oh5XKxZ yLSEmBL5T8QPM3Uw/NApnhXQ6Jn68nIjDLsCU/dkWj2porVfNky4VdOImb+xZyOIH3Ak lVDlxYdqo0rfTrx5tzm1uH/Pj9Cmzt4eYs9ZtDze38i8Co6yf4RpfgqOa7q93ZNjRg1I FpTQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@lynne.ee header.s=s1 header.b=Oi1dlRLA; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=lynne.ee Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id t22si4219935ejf.657.2022.01.21.00.34.33; Fri, 21 Jan 2022 00:34:33 -0800 (PST) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@lynne.ee header.s=s1 header.b=Oi1dlRLA; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=lynne.ee Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DF25C68B27E; Fri, 21 Jan 2022 10:34:30 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from w4.tutanota.de (w4.tutanota.de [81.3.6.165]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 7407668B1A8 for ; Fri, 21 Jan 2022 10:34:24 +0200 (EET) Received: from w3.tutanota.de (unknown [192.168.1.164]) by w4.tutanota.de (Postfix) with ESMTP id 22CD01060308 for ; Fri, 21 Jan 2022 08:34:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1642754064; s=s1; d=lynne.ee; h=From:From:To:To:Subject:Subject:Content-Description:Content-ID:Content-Type:Content-Type:Content-Transfer-Encoding:Cc:Date:Date:In-Reply-To:In-Reply-To:MIME-Version:MIME-Version:Message-ID:Message-ID:Reply-To:References:References:Sender; bh=+h8FPUT/TRoevK2Wd5nrLc3IM08ndNIWZ5xF0InxFWw=; b=Oi1dlRLAUx3CEinKNkoN/sB0jb63szA13z2BG8uU9BrAw1y6BZkBY+UydV4dw93W rGlHfZkmoKnlzj/EMOlYuh+lYAgIhn0BvRrKErSbbS5MZu5xohJ10PCC/40/AxVFjpG Z4jsCjiviB99YtWAELS4OzdkolMoa98Jgj2hBOLWlDbBu17AqtCrYZWO4yL6hAdsDNl xyX5iGfdx4sDesq4BJGZuNxzOCnwWa/iotkdhVqfCMdn7y2gDBa4twG4WZ7ex5WYk4w jBC4UxBDounsTQ00Yy7qEesrDvBqtJkmtq3dbeCsM+UAqdiNruF/dzpKpaHkezrXp+S nFIJ8MB1oQ== Date: Fri, 21 Jan 2022 09:34:24 +0100 (CET) From: Lynne To: FFmpeg development discussions and patches Message-ID: In-Reply-To: References: MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 2/2] lavu/tx: add an RDFT implementation X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: +PurocHKoiTq RDFTs are full of conventions that vary between implementations. What I've gone for here is what's most common between both fftw, avcodec's rdft and what we use, the equivalent of which is DFT_R2C for forward and IDFT_C2R for inverse. The other 2 conventions (IDFT_R2C and DFT_C2R) were not used at all in our code, and their names are also not appropriate. Like fftw, the input is written over with junk data. Like avcodec's rft, the output of a C2R transform is non-normalized (half the amplitude of the input). Patch attached. Subject: [PATCH 2/2] lavu/tx: add an RDFT implementation RDFTs are full of conventions that vary between implementations. What I've gone for here is what's most common between both fftw, avcodec's rdft and what we use, the equivalent of which is DFT_R2C for forward and IDFT_C2R for inverse. The other 2 conventions (IDFT_R2C and DFT_C2R) were not used at all in our code, and their names are also not appropriate. Like fftw, the input is written over with junk data. Like avcodec's rft, the output of a C2R transform is non-normalized (half the amplitude of the input). --- libavutil/tx.c | 5 +- libavutil/tx.h | 7 +++ libavutil/tx_template.c | 112 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 123 insertions(+), 1 deletion(-) diff --git a/libavutil/tx.c b/libavutil/tx.c index 28fe6c55b9..98d92cc854 100644 --- a/libavutil/tx.c +++ b/libavutil/tx.c @@ -522,10 +522,13 @@ static void print_tx_structure(AVTXContext *s, int depth) cd->type == TX_TYPE_ANY ? "all" : cd->type == AV_TX_FLOAT_FFT ? "fft_float" : cd->type == AV_TX_FLOAT_MDCT ? "mdct_float" : + cd->type == AV_TX_FLOAT_RDFT ? "rdft_float" : cd->type == AV_TX_DOUBLE_FFT ? "fft_double" : cd->type == AV_TX_DOUBLE_MDCT ? "mdct_double" : + cd->type == AV_TX_DOUBLE_RDFT ? "rdft_double" : cd->type == AV_TX_INT32_FFT ? "fft_int32" : - cd->type == AV_TX_INT32_MDCT ? "mdct_int32" : "unknown", + cd->type == AV_TX_INT32_MDCT ? "mdct_int32" : + cd->type == AV_TX_INT32_RDFT ? "rdft_double" : "unknown", s->len, cd->function); for (int i = 0; i < s->nb_sub; i++) diff --git a/libavutil/tx.h b/libavutil/tx.h index 4bc1478644..e33501d666 100644 --- a/libavutil/tx.h +++ b/libavutil/tx.h @@ -83,6 +83,13 @@ enum AVTXType { */ AV_TX_INT32_MDCT = 5, + /** + * Standard real to complex/complex to real DFT. + */ + AV_TX_FLOAT_RDFT = 6, + AV_TX_DOUBLE_RDFT = 7, + AV_TX_INT32_RDFT = 8, + /* Not part of the API, do not use */ AV_TX_NB, }; diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c index bfd27799be..a021d3ec6a 100644 --- a/libavutil/tx_template.c +++ b/libavutil/tx_template.c @@ -1301,6 +1301,116 @@ DECL_COMP_MDCT(7) DECL_COMP_MDCT(9) DECL_COMP_MDCT(15) +static av_cold int TX_NAME(ff_tx_rdft_init)(AVTXContext *s, + const FFTXCodelet *cd, + uint64_t flags, + FFTXCodeletOptions *opts, + int len, int inv, + const void *scale) +{ + int ret; + double f; + FFTSample *tab; + + if ((ret = ff_tx_init_subtx(s, TX_TYPE(FFT), flags, NULL, len >> 1, inv, scale))) + return ret; + + if (!(s->exp = av_mallocz(((len >> 2) + 1)*sizeof(*s->exp)))) + return AVERROR(ENOMEM); + + tab = (FFTSample *)s->exp; + + f = 2*M_PI/len; + + for (int i = 0; i < len >> 2; i++) + *tab++ = RESCALE(cos(i*f)); + for (int i = len >> 2; i >= 0; i--) + *tab++ = RESCALE(cos(i*f)); + + return 0; +} + +#define DECL_RDFT(name, s0, s1, inv) \ +static void TX_NAME(ff_tx_rdft_ ##name)(AVTXContext *s, void *_dst, \ + void *_src, ptrdiff_t stride) \ +{ \ + const int len2 = s->len >> 1; \ + const int len4 = s->len >> 2; \ + const FFTSample *tcos = (void *)s->exp; \ + const FFTSample *tsin = tcos + len4; \ + FFTComplex *data = inv ? _src : _dst; \ + FFTComplex t[3]; \ + \ + if (!inv) \ + s->fn[0](&s->sub[0], data, _src, sizeof(FFTComplex)); \ + else \ + data[0].im = data[len2].re; \ + \ + /* The DC value's both components are real, but we need to change them \ + * into complex values */ \ + t[0].re = data[0].re; \ + data[0].re = t[0].re + data[0].im; \ + data[0].im = t[0].re - data[0].im; \ + \ + for (int i = 1; i < len4; i++) { \ + /* Separate even and odd FFTs */ \ + t[0].re = RESCALE(0.5 )*(data[i].re + data[len2 - i].re); \ + t[0].im = RESCALE( - 0.5)*(data[i].im - data[len2 - i].im); \ + t[1].re = RESCALE( (0.5 - inv))*(data[i].im + data[len2 - i].im); \ + t[1].im = RESCALE(-(0.5 - inv))*(data[i].re - data[len2 - i].re); \ + \ + /* Apply twiddle factors to the odd FFT and add to the even FFT */ \ + t[2].re = t[1].re*tcos[i] s0 t[1].im*tsin[i]; \ + t[2].im = t[1].im*tcos[i] s1 t[1].re*tsin[i]; \ + \ + data[ i].re = t[0].re + t[2].re; \ + data[ i].im = t[2].im - t[0].im; \ + data[len2 - i].re = t[0].re - t[2].re; \ + data[len2 - i].im = t[2].im + t[0].im; \ + } \ + \ + data[len4].im = -data[len4].im; \ + \ + if (inv) { \ + data[0].re *= RESCALE(0.5); \ + data[0].im *= RESCALE(0.5); \ + s->fn[0](&s->sub[0], _dst, data, sizeof(FFTComplex)); \ + } else { \ + /* Move [0].im to the last position, as convention requires */ \ + data[len2].re = data[0].im; \ + data[ 0].im = 0; \ + } \ +} + +DECL_RDFT(fwd, +, -, 0) +DECL_RDFT(inv, -, +, 1) + +const FFTXCodelet TX_NAME(ff_tx_rdft_fwd_def) = { + .name = TX_NAME_STR("rdft_fwd"), + .function = TX_NAME(ff_tx_rdft_fwd), + .type = TX_TYPE(RDFT), + .flags = AV_TX_UNALIGNED | FF_TX_OUT_OF_PLACE | FF_TX_FORWARD_ONLY, + .factors = { 2, TX_FACTOR_ANY }, + .min_len = 2, + .max_len = TX_LEN_UNLIMITED, + .init = TX_NAME(ff_tx_rdft_init), + .cpu_flags = FF_TX_CPU_FLAGS_ALL, + .prio = FF_TX_PRIO_BASE, +}; + +const FFTXCodelet TX_NAME(ff_tx_rdft_inv_def) = { + .name = TX_NAME_STR("rdft_inv"), + .function = TX_NAME(ff_tx_rdft_inv), + .type = TX_TYPE(RDFT), + .flags = AV_TX_UNALIGNED | FF_TX_OUT_OF_PLACE | FF_TX_INVERSE_ONLY, + .factors = { 2, TX_FACTOR_ANY }, + .min_len = 2, + .max_len = TX_LEN_UNLIMITED, + .init = TX_NAME(ff_tx_rdft_init), + .cpu_flags = FF_TX_CPU_FLAGS_ALL, + .prio = FF_TX_PRIO_BASE, +}; + const FFTXCodelet * const TX_NAME(ff_tx_codelet_list)[] = { /* Split-Radix codelets */ &TX_NAME(ff_tx_fft2_ns_def), @@ -1345,6 +1455,8 @@ const FFTXCodelet * const TX_NAME(ff_tx_codelet_list)[] = { &TX_NAME(ff_tx_mdct_naive_fwd_def), &TX_NAME(ff_tx_mdct_naive_inv_def), &TX_NAME(ff_tx_mdct_inv_full_def), + &TX_NAME(ff_tx_rdft_fwd_def), + &TX_NAME(ff_tx_rdft_inv_def), NULL, };