From patchwork Thu Aug 3 16:26:16 2023
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 43111
Date: Thu, 3 Aug 2023 18:26:16 +0200 (CEST)
From: Lynne
To: Ffmpeg Devel
Subject: [FFmpeg-devel] [PATCH 1/2] lavu/tx: add real to real and real to imaginary RDFT transforms

These are in-place transforms, required for DCT-I and DST-I.
They are templated because the mod2 variant, which DCT-I/DST-I specifically
require, needs only minor modifications.
They are fairly well optimized, as no additional buffer storage is needed.

From 2ea5e2541c2551bf1b56e967d35946289a85aa49 Mon Sep 17 00:00:00 2001
From: Lynne
Date: Thu, 3 Aug 2023 18:21:23 +0200
Subject: [PATCH 1/2] lavu/tx: add real to real and real to imaginary RDFT transforms

These are in-place transforms, required for DCT-I and DST-I.
They are templated because the mod2 variant, which DCT-I/DST-I specifically
require, needs only minor modifications.
---
 doc/APIchanges          |   3 +
 libavutil/tx.c          |  18 ++++-
 libavutil/tx.h          |  10 +++
 libavutil/tx_template.c | 175 +++++++++++++++++++++++++++++++---------
 libavutil/version.h     |   2 +-
 5 files changed, 167 insertions(+), 41 deletions(-)

diff --git a/doc/APIchanges b/doc/APIchanges
index 5afe8bcb75..edd178be4f 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -2,6 +2,9 @@ The last version increases of all libraries were on 2023-02-09
 
 API changes, most recent first:
 
+2023-07-xx - xxxxxxxxxx - lavu 58.15.100 - tx.h
+  Add AV_TX_REAL_TO_REAL and AV_TX_REAL_TO_IMAGINARY
+
 2023-07-xx - xxxxxxxxxx - lavc 60 - avcodec.h
   Deprecate AV_CODEC_FLAG_DROPCHANGED without replacement.
 
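Why an RDFT that keeps only the real parts (AV_TX_REAL_TO_REAL) or only the imaginary parts (AV_TX_REAL_TO_IMAGINARY) is enough for DCT-I/DST-I: a real, even-symmetric sequence (x[n] = x[M - n]) has a purely real DFT, and a real, odd-symmetric one (x[n] = -x[M - n]) has a purely imaginary DFT. The sketch below checks that property with a naive 8-point DFT in plain C. It does not use libavutil; `dft8`, `dft8_is_real`, `dft8_is_imaginary` and `symmetry_demo` are hypothetical helpers, and M is fixed at 8 only so the twiddle factors can be written as exact constants without libm.

```c
#include <assert.h>

#define DFT_LEN    8
#define HALF_SQRT2 0.70710678118654752440 /* cos(pi/4) == sin(pi/4) */
#define TOL        1e-9

/* cos(2*pi*j/8) and sin(2*pi*j/8) for j = 0..7 */
static const double cos8[DFT_LEN] = {
    1.0, HALF_SQRT2, 0.0, -HALF_SQRT2, -1.0, -HALF_SQRT2, 0.0, HALF_SQRT2
};
static const double sin8[DFT_LEN] = {
    0.0, HALF_SQRT2, 1.0, HALF_SQRT2, 0.0, -HALF_SQRT2, -1.0, -HALF_SQRT2
};

/* Naive DFT of a real sequence: X[k] = sum_n x[n] * exp(-2*pi*i*n*k/8). */
void dft8(const double x[DFT_LEN], double re[DFT_LEN], double im[DFT_LEN])
{
    for (int k = 0; k < DFT_LEN; k++) {
        re[k] = 0.0;
        im[k] = 0.0;
        for (int n = 0; n < DFT_LEN; n++) {
            int j = (n * k) % DFT_LEN; /* the exponent is periodic mod 8 */
            re[k] += x[n] * cos8[j];
            im[k] -= x[n] * sin8[j];
        }
    }
}

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Returns 1 if every entry of v is within TOL of zero. */
static int all_zero(const double v[DFT_LEN])
{
    for (int k = 0; k < DFT_LEN; k++)
        if (abs_d(v[k]) > TOL)
            return 0;
    return 1;
}

/* 1 if the DFT of x has no imaginary part, so keeping only real parts loses nothing. */
int dft8_is_real(const double x[DFT_LEN])
{
    double re[DFT_LEN], im[DFT_LEN];
    dft8(x, re, im);
    return all_zero(im);
}

/* 1 if the DFT of x has no real part, so keeping only imaginary parts loses nothing. */
int dft8_is_imaginary(const double x[DFT_LEN])
{
    double re[DFT_LEN], im[DFT_LEN];
    dft8(x, re, im);
    return all_zero(re);
}

/* Usage: an even and an odd sequence, both mirrored around n = 4. */
void symmetry_demo(void)
{
    const double even[DFT_LEN] = { 1, 2, 3, 4, 5, 4, 3, 2 };
    const double odd[DFT_LEN]  = { 0, 2, 3, 4, 0, -4, -3, -2 };

    assert(dft8_is_real(even));
    assert(dft8_is_imaginary(odd));
}
```

Note that the odd sequence is forced to zero at n = 0 and n = M/2; this is why a DST-I extension needs explicit zero samples there.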
diff --git a/libavutil/tx.c b/libavutil/tx.c
index e25abf998f..e9826e6107 100644
--- a/libavutil/tx.c
+++ b/libavutil/tx.c
@@ -437,7 +437,9 @@ int ff_tx_decompose_length(int dst[TX_MAX_DECOMPOSITIONS], enum AVTXType type,
 
         /* Check direction for non-orthogonal codelets */
         if (((cd->flags & FF_TX_FORWARD_ONLY) && inv) ||
-            ((cd->flags & (FF_TX_INVERSE_ONLY | AV_TX_FULL_IMDCT)) && !inv))
+            ((cd->flags & (FF_TX_INVERSE_ONLY | AV_TX_FULL_IMDCT)) && !inv) ||
+            ((cd->flags & (FF_TX_FORWARD_ONLY | AV_TX_REAL_TO_REAL)) && inv) ||
+            ((cd->flags & (FF_TX_FORWARD_ONLY | AV_TX_REAL_TO_IMAGINARY)) && inv))
             continue;
 
         /* Check if the CPU supports the required ISA */
@@ -560,6 +562,10 @@ static void print_flags(AVBPrint *bp, uint64_t f)
         av_bprintf(bp, "%spreshuf", prev > 1 ? sep : "");
     if ((f & AV_TX_FULL_IMDCT) && ++prev)
         av_bprintf(bp, "%simdct_full", prev > 1 ? sep : "");
+    if ((f & AV_TX_REAL_TO_REAL) && ++prev)
+        av_bprintf(bp, "%sreal_to_real", prev > 1 ? sep : "");
+    if ((f & AV_TX_REAL_TO_IMAGINARY) && ++prev)
+        av_bprintf(bp, "%sreal_to_imaginary", prev > 1 ? sep : "");
     if ((f & FF_TX_ASM_CALL) && ++prev)
         av_bprintf(bp, "%sasm_call", prev > 1 ? sep : "");
     av_bprintf(bp, "]");
@@ -717,7 +723,11 @@ av_cold int ff_tx_init_subtx(AVTXContext *s, enum AVTXType type,
     uint64_t req_flags = flags;
 
     /* Flags the codelet may require to be present */
-    uint64_t inv_req_mask = AV_TX_FULL_IMDCT | FF_TX_PRESHUFFLE | FF_TX_ASM_CALL;
+    uint64_t inv_req_mask = AV_TX_FULL_IMDCT |
+                            AV_TX_REAL_TO_REAL |
+                            AV_TX_REAL_TO_IMAGINARY |
+                            FF_TX_PRESHUFFLE |
+                            FF_TX_ASM_CALL;
 
     /* Unaligned codelets are compatible with the aligned flag */
     if (req_flags & FF_TX_ALIGNED)
@@ -742,7 +752,9 @@ av_cold int ff_tx_init_subtx(AVTXContext *s, enum AVTXType type,
 
         /* Check direction for non-orthogonal codelets */
         if (((cd->flags & FF_TX_FORWARD_ONLY) && inv) ||
-            ((cd->flags & (FF_TX_INVERSE_ONLY | AV_TX_FULL_IMDCT)) && !inv))
+            ((cd->flags & (FF_TX_INVERSE_ONLY | AV_TX_FULL_IMDCT)) && !inv) ||
+            ((cd->flags & (FF_TX_FORWARD_ONLY | AV_TX_REAL_TO_REAL)) && inv) ||
+            ((cd->flags & (FF_TX_FORWARD_ONLY | AV_TX_REAL_TO_IMAGINARY)) && inv))
             continue;
 
         /* Check if the requested flags match from both sides */
diff --git a/libavutil/tx.h b/libavutil/tx.h
index 064edbc097..d178e8ee9d 100644
--- a/libavutil/tx.h
+++ b/libavutil/tx.h
@@ -149,6 +149,16 @@ enum AVTXFlags {
      * Ignored for all transforms but inverse MDCTs.
      */
     AV_TX_FULL_IMDCT = 1ULL << 2,
+
+    /**
+     * Perform a real to half-complex RDFT.
+     * Only the real or only the imaginary coefficients will be output,
+     * depending on which flag is used. Only available for forward RDFTs.
+     * Output array must have enough space to hold N complex values
+     * (regular size for a real to complex transform).
+     */
+    AV_TX_REAL_TO_REAL      = 1ULL << 3,
+    AV_TX_REAL_TO_IMAGINARY = 1ULL << 4,
 };
 
 /**
diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index c4ec9502e0..50c65d00b5 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -1613,14 +1613,17 @@ static av_cold int TX_NAME(ff_tx_rdft_init)(AVTXContext *s,
     int ret;
     double f, m;
     TXSample *tab;
+    int len4 = FFALIGN(len, 4) / 4;
 
     s->scale_d = *((SCALE_TYPE *)scale);
     s->scale_f = s->scale_d;
 
+    flags &= ~(AV_TX_REAL_TO_REAL | AV_TX_REAL_TO_IMAGINARY);
+
     if ((ret = ff_tx_init_subtx(s, TX_TYPE(FFT), flags, NULL, len >> 1, inv, scale)))
         return ret;
 
-    if (!(s->exp = av_mallocz((8 + (len >> 2) - 1)*sizeof(*s->exp))))
+    if (!(s->exp = av_mallocz((8 + 2*len4)*sizeof(*s->exp))))
         return AVERROR(ENOMEM);
 
     tab = (TXSample *)s->exp;
@@ -1639,17 +1642,20 @@ static av_cold int TX_NAME(ff_tx_rdft_init)(AVTXContext *s,
     *tab++ = RESCALE( (0.5 - inv) * m);
     *tab++ = RESCALE(-(0.5 - inv) * m);
 
-    for (int i = 0; i < len >> 2; i++)
+    for (int i = 0; i < len4; i++)
         *tab++ = RESCALE(cos(i*f));
-    for (int i = len >> 2; i >= 0; i--)
-        *tab++ = RESCALE(cos(i*f) * (inv ? +1.0 : -1.0));
+
+    tab = ((TXSample *)s->exp) + len4 + 8;
+
+    for (int i = 0; i < len4; i++)
+        *tab++ = RESCALE(cos(((float)len/4.0 - (float)i + 0)*f) * (inv ? +1.0 : -1.0));
 
     return 0;
 }
 
-#define DECL_RDFT(name, inv)                                                  \
-static void TX_NAME(ff_tx_rdft_ ##name)(AVTXContext *s, void *_dst,           \
-                                        void *_src, ptrdiff_t stride)         \
+#define DECL_RDFT(n, inv)                                                     \
+static void TX_NAME(ff_tx_rdft_ ##n)(AVTXContext *s, void *_dst,              \
+                                     void *_src, ptrdiff_t stride)            \
 {                                                                             \
     const int len2 = s->len >> 1;                                             \
     const int len4 = s->len >> 2;                                             \
@@ -1698,40 +1704,131 @@ static void TX_NAME(ff_tx_rdft_ ##name)(AVTXContext *s, void *_dst,       \
         data[len2].re = data[0].im;                                           \
         data[   0].im = data[len2].im = 0;                                    \
     }                                                                         \
-}
+}                                                                             \
+                                                                              \
+static const FFTXCodelet TX_NAME(ff_tx_rdft_ ##n## _def) = {                  \
+    .name       = TX_NAME_STR("rdft_" #n),                                    \
+    .function   = TX_NAME(ff_tx_rdft_ ##n),                                   \
+    .type       = TX_TYPE(RDFT),                                              \
+    .flags      = AV_TX_UNALIGNED | AV_TX_INPLACE | FF_TX_OUT_OF_PLACE |      \
+                  (inv ? FF_TX_INVERSE_ONLY : FF_TX_FORWARD_ONLY),            \
+    .factors    = { 4, TX_FACTOR_ANY },                                       \
+    .nb_factors = 2,                                                          \
+    .min_len    = 4,                                                          \
+    .max_len    = TX_LEN_UNLIMITED,                                           \
+    .init       = TX_NAME(ff_tx_rdft_init),                                   \
+    .cpu_flags  = FF_TX_CPU_FLAGS_ALL,                                        \
+    .prio       = FF_TX_PRIO_BASE,                                            \
+};
 
-DECL_RDFT(r2c, 0)
-DECL_RDFT(c2r, 1)
+DECL_RDFT(r2c, 0)
+DECL_RDFT(c2r, 1)
 
-static const FFTXCodelet TX_NAME(ff_tx_rdft_r2c_def) = {
-    .name       = TX_NAME_STR("rdft_r2c"),
-    .function   = TX_NAME(ff_tx_rdft_r2c),
-    .type       = TX_TYPE(RDFT),
-    .flags      = AV_TX_UNALIGNED | AV_TX_INPLACE |
-                  FF_TX_OUT_OF_PLACE | FF_TX_FORWARD_ONLY,
-    .factors    = { 2, TX_FACTOR_ANY },
-    .nb_factors = 2,
-    .min_len    = 2,
-    .max_len    = TX_LEN_UNLIMITED,
-    .init       = TX_NAME(ff_tx_rdft_init),
-    .cpu_flags  = FF_TX_CPU_FLAGS_ALL,
-    .prio       = FF_TX_PRIO_BASE,
+#define DECL_RDFT_HALF(n, mode, mod2)                                         \
+static void TX_NAME(ff_tx_rdft_ ##n)(AVTXContext *s, void *_dst,              \
+                                     void *_src, ptrdiff_t stride)            \
+{                                                                             \
+    const int len = s->len;                                                   \
+    const int len2 = len >> 1;                                                \
+    const int len4 = len >> 2;                                                \
+    const int aligned_len4 = FFALIGN(len, 4)/4;                               \
+    const TXSample *fact = (void *)s->exp;                                    \
+    const TXSample *tcos = fact + 8;                                          \
+    const TXSample *tsin = tcos + aligned_len4;                               \
+    TXComplex *data = _dst;                                                   \
+    TXSample *out = _dst; /* Half-complex is forward-only */                  \
+    TXSample tmp_dc;                                                          \
+    av_unused TXSample tmp_mid;                                               \
+    TXSample tmp[4];                                                          \
+    TXComplex sf, sl;                                                         \
+                                                                              \
+    s->fn[0](&s->sub[0], _dst, _src, sizeof(TXComplex));                      \
+                                                                              \
+    tmp_dc = data[0].re;                                                      \
+    data[   0].re = tmp_dc + data[0].im;                                      \
+    tmp_dc        = tmp_dc - data[0].im;                                      \
+                                                                              \
+    data[   0].re = MULT(fact[0], data[   0].re);                             \
+    tmp_dc        = MULT(fact[1], tmp_dc);                                    \
+    data[len4].re = MULT(fact[2], data[len4].re);                             \
+                                                                              \
+    if (!mod2) {                                                              \
+        data[len4].im = MULT(fact[3], data[len4].im);                         \
+    } else {                                                                  \
+        sf = data[len4];                                                      \
+        sl = data[len4 + 1];                                                  \
+        if (mode == AV_TX_REAL_TO_REAL)                                       \
+            tmp[0] = MULT(fact[4], (sf.re + sl.re));                          \
+        else                                                                  \
+            tmp[0] = MULT(fact[5], (sf.im - sl.im));                          \
+        tmp[1] = MULT(fact[6], (sf.im + sl.im));                              \
+        tmp[2] = MULT(fact[7], (sf.re - sl.re));                              \
+                                                                              \
+        if (mode == AV_TX_REAL_TO_REAL) {                                     \
+            tmp[3] = tmp[1]*tcos[len4] - tmp[2]*tsin[len4];                   \
+            tmp_mid = (tmp[0] - tmp[3]);                                      \
+        } else {                                                              \
+            tmp[3] = tmp[1]*tsin[len4] + tmp[2]*tcos[len4];                   \
+            tmp_mid = (tmp[0] + tmp[3]);                                      \
+        }                                                                     \
+    }                                                                         \
+                                                                              \
+    /* NOTE: unrolling this breaks non-mod8 lengths */                        \
+    for (int i = 1; i <= len4; i++) {                                         \
+        TXSample tmp[4];                                                      \
+        TXComplex sf = data[i];                                               \
+        TXComplex sl = data[len2 - i];                                        \
+                                                                              \
+        if (mode == AV_TX_REAL_TO_REAL)                                       \
+            tmp[0] = MULT(fact[4], (sf.re + sl.re));                          \
+        else                                                                  \
+            tmp[0] = MULT(fact[5], (sf.im - sl.im));                          \
+                                                                              \
+        tmp[1] = MULT(fact[6], (sf.im + sl.im));                              \
+        tmp[2] = MULT(fact[7], (sf.re - sl.re));                              \
+                                                                              \
+        if (mode == AV_TX_REAL_TO_REAL) {                                     \
+            tmp[3] = tmp[1]*tcos[i] - tmp[2]*tsin[i];                         \
+            out[i]       = (tmp[0] + tmp[3]);                                 \
+            out[len - i] = (tmp[0] - tmp[3]);                                 \
+        } else {                                                              \
+            tmp[3] = tmp[1]*tsin[i] + tmp[2]*tcos[i];                         \
+            out[i - 1]       = (tmp[3] - tmp[0]);                             \
+            out[len - i - 1] = (tmp[0] + tmp[3]);                             \
+        }                                                                     \
+    }                                                                         \
+                                                                              \
+    for (int i = 1; i < (len4 + (mode == AV_TX_REAL_TO_IMAGINARY)); i++)      \
+        out[len2 - i] = out[len - i];                                         \
+                                                                              \
+    if (mode == AV_TX_REAL_TO_REAL) {                                         \
+        out[len2] = tmp_dc;                                                   \
+        if (mod2)                                                             \
+            out[len4 + 1] = tmp_mid;                                          \
+    } else if (mod2) {                                                        \
+        out[len4] = tmp_mid;                                                  \
+    }                                                                         \
+}                                                                             \
+                                                                              \
+static const FFTXCodelet TX_NAME(ff_tx_rdft_ ##n## _def) = {                  \
+    .name       = TX_NAME_STR("rdft_" #n),                                    \
+    .function   = TX_NAME(ff_tx_rdft_ ##n),                                   \
+    .type       = TX_TYPE(RDFT),                                              \
+    .flags      = AV_TX_UNALIGNED | AV_TX_INPLACE | mode |                    \
+                  FF_TX_OUT_OF_PLACE | FF_TX_FORWARD_ONLY,                    \
+    .factors    = { 2 + 2*(!mod2), TX_FACTOR_ANY },                           \
+    .nb_factors = 2,                                                          \
+    .min_len    = 2 + 2*(!mod2),                                              \
+    .max_len    = TX_LEN_UNLIMITED,                                           \
+    .init       = TX_NAME(ff_tx_rdft_init),                                   \
+    .cpu_flags  = FF_TX_CPU_FLAGS_ALL,                                        \
+    .prio       = FF_TX_PRIO_BASE,                                            \
 };
 
-static const FFTXCodelet TX_NAME(ff_tx_rdft_c2r_def) = {
-    .name       = TX_NAME_STR("rdft_c2r"),
-    .function   = TX_NAME(ff_tx_rdft_c2r),
-    .type       = TX_TYPE(RDFT),
-    .flags      = AV_TX_UNALIGNED | AV_TX_INPLACE |
-                  FF_TX_OUT_OF_PLACE | FF_TX_INVERSE_ONLY,
-    .factors    = { 2, TX_FACTOR_ANY },
-    .nb_factors = 2,
-    .min_len    = 2,
-    .max_len    = TX_LEN_UNLIMITED,
-    .init       = TX_NAME(ff_tx_rdft_init),
-    .cpu_flags  = FF_TX_CPU_FLAGS_ALL,
-    .prio       = FF_TX_PRIO_BASE,
-};
+DECL_RDFT_HALF(r2r, AV_TX_REAL_TO_REAL, 0)
+DECL_RDFT_HALF(r2r_mod2, AV_TX_REAL_TO_REAL, 1)
+DECL_RDFT_HALF(r2i, AV_TX_REAL_TO_IMAGINARY, 0)
+DECL_RDFT_HALF(r2i_mod2, AV_TX_REAL_TO_IMAGINARY, 1)
 
 static av_cold int TX_NAME(ff_tx_dct_init)(AVTXContext *s,
                                            const FFTXCodelet *cd,
@@ -1997,6 +2094,10 @@ const FFTXCodelet * const TX_NAME(ff_tx_codelet_list)[] = {
     &TX_NAME(ff_tx_mdct_naive_inv_def),
     &TX_NAME(ff_tx_mdct_inv_full_def),
     &TX_NAME(ff_tx_rdft_r2c_def),
+    &TX_NAME(ff_tx_rdft_r2r_def),
+    &TX_NAME(ff_tx_rdft_r2r_mod2_def),
+    &TX_NAME(ff_tx_rdft_r2i_def),
+    &TX_NAME(ff_tx_rdft_r2i_mod2_def),
     &TX_NAME(ff_tx_rdft_c2r_def),
     &TX_NAME(ff_tx_dctII_def),
     &TX_NAME(ff_tx_dctIII_def),
diff --git a/libavutil/version.h b/libavutil/version.h
index 24af520e08..9e798b0e3f 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -79,7 +79,7 @@
  */
 
 #define LIBAVUTIL_VERSION_MAJOR  58
-#define LIBAVUTIL_VERSION_MINOR  14
+#define LIBAVUTIL_VERSION_MINOR  15
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
-- 
2.40.1

From patchwork Thu Aug 3 16:31:23 2023
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 43112
Date: Thu, 3 Aug 2023 18:31:23 +0200 (CEST)
From: Lynne
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 2/2] lavu/tx: add DCT-I and DST-I transforms

These are true DCT-I and DST-I transforms, unlike the libavcodec
versions, which are not.

Error tests were run with https://github.com/cyanreg/lavu_fft_test

RMS error on a 2048-sample DCT-I:
RMSE   av_tx = 0.000000 (4096 matches, first mismatch at -1)
RMSE  fftw3f = 0.000000 (4096 matches, first mismatch at -1)
RMSE   avfft = 0.011440 (0 matches, first mismatch at 0)

RMS error on a 2048-sample DST-I:
RMSE   av_tx = 0.000000 (4096 matches, first mismatch at -1)
RMSE  fftw3f = 0.000000 (4096 matches, first mismatch at -1)
RMSE   avfft = 0.015316 (0 matches, first mismatch at 0)

From 0bbe264a0c597a5a871ffc2bfea06e717bc9e0a1 Mon Sep 17 00:00:00 2001
From: Lynne
Date: Thu, 3 Aug 2023 18:23:02 +0200
Subject: [PATCH 2/2] lavu/tx: add DCT-I and DST-I transforms

These are true DCT-I and DST-I transforms, unlike the libavcodec
versions, which are not.
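`ff_tx_dctI` in this patch mirrors its N inputs into a 2(N - 1)-sample even-symmetric buffer (`tmp[i] = tmp[2*len - i] = src[i]`, with `tmp[len]` holding the last input) and hands it to a REAL_TO_REAL RDFT. A minimal sketch of that reduction follows, assuming a naive 8-point DFT as a stand-in for the lavu RDFT (so N = 5), compared against the direct, unnormalized DCT-I sum Y[k] = x[0] + (-1)^k x[N-1] + 2 * sum_{n=1}^{N-2} x[n] cos(pi n k / (N - 1)), i.e. FFTW's REDFT00 form. The even extension produces exactly that sum; any further scaling av_tx may apply is not modelled, and every function name here is hypothetical.

```c
#include <assert.h>

#define DCT1_N     5                  /* number of DCT-I points (N) */
#define EXT_LEN    (2 * (DCT1_N - 1)) /* even-extension length 2(N - 1) = 8 */
#define HALF_SQRT2 0.70710678118654752440
#define TOL        1e-9

/* cos(2*pi*j/8) and sin(2*pi*j/8) for j = 0..7; exact constants avoid libm. */
static const double cos8[EXT_LEN] = {
    1.0, HALF_SQRT2, 0.0, -HALF_SQRT2, -1.0, -HALF_SQRT2, 0.0, HALF_SQRT2
};
static const double sin8[EXT_LEN] = {
    0.0, HALF_SQRT2, 1.0, HALF_SQRT2, 0.0, -HALF_SQRT2, -1.0, -HALF_SQRT2
};

/* Naive 8-point DFT of a real sequence (stand-in for the lavu RDFT). */
static void dft8(const double x[EXT_LEN], double re[EXT_LEN], double im[EXT_LEN])
{
    for (int k = 0; k < EXT_LEN; k++) {
        re[k] = 0.0;
        im[k] = 0.0;
        for (int n = 0; n < EXT_LEN; n++) {
            re[k] += x[n] * cos8[(n * k) % EXT_LEN];
            im[k] -= x[n] * sin8[(n * k) % EXT_LEN];
        }
    }
}

/* DCT-I via an even extension built the same way as in ff_tx_dctI. */
void dct1_via_dft(const double x[DCT1_N], double y[DCT1_N])
{
    const int len = DCT1_N - 1;
    double ext[EXT_LEN], re[EXT_LEN], im[EXT_LEN];

    for (int i = 0; i < len; i++) {
        ext[i] = x[i];
        if (i > 0)                    /* index 2*len lies outside the input */
            ext[2 * len - i] = x[i];
    }
    ext[len] = x[len];                /* middle sample */

    dft8(ext, re, im);                /* im[] is ~0 for even input */
    for (int k = 0; k < DCT1_N; k++)
        y[k] = re[k];
}

/* Direct sum: Y[k] = x[0] + (-1)^k x[N-1] + 2 sum_{n=1}^{N-2} x[n] cos(pi n k/(N-1)). */
void dct1_direct(const double x[DCT1_N], double y[DCT1_N])
{
    for (int k = 0; k < DCT1_N; k++) {
        double acc = x[0] + ((k & 1) ? -x[DCT1_N - 1] : x[DCT1_N - 1]);
        for (int n = 1; n < DCT1_N - 1; n++)
            acc += 2.0 * x[n] * cos8[(n * k) % EXT_LEN]; /* cos(pi*n*k/4) */
        y[k] = acc;
    }
}

/* Returns 1 if both paths agree on every output bin. */
int dct1_matches_direct(const double x[DCT1_N])
{
    double a[DCT1_N], b[DCT1_N];

    dct1_via_dft(x, a);
    dct1_direct(x, b);
    for (int k = 0; k < DCT1_N; k++) {
        double d = a[k] - b[k];
        if (d > TOL || d < -TOL)
            return 0;
    }
    return 1;
}

/* Usage: a unit impulse at n = 0 has a flat DCT-I spectrum of ones. */
void dct1_demo(void)
{
    const double impulse[DCT1_N] = { 1.0, 0.0, 0.0, 0.0, 0.0 };
    double y[DCT1_N];

    dct1_via_dft(impulse, y);
    for (int k = 0; k < DCT1_N; k++)
        assert(y[k] > 1.0 - TOL && y[k] < 1.0 + TOL);
    assert(dct1_matches_direct(impulse));
}
```

Because the extended buffer is even-symmetric, every imaginary bin of the DFT vanishes, which is why the init function requests a REAL_TO_REAL RDFT for DCT-I.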
---
 libavutil/tx.h          |  24 ++++++++++
 libavutil/tx_template.c | 103 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+)

diff --git a/libavutil/tx.h b/libavutil/tx.h
index d178e8ee9d..4696988cae 100644
--- a/libavutil/tx.h
+++ b/libavutil/tx.h
@@ -105,6 +105,30 @@ enum AVTXType {
     AV_TX_DOUBLE_DCT = 10,
     AV_TX_INT32_DCT  = 11,
 
+    /**
+     * Discrete Cosine Transform I
+     *
+     * The forward transform is a DCT-I.
+     * The inverse transform is a DCT-I multiplied by 2/(N + 1).
+     *
+     * The input array is always overwritten.
+     */
+    AV_TX_FLOAT_DCT_I  = 12,
+    AV_TX_DOUBLE_DCT_I = 13,
+    AV_TX_INT32_DCT_I  = 14,
+
+    /**
+     * Discrete Sine Transform I
+     *
+     * The forward transform is a DST-I.
+     * The inverse transform is a DST-I multiplied by 2/(N + 1).
+     *
+     * The input array is always overwritten.
+     */
+    AV_TX_FLOAT_DST_I  = 15,
+    AV_TX_DOUBLE_DST_I = 16,
+    AV_TX_INT32_DST_I  = 17,
+
     /* Not part of the API, do not use */
     AV_TX_NB,
 };
diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index 50c65d00b5..9bdac1e57d 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -2004,6 +2004,107 @@ static const FFTXCodelet TX_NAME(ff_tx_dctIII_def) = {
     .prio       = FF_TX_PRIO_BASE,
 };
 
+static av_cold int TX_NAME(ff_tx_dcstI_init)(AVTXContext *s,
+                                             const FFTXCodelet *cd,
+                                             uint64_t flags,
+                                             FFTXCodeletOptions *opts,
+                                             int len, int inv,
+                                             const void *scale)
+{
+    int ret;
+    SCALE_TYPE rsc = *((SCALE_TYPE *)scale);
+
+    if (0 && inv) {
+        len *= 2;
+        s->len *= 2;
+        rsc *= 0.5;
+    }
+
+    /* We want a half-complex RDFT */
+    flags |= cd->type == TX_TYPE(DCT_I) ? AV_TX_REAL_TO_REAL :
+                                          AV_TX_REAL_TO_IMAGINARY;
+
+    if ((ret = ff_tx_init_subtx(s, TX_TYPE(RDFT), flags, NULL,
+                                (len - 1 + 2*(cd->type == TX_TYPE(DST_I)))*2,
+                                0, &rsc)))
+        return ret;
+
+    s->tmp = av_mallocz((len + 1)*2*sizeof(TXSample));
+    if (!s->tmp)
+        return AVERROR(ENOMEM);
+
+    return 0;
+}
+
+static void TX_NAME(ff_tx_dctI)(AVTXContext *s, void *_dst,
+                                void *_src, ptrdiff_t stride)
+{
+    TXSample *dst = _dst;
+    TXSample *src = _src;
+    const int len = s->len - 1;
+    TXSample *tmp = (TXSample *)s->tmp;
+
+    stride /= sizeof(TXSample);
+
+    for (int i = 0; i < len; i++)
+        tmp[i] = tmp[2*len - i] = src[i * stride];
+
+    tmp[len] = src[len * stride]; /* Middle */
+
+    s->fn[0](&s->sub[0], dst, tmp, sizeof(TXSample));
+}
+
+static void TX_NAME(ff_tx_dstI)(AVTXContext *s, void *_dst,
+                                void *_src, ptrdiff_t stride)
+{
+    TXSample *dst = _dst;
+    TXSample *src = _src;
+    const int len = s->len + 1;
+    TXSample *tmp = (void *)s->tmp;
+
+    stride /= sizeof(TXSample);
+
+    tmp[0] = 0;
+
+    for (int i = 1; i < len; i++) {
+        TXSample a = src[(i - 1) * stride];
+        tmp[i] = -a;
+        tmp[2*len - i] = a;
+    }
+
+    tmp[len] = 0; /* i == n, Nyquist */
+
+    s->fn[0](&s->sub[0], dst, tmp, sizeof(TXSample));
+}
+
+static const FFTXCodelet TX_NAME(ff_tx_dctI_def) = {
+    .name       = TX_NAME_STR("dctI"),
+    .function   = TX_NAME(ff_tx_dctI),
+    .type       = TX_TYPE(DCT_I),
+    .flags      = AV_TX_UNALIGNED | AV_TX_INPLACE | FF_TX_OUT_OF_PLACE,
+    .factors    = { 2, TX_FACTOR_ANY },
+    .nb_factors = 2,
+    .min_len    = 2,
+    .max_len    = TX_LEN_UNLIMITED,
+    .init       = TX_NAME(ff_tx_dcstI_init),
+    .cpu_flags  = FF_TX_CPU_FLAGS_ALL,
+    .prio       = FF_TX_PRIO_BASE,
+};
+
+static const FFTXCodelet TX_NAME(ff_tx_dstI_def) = {
+    .name       = TX_NAME_STR("dstI"),
+    .function   = TX_NAME(ff_tx_dstI),
+    .type       = TX_TYPE(DST_I),
+    .flags      = AV_TX_UNALIGNED | AV_TX_INPLACE | FF_TX_OUT_OF_PLACE,
+    .factors    = { 2, TX_FACTOR_ANY },
+    .nb_factors = 2,
+    .min_len    = 2,
+    .max_len    = TX_LEN_UNLIMITED,
+    .init       = TX_NAME(ff_tx_dcstI_init),
+    .cpu_flags  = FF_TX_CPU_FLAGS_ALL,
+    .prio       = FF_TX_PRIO_BASE,
+};
+
 int TX_TAB(ff_tx_mdct_gen_exp)(AVTXContext *s, int *pre_tab)
 {
     int off = 0;
@@ -2101,6 +2202,8 @@ const FFTXCodelet * const TX_NAME(ff_tx_codelet_list)[] = {
     &TX_NAME(ff_tx_rdft_c2r_def),
     &TX_NAME(ff_tx_dctII_def),
     &TX_NAME(ff_tx_dctIII_def),
+    &TX_NAME(ff_tx_dctI_def),
+    &TX_NAME(ff_tx_dstI_def),
     NULL,
 };
 
-- 
2.40.1
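`ff_tx_dstI` uses the odd counterpart of the DCT-I trick: its N inputs go into a 2(N + 1)-sample buffer with zeros at indices 0 and N + 1, `tmp[i] = -x[i - 1]` on the first half and `tmp[2*len - i] = x[i - 1]` on the mirrored half, and a REAL_TO_IMAGINARY RDFT keeps only the imaginary parts. With that sign choice, Im(X[k + 1]) = 2 * sum_j x[j] sin(pi (j + 1)(k + 1) / (N + 1)), the unnormalized DST-I (FFTW's RODFT00 form). The sketch below checks this with a naive 8-point DFT (N = 3) rather than the lavu codelets; how the r2i codelet packs its output is not modelled, and all function names are hypothetical.

```c
#include <assert.h>

#define DST1_N     3                  /* number of DST-I points (N) */
#define EXT_LEN    (2 * (DST1_N + 1)) /* odd-extension length 2(N + 1) = 8 */
#define HALF_SQRT2 0.70710678118654752440
#define TOL        1e-9

/* cos(2*pi*j/8) and sin(2*pi*j/8) for j = 0..7; exact constants avoid libm. */
static const double cos8[EXT_LEN] = {
    1.0, HALF_SQRT2, 0.0, -HALF_SQRT2, -1.0, -HALF_SQRT2, 0.0, HALF_SQRT2
};
static const double sin8[EXT_LEN] = {
    0.0, HALF_SQRT2, 1.0, HALF_SQRT2, 0.0, -HALF_SQRT2, -1.0, -HALF_SQRT2
};

/* Naive 8-point DFT of a real sequence (stand-in for the lavu RDFT). */
static void dft8(const double x[EXT_LEN], double re[EXT_LEN], double im[EXT_LEN])
{
    for (int k = 0; k < EXT_LEN; k++) {
        re[k] = 0.0;
        im[k] = 0.0;
        for (int n = 0; n < EXT_LEN; n++) {
            re[k] += x[n] * cos8[(n * k) % EXT_LEN];
            im[k] -= x[n] * sin8[(n * k) % EXT_LEN];
        }
    }
}

/* DST-I via an odd extension built the same way as in ff_tx_dstI. */
void dst1_via_dft(const double x[DST1_N], double y[DST1_N])
{
    const int len = DST1_N + 1;
    double ext[EXT_LEN], re[EXT_LEN], im[EXT_LEN];

    ext[0] = 0.0;
    for (int i = 1; i < len; i++) {
        ext[i]           = -x[i - 1]; /* negated first half, as in the patch */
        ext[2 * len - i] =  x[i - 1]; /* mirrored second half */
    }
    ext[len] = 0.0;                   /* Nyquist sample */

    dft8(ext, re, im);                /* re[] is ~0 for odd input */
    for (int k = 0; k < DST1_N; k++)
        y[k] = im[k + 1];             /* bin 0 is always zero, so shift by one */
}

/* Direct sum: Y[k] = 2 * sum_{j=0}^{N-1} x[j] sin(pi (j+1)(k+1)/(N+1)). */
void dst1_direct(const double x[DST1_N], double y[DST1_N])
{
    for (int k = 0; k < DST1_N; k++) {
        double acc = 0.0;
        for (int j = 0; j < DST1_N; j++) /* sin(pi*m/4) == sin8[m % 8] */
            acc += 2.0 * x[j] * sin8[((j + 1) * (k + 1)) % EXT_LEN];
        y[k] = acc;
    }
}

/* Returns 1 if both paths agree on every output bin. */
int dst1_matches_direct(const double x[DST1_N])
{
    double a[DST1_N], b[DST1_N];

    dst1_via_dft(x, a);
    dst1_direct(x, b);
    for (int k = 0; k < DST1_N; k++) {
        double d = a[k] - b[k];
        if (d > TOL || d < -TOL)
            return 0;
    }
    return 1;
}

/* Usage: a unit impulse at j = 0 gives 2*sin(pi/4), 2*sin(pi/2), 2*sin(3*pi/4). */
void dst1_demo(void)
{
    const double impulse[DST1_N] = { 1.0, 0.0, 0.0 };
    double y[DST1_N];

    dst1_via_dft(impulse, y);
    assert(y[0] > 2.0 * HALF_SQRT2 - TOL && y[0] < 2.0 * HALF_SQRT2 + TOL);
    assert(y[1] > 2.0 - TOL && y[1] < 2.0 + TOL);
    assert(y[2] > 2.0 * HALF_SQRT2 - TOL && y[2] < 2.0 * HALF_SQRT2 + TOL);
    assert(dst1_matches_direct(impulse));
}
```

Without the negation on the first half, the same extension would yield the DST-I with the opposite sign, so the sign choice in `ff_tx_dstI` is what makes the imaginary bins come out positive.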