From patchwork Fri Sep 23 23:14:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lynne
X-Patchwork-Id: 38200
Date: Sat, 24 Sep 2022 01:14:59 +0200 (CEST)
From: Lynne
To: Ffmpeg Devel
Subject: [FFmpeg-devel] [PATCH 1/6] opus: convert encoder and decoder to lavu/tx
List-Id: FFmpeg development discussions and patches

This commit changes both the encoder and decoder to use the new lavu/tx code,
which has faster C transforms and more assembly optimizations.

Patch attached.

From d4fdda5b57ab1e0f08eb3d78dac6b003060dfd41 Mon Sep 17 00:00:00 2001
From: Lynne
Date: Sat, 24 Sep 2022 00:46:44 +0200
Subject: [PATCH 1/6] opus: convert encoder and decoder to lavu/tx

This commit changes both the encoder and decoder to use the new lavu/tx code,
which has faster C transforms and more assembly optimizations.
---
 libavcodec/opus_celt.c   | 20 ++++++++++++--------
 libavcodec/opus_celt.h   |  5 +++--
 libavcodec/opusenc.c     | 15 +++++++++------
 libavcodec/opusenc_psy.c | 13 ++++++++-----
 libavcodec/opusenc_psy.h |  4 +++-
 5 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/libavcodec/opus_celt.c b/libavcodec/opus_celt.c
index 9dbeff1927..f1fb88a56d 100644
--- a/libavcodec/opus_celt.c
+++ b/libavcodec/opus_celt.c
@@ -323,7 +323,8 @@ int ff_celt_decode_frame(CeltFrame *f, OpusRangeCoder *rc,
 {
     int i, j, downmix = 0;
     int consumed; // bits of entropy consumed thus far for this frame
-    MDCT15Context *imdct;
+    AVTXContext *imdct;
+    av_tx_fn imdct_fn;
 
     if (channels != 1 && channels != 2) {
         av_log(f->avctx, AV_LOG_ERROR, "Invalid number of coded channels: %d\n",
@@ -385,7 +386,8 @@ int ff_celt_decode_frame(CeltFrame *f, OpusRangeCoder *rc,
     f->blocks = f->transient ? 1 << f->size : 1;
     f->blocksize = frame_size / f->blocks;
 
-    imdct = f->imdct[f->transient ? 0 : f->size];
+    imdct = f->tx[f->transient ? 0 : f->size];
+    imdct_fn = f->tx_fn[f->transient ? 0 : f->size];
 
     if (channels == 1) {
         for (i = 0; i < CELT_MAX_BANDS; i++)
@@ -440,8 +442,8 @@ int ff_celt_decode_frame(CeltFrame *f, OpusRangeCoder *rc,
             for (j = 0; j < f->blocks; j++) {
                 float *dst = block->buf + 1024 + j * f->blocksize;
 
-                imdct->imdct_half(imdct, dst + CELT_OVERLAP / 2, f->block[i].coeffs + j,
-                                  f->blocks);
+                imdct_fn(imdct, dst + CELT_OVERLAP / 2, f->block[i].coeffs + j,
+                         sizeof(float)*f->blocks);
                 f->dsp->vector_fmul_window(dst, dst, dst + CELT_OVERLAP / 2,
                                            ff_celt_window, CELT_OVERLAP / 2);
             }
@@ -526,8 +528,8 @@ void ff_celt_free(CeltFrame **f)
     if (!frm)
         return;
 
-    for (i = 0; i < FF_ARRAY_ELEMS(frm->imdct); i++)
-        ff_mdct15_uninit(&frm->imdct[i]);
+    for (i = 0; i < FF_ARRAY_ELEMS(frm->tx); i++)
+        av_tx_uninit(&frm->tx[i]);
 
     ff_celt_pvq_uninit(&frm->pvq);
 
@@ -555,9 +557,11 @@ int ff_celt_init(AVCodecContext *avctx, CeltFrame **f, int output_channels,
     frm->output_channels = output_channels;
     frm->apply_phase_inv = apply_phase_inv;
 
-    for (i = 0; i < FF_ARRAY_ELEMS(frm->imdct); i++)
-        if ((ret = ff_mdct15_init(&frm->imdct[i], 1, i + 3, -1.0f/32768)) < 0)
+    for (i = 0; i < FF_ARRAY_ELEMS(frm->tx); i++) {
+        const float scale = -1.0f/32768;
+        if ((ret = av_tx_init(&frm->tx[i], &frm->tx_fn[i], AV_TX_FLOAT_MDCT, 1, 15 << (i + 3), &scale, 0)) < 0)
             goto fail;
+    }
 
     if ((ret = ff_celt_pvq_init(&frm->pvq, 0)) < 0)
         goto fail;
diff --git a/libavcodec/opus_celt.h b/libavcodec/opus_celt.h
index 661ca251de..291a544298 100644
--- a/libavcodec/opus_celt.h
+++ b/libavcodec/opus_celt.h
@@ -30,10 +30,10 @@
 #include "opus_pvq.h"
 #include "opusdsp.h"
 
-#include "mdct15.h"
 #include "libavutil/float_dsp.h"
 #include "libavutil/libm.h"
 #include "libavutil/mem_internal.h"
+#include "libavutil/tx.h"
 
 #define CELT_VECTORS 11
 #define CELT_ALLOC_STEPS 6
@@ -93,7 +93,8 @@ typedef struct CeltBlock {
 struct CeltFrame {
     // constant values that do not change during context lifetime
     AVCodecContext *avctx;
-    MDCT15Context *imdct[4];
+    AVTXContext *tx[4];
+    av_tx_fn tx_fn[4];
     AVFloatDSPContext *dsp;
     CeltBlock block[2];
     CeltPVQ *pvq;
diff --git a/libavcodec/opusenc.c b/libavcodec/opusenc.c
index a7a9d3a5f5..8cdd27d930 100644
--- a/libavcodec/opusenc.c
+++ b/libavcodec/opusenc.c
@@ -40,7 +40,8 @@ typedef struct OpusEncContext {
     AVCodecContext *avctx;
     AudioFrameQueue afq;
     AVFloatDSPContext *dsp;
-    MDCT15Context *mdct[CELT_BLOCK_NB];
+    AVTXContext *tx[CELT_BLOCK_NB];
+    av_tx_fn tx_fn[CELT_BLOCK_NB];
     CeltPVQ *pvq;
     struct FFBufQueue bufqueue;
 
@@ -204,7 +205,7 @@ static void celt_frame_mdct(OpusEncContext *s, CeltFrame *f)
                 s->dsp->vector_fmul_reverse(&win[CELT_OVERLAP], src2,
                                             ff_celt_window - 8, 128);
                 src1 = src2;
-                s->mdct[0]->mdct(s->mdct[0], b->coeffs + t, win, f->blocks);
+                s->tx_fn[0](s->tx[0], b->coeffs + t, win, sizeof(float)*f->blocks);
             }
         }
     } else {
@@ -226,7 +227,7 @@ static void celt_frame_mdct(OpusEncContext *s, CeltFrame *f)
                                         ff_celt_window - 8, 128);
             memcpy(win + lap_dst + blk_len, temp, CELT_OVERLAP*sizeof(float));
 
-            s->mdct[f->size]->mdct(s->mdct[f->size], b->coeffs, win, 1);
+            s->tx_fn[f->size](s->tx[f->size], b->coeffs, win, sizeof(float));
         }
     }
 
@@ -612,7 +613,7 @@ static av_cold int opus_encode_end(AVCodecContext *avctx)
     OpusEncContext *s = avctx->priv_data;
 
     for (int i = 0; i < CELT_BLOCK_NB; i++)
-        ff_mdct15_uninit(&s->mdct[i]);
+        av_tx_uninit(&s->tx[i]);
 
     ff_celt_pvq_uninit(&s->pvq);
     av_freep(&s->dsp);
@@ -668,9 +669,11 @@ static av_cold int opus_encode_init(AVCodecContext *avctx)
         return AVERROR(ENOMEM);
 
     /* I have no idea why a base scaling factor of 68 works, could be the twiddles */
-    for (int i = 0; i < CELT_BLOCK_NB; i++)
-        if ((ret = ff_mdct15_init(&s->mdct[i], 0, i + 3, 68 << (CELT_BLOCK_NB - 1 - i))))
+    for (int i = 0; i < CELT_BLOCK_NB; i++) {
+        const float scale = 68 << (CELT_BLOCK_NB - 1 - i);
+        if ((ret = av_tx_init(&s->tx[i], &s->tx_fn[i], AV_TX_FLOAT_MDCT, 0, 15 << (i + 3), &scale, 0)))
             return AVERROR(ENOMEM);
+    }
 
     /* Zero out previous energy (matters for inter first frame) */
     for (int ch = 0; ch < s->channels; ch++)
diff --git a/libavcodec/opusenc_psy.c b/libavcodec/opusenc_psy.c
index 1c8f69269c..3bff57d347 100644
--- a/libavcodec/opusenc_psy.c
+++ b/libavcodec/opusenc_psy.c
@@ -22,7 +22,6 @@
 #include "opusenc_psy.h"
 #include "opus_pvq.h"
 #include "opustab.h"
-#include "mdct15.h"
 #include "libavutil/qsort.h"
 
 static float pvq_band_cost(CeltPVQ *pvq, CeltFrame *f, OpusRangeCoder *rc, int band,
@@ -99,7 +98,8 @@ static void step_collect_psy_metrics(OpusPsyContext *s, int index)
             s->dsp->vector_fmul(s->scratch, s->scratch, s->window[s->bsize_analysis],
                                 (OPUS_BLOCK_SIZE(s->bsize_analysis) << 1));
 
-            s->mdct[s->bsize_analysis]->mdct(s->mdct[s->bsize_analysis], st->coeffs[ch], s->scratch, 1);
+            s->mdct_fn[s->bsize_analysis](s->mdct[s->bsize_analysis], st->coeffs[ch],
+                                          s->scratch, sizeof(float));
 
             for (i = 0; i < CELT_MAX_BANDS; i++)
                 st->bands[ch][i] = &st->coeffs[ch][ff_celt_freq_bands[i] << s->bsize_analysis];
@@ -558,13 +558,16 @@ av_cold int ff_opus_psy_init(OpusPsyContext *s, AVCodecContext *avctx,
     for (i = 0; i < CELT_BLOCK_NB; i++) {
         float tmp;
         const int len = OPUS_BLOCK_SIZE(i);
+        const float scale = 68 << (CELT_BLOCK_NB - 1 - i);
         s->window[i] = av_malloc(2*len*sizeof(float));
         if (!s->window[i]) {
             ret = AVERROR(ENOMEM);
             goto fail;
         }
         generate_window_func(s->window[i], 2*len, WFUNC_SINE, &tmp);
-        if ((ret = ff_mdct15_init(&s->mdct[i], 0, i + 3, 68 << (CELT_BLOCK_NB - 1 - i))))
+        ret = av_tx_init(&s->mdct[i], &s->mdct_fn[i], AV_TX_FLOAT_MDCT,
+                         0, 15 << (i + 3), &scale, 0);
+        if (ret < 0)
             goto fail;
     }
 
@@ -575,7 +578,7 @@ fail:
     av_freep(&s->dsp);
 
     for (i = 0; i < CELT_BLOCK_NB; i++) {
-        ff_mdct15_uninit(&s->mdct[i]);
+        av_tx_uninit(&s->mdct[i]);
         av_freep(&s->window[i]);
     }
 
@@ -598,7 +601,7 @@ av_cold int ff_opus_psy_end(OpusPsyContext *s)
     av_freep(&s->dsp);
 
     for (i = 0; i < CELT_BLOCK_NB; i++) {
-        ff_mdct15_uninit(&s->mdct[i]);
+        av_tx_uninit(&s->mdct[i]);
         av_freep(&s->window[i]);
     }
 
diff --git a/libavcodec/opusenc_psy.h b/libavcodec/opusenc_psy.h
index d4fb096a3d..0a7cdb6f2c 100644
--- a/libavcodec/opusenc_psy.h
+++ b/libavcodec/opusenc_psy.h
@@ -22,6 +22,7 @@
 #ifndef AVCODEC_OPUSENC_PSY_H
 #define AVCODEC_OPUSENC_PSY_H
 
+#include "libavutil/tx.h"
 #include "libavutil/mem_internal.h"
 
 #include "opusenc.h"
@@ -70,7 +71,8 @@ typedef struct OpusPsyContext {
     int max_steps;
 
     float *window[CELT_BLOCK_NB];
-    MDCT15Context *mdct[CELT_BLOCK_NB];
+    AVTXContext *mdct[CELT_BLOCK_NB];
+    av_tx_fn mdct_fn[CELT_BLOCK_NB];
     int bsize_analysis;
 
     DECLARE_ALIGNED(32, float, scratch)[2048];
-- 
2.37.2.609.g9ff673ca1a
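
Note for reviewers: the patch changes two call conventions at once, and the arithmetic may be easier to check in isolation. The old ff_mdct15_init() calls took an exponent (i + 3), while the av_tx_init() calls take the length 15 << (i + 3) directly. The old transform calls took a stride in coefficients, while the av_tx_fn calls pass it in bytes. The standalone sketch below only reproduces that arithmetic and does not link against libavutil. The function names are hypothetical, and CELT_BLOCK_NB = 4 is an assumption based on the tx[4] array in CeltFrame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: four CELT block sizes, matching the tx[4] array in CeltFrame. */
#define CELT_BLOCK_NB 4

/* Length passed to av_tx_init() in the patch: 15 << (i + 3). */
int celt_tx_len(int i)
{
    assert(i >= 0 && i < CELT_BLOCK_NB);
    return 15 << (i + 3);
}

/* Forward-transform scale used by the encoder and the psy model in the patch. */
float celt_enc_scale(int i)
{
    assert(i >= 0 && i < CELT_BLOCK_NB);
    return (float)(68 << (CELT_BLOCK_NB - 1 - i));
}

/* The old mdct15 calls took the stride in coefficients (f->blocks);
 * the av_tx_fn calls in the patch pass sizeof(float)*f->blocks, in bytes. */
size_t stride_in_bytes(int blocks)
{
    return sizeof(float) * (size_t)blocks;
}

/* Hypothetical helper: one row per block size index. The stride column is
 * the transient case, where f->blocks = 1 << f->size. */
void print_celt_tx_table(void)
{
    for (int i = 0; i < CELT_BLOCK_NB; i++)
        printf("i=%d len=%d enc_scale=%g transient_stride=%zu bytes\n",
               i, celt_tx_len(i), (double)celt_enc_scale(i),
               stride_in_bytes(1 << i));
}
```

Calling print_celt_tx_table() from a small main() gives a quick way to compare the lengths and scales against the values the old mdct15 code derived from the exponent.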