From patchwork Wed Feb 1 03:13:09 2017
X-Patchwork-Submitter: Rostislav Pehlivanov
X-Patchwork-Id: 2388
From: Rostislav Pehlivanov
To: ffmpeg-devel@ffmpeg.org
Cc: Rostislav Pehlivanov
Date: Wed, 1 Feb 2017 03:13:09 +0000
Message-Id: <20170201031309.99552-7-atomnuker@gmail.com>
In-Reply-To: <20170201031309.99552-1-atomnuker@gmail.com>
References: <20170201031309.99552-1-atomnuker@gmail.com>
Subject: [FFmpeg-devel] [PATCH 6/6] opus: [RFC] add a native Opus encoder

This marks the first time anyone has written an Opus encoder without
using any libopus code. The aim of the encoder is to prove how far
the format can go by writing the craziest encoder for it.

The encoder's pretty basic: it supports nearly all the features that
CELT has (except the pitch prefiltering), however there is no
psychoacoustic or rate control system to enable or disable features,
so only simple CBR is supported.

The unique side of the encoder is that it's able to use a lookahead of
any value which is a multiple of 2.5 milliseconds, up to a maximum of
320ms. Unfortunately it's up to the psychoacoustic system to make use
of the lookahead, and no such system has been written yet, though I'll
start writing one once I get this damn patch message written.

Even though it's a pretty basic encoder it's already outperforming
any other native encoder FFmpeg has by a huge amount.

The encoder still has a few minor bugs, like desyncs at ultra low
bitrates (below 9kbps with 20ms frames).

The transient-only part of the MDCT can be performed in place so I
invite those willing to use their brains for a little bit to figure out
how to modify mdct15.c to use the stride for the forward transform.

The PVQ search is very slow and I HATE IT, if anyone wants to help me
write the craziest pvq search algorithm take a look at the comment above
celt_pvq_search() in opus_pvq.c

Signed-off-by: Rostislav Pehlivanov
---
 libavcodec/Makefile    |    1 +
 libavcodec/allcodecs.c |    2 +-
 libavcodec/opus_celt.h |   16 +
 libavcodec/opus_pvq.c  |  459 +++++++++++++++++++-
 libavcodec/opus_pvq.h  |    6 +
 libavcodec/opusenc.c   | 1132 ++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1605 insertions(+), 11 deletions(-)
 create mode 100644 libavcodec/opusenc.c

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 8080e9127d..b8c05773d5 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -436,6 +436,7 @@ OBJS-$(CONFIG_NUV_DECODER) += nuv.o rtjpeg.o
 OBJS-$(CONFIG_ON2AVC_DECODER) += on2avc.o on2avcdata.o
 OBJS-$(CONFIG_OPUS_DECODER) += opusdec.o opus.o opus_celt.o opus_rc.o \
                                opus_pvq.o opus_silk.o opustab.o vorbis_data.o
+OBJS-$(CONFIG_OPUS_ENCODER) += opusenc.o opustab.o opus_pvq.o
 OBJS-$(CONFIG_PAF_AUDIO_DECODER) += pafaudio.o
 OBJS-$(CONFIG_PAF_VIDEO_DECODER) += pafvideo.o
 OBJS-$(CONFIG_PAM_DECODER) += pnmdec.o pnm.o
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index f92b2b7496..bea32f8072 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -446,7 +446,7 @@ void avcodec_register_all(void)
     REGISTER_DECODER(MPC8, mpc8);
     REGISTER_ENCDEC (NELLYMOSER, nellymoser);
     REGISTER_DECODER(ON2AVC, on2avc);
-    REGISTER_DECODER(OPUS, opus);
+    REGISTER_ENCDEC (OPUS, opus);
     REGISTER_DECODER(PAF_AUDIO, paf_audio);
     REGISTER_DECODER(QCELP, qcelp);
     REGISTER_DECODER(QDM2, qdm2);
diff --git a/libavcodec/opus_celt.h b/libavcodec/opus_celt.h
index 1378d85df3..a54cab68c6 100644
--- a/libavcodec/opus_celt.h
+++ b/libavcodec/opus_celt.h
@@ -61,14 +61,23 @@ enum
CeltBlockSize { typedef struct CeltBlock { float energy[CELT_MAX_BANDS]; + float lin_energy[CELT_MAX_BANDS]; + float error_energy[CELT_MAX_BANDS]; float prev_energy[2][CELT_MAX_BANDS]; uint8_t collapse_masks[CELT_MAX_BANDS]; + int band_bins[CELT_MAX_BANDS]; /* MDCT bins per band */ + float *band_coeffs[CELT_MAX_BANDS]; + /* buffer for mdct output + postfilter */ DECLARE_ALIGNED(32, float, buf)[2048]; DECLARE_ALIGNED(32, float, coeffs)[CELT_MAX_FRAME_SIZE]; + /* Used by the encoder */ + DECLARE_ALIGNED(32, float, overlap)[120]; + DECLARE_ALIGNED(32, float, samples)[CELT_MAX_FRAME_SIZE]; + /* postfilter parameters */ int pf_period_new; float pf_gains_new[3]; @@ -94,6 +103,12 @@ typedef struct CeltFrame { int end_band; int coded_bands; int transient; + int intra; + int pfilter; + int skip_band_floor; + int tf_select; + int alloc_trim; + int alloc_boost[CELT_MAX_BANDS]; int blocks; /* number of iMDCT blocks in the frame, depends on transient */ int blocksize; /* size of each block */ int silence; /* Frame is filled with silence */ @@ -109,6 +124,7 @@ typedef struct CeltFrame { int framebits; int remaining; int remaining2; + int caps [CELT_MAX_BANDS]; int fine_bits [CELT_MAX_BANDS]; int fine_priority[CELT_MAX_BANDS]; int pulses [CELT_MAX_BANDS]; diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c index a39330b7e4..adf6f44c8d 100644 --- a/libavcodec/opus_pvq.c +++ b/libavcodec/opus_pvq.c @@ -1,7 +1,7 @@ /* * Copyright (c) 2012 Andrew D'Addesio * Copyright (c) 2013-2014 Mozilla Corporation - * Copyright (c) 2016 Rostislav Pehlivanov + * Copyright (c) 2017 Rostislav Pehlivanov * * This file is part of FFmpeg. 
* @@ -78,8 +78,8 @@ static inline void celt_normalize_residual(const int * av_restrict iy, float * a X[i] = g * iy[i]; } -static void celt_exp_rotation1(float *X, uint32_t len, uint32_t stride, - float c, float s) +static void celt_exp_rotation_impl(float *X, uint32_t len, uint32_t stride, + float c, float s) { float *Xptr; int i; @@ -105,7 +105,7 @@ static void celt_exp_rotation1(float *X, uint32_t len, uint32_t stride, static inline void celt_exp_rotation(float *X, uint32_t len, uint32_t stride, uint32_t K, - enum CeltSpread spread) + enum CeltSpread spread, const int encode) { uint32_t stride2 = 0; float c, s; @@ -118,8 +118,8 @@ static inline void celt_exp_rotation(float *X, uint32_t len, gain = (float)len / (len + (20 - 5*spread) * K); theta = M_PI * gain * gain / 4; - c = cos(theta); - s = sin(theta); + c = cosf(theta); + s = sinf(theta); if (len >= stride << 3) { stride2 = 1; @@ -133,9 +133,15 @@ static inline void celt_exp_rotation(float *X, uint32_t len, extract_collapse_mask().*/ len /= stride; for (i = 0; i < stride; i++) { - if (stride2) - celt_exp_rotation1(X + i * len, len, stride2, s, c); - celt_exp_rotation1(X + i * len, len, 1, c, s); + if (encode) { + celt_exp_rotation_impl(X + i * len, len, 1, c, -s); + if (stride2) + celt_exp_rotation_impl(X + i * len, len, stride2, s, -c); + } else { + if (stride2) + celt_exp_rotation_impl(X + i * len, len, stride2, s, c); + celt_exp_rotation_impl(X + i * len, len, 1, c, s); + } } } @@ -270,6 +276,18 @@ static inline int celt_compute_qn(int N, int b, int offset, int pulse_cap, return qn; } +/* Convert the quantized vector to an index */ +static inline uint32_t celt_icwrsi(uint32_t N, const int *y) +{ + int i, idx = 0, sum = 0; + for (i = N - 1; i >= 0; i--) { + const uint32_t i_s = CELT_PVQ_U(N - i, sum + FFABS(y[i]) + 1); + idx += CELT_PVQ_U(N - i, sum) + (y[i] < 0)*i_s; + sum += FFABS(y[i]); + } + return idx; +} + // this code was adapted from libopus static inline uint64_t celt_cwrsi(uint32_t N, uint32_t K, 
uint32_t i, int *y)
 {
@@ -356,12 +374,85 @@ static inline uint64_t celt_cwrsi(uint32_t N, uint32_t K, uint32_t i, int *y)
     return norm;
 }
 
+static inline void celt_encode_pulses(OpusRangeCoder *rc, int *y, uint32_t N, uint32_t K)
+{
+    ff_opus_rc_enc_uint(rc, celt_icwrsi(N, y), CELT_PVQ_V(N, K));
+}
+
 static inline float celt_decode_pulses(OpusRangeCoder *rc, int *y, uint32_t N, uint32_t K)
 {
     const uint32_t idx = ff_opus_rc_dec_uint(rc, CELT_PVQ_V(N, K));
     return celt_cwrsi(N, K, idx, y);
 }
 
+/*
+ * Placeholder, quite a lot slower than libopus's code, and it's also similar,
+ * though we always work in the signed domain and always do a pre-projection
+ * (fancy saying for scale and round down to the nearest integer) and we always
+ * calculate the full distortion. I hate it.
+ *
+ * The problem this function solves is very simple: you have an integer number K
+ * of "pulses", and you have to give them to the y[] array such that when you
+ * normalize y[] to the L2 norm (just do y_norm += y[i]*y[i], then divide
+ * every y[i] by sqrtf(y_norm)) the difference between it and X[] (which is
+ * already normalized) is the smallest. Summing up the absolute values of y[i]
+ * should return K.
+ *
+ * Note that this code still returns identical results to libopus's search. Usually
+ * any increase in distortion, even the slightest, will result in worse quality.
+ */ +static void celt_pvq_search(float *X, int *y, int K, int N) +{ + int i, j, k; + float res = K, x_sum = 0.0f, y_norm = 0.0f; + + for (i = 0; i < N; i++) + x_sum += FFABS(X[i]); + + res /= x_sum; + + for (i = 0; i < N; i++) { + y[i] = floor(res*X[i]); + K -= FFABS(y[i]); + y_norm += y[i]*y[i]; + } + + if (K) { + const int phase = FFSIGN(K); + for (j = 0; j < FFABS(K); j++) { + int min_idx = 0; + float min_dist = FLT_MAX; + for (i = 0; i < N; i++) { + float new_norm, dist = 0.0f, o_yn = y_norm - y[i]*y[i]; + y[i] += FFSIGN(X[i])*phase; + new_norm = 1.0f/(sqrtf(o_yn + y[i]*y[i]) + FLT_MIN); + for (k = 0; k < N; k++) { + const float deq_v = y[k]*new_norm; + dist += (deq_v - X[k])*(deq_v - X[k]); + } + y[i] -= FFSIGN(X[i])*phase; + if (dist < min_dist) { + min_idx = i; + min_dist = dist; + } + } + y_norm -= y[min_idx]*y[min_idx]; + y[min_idx] += FFSIGN(X[min_idx])*phase; + y_norm += y[min_idx]*y[min_idx]; + } + } +} + +static uint32_t celt_alg_quant(OpusRangeCoder *rc, float *X, uint32_t N, uint32_t K, + enum CeltSpread spread, uint32_t blocks, float gain) +{ + int y[176]; + + celt_exp_rotation(X, N, blocks, K, spread, 1); + celt_pvq_search(X, y, K, N); + celt_encode_pulses(rc, y, N, K); + return celt_extract_collapse_mask(y, N, blocks); +} + /** Decode pulse vector and combine the result with the pitch vector to produce the final normalised signal in the current band. 
*/ static uint32_t celt_alg_unquant(OpusRangeCoder *rc, float *X, uint32_t N, uint32_t K, @@ -371,7 +462,7 @@ static uint32_t celt_alg_unquant(OpusRangeCoder *rc, float *X, uint32_t N, uint3 gain /= sqrtf(celt_decode_pulses(rc, y, N, K)); celt_normalize_residual(y, X, N, gain); - celt_exp_rotation(X, N, blocks, K, spread); + celt_exp_rotation(X, N, blocks, K, spread, 0); return celt_extract_collapse_mask(y, N, blocks); } @@ -725,5 +816,353 @@ uint32_t ff_celt_decode_band(CeltFrame *f, OpusRangeCoder *rc, const int band, } cm = av_mod_uintp2(cm, blocks); } + + return cm; +} + +/* This has to be, AND MUST BE done by the psychoacoustic system, this has a very + * big impact on the entire quantization and especially huge on transients */ +static int celt_calc_theta(const float *X, const float *Y, int coupling, int N) +{ + int j; + float e[2] = { 0.0f, 0.0f }; + for (j = 0; j < N; j++) { + if (coupling) { /* Coupling case */ + e[0] += (X[j] + Y[j])*(X[j] + Y[j]); + e[1] += (X[j] - Y[j])*(X[j] - Y[j]); + } else { + e[0] += X[j]*X[j]; + e[1] += Y[j]*Y[j]; + } + } + return lrintf(32768.0f*atan2f(sqrtf(e[1]), sqrtf(e[0]))/M_PI); +} + +static void celt_stereo_is_decouple(float *X, float *Y, float e_l, float e_r, int N) +{ + int i; + const float energy_n = 1.0f/(sqrtf(e_l*e_l + e_r*e_r) + FLT_MIN); + e_l *= energy_n; + e_r *= energy_n; + for (i = 0; i < N; i++) + X[i] = e_l*X[i] + e_r*Y[i]; +} + +static void celt_stereo_ms_decouple(float *X, float *Y, int N) +{ + int i; + const float decouple_norm = 1.0f/sqrtf(2.0f); + for (i = 0; i < N; i++) { + const float Xret = X[i]; + X[i] = (X[i] + Y[i])*decouple_norm; + Y[i] = (Y[i] - Xret)*decouple_norm; + } +} + +uint32_t ff_celt_encode_band(CeltFrame *f, OpusRangeCoder *rc, const int band, + float *X, float *Y, int N, int b, uint32_t blocks, + float *lowband, int duration, float *lowband_out, int level, + float gain, float *lowband_scratch, int fill) +{ + const uint8_t *cache; + int dualstereo, split; + int imid = 0, iside = 0; + 
//uint32_t N0 = N; + int N_B; + //int N_B0; + int B0 = blocks; + int time_divide = 0; + int recombine = 0; + int inv = 0; + float mid = 0, side = 0; + int longblocks = (B0 == 1); + uint32_t cm = 0; + + //N_B0 = N_B = N / blocks; + split = dualstereo = (Y != NULL); + + if (N == 1) { + /* special case for one sample - the decoder's output will be +- 1.0f!!! */ + int i; + float *x = X; + for (i = 0; i <= dualstereo; i++) { + if (f->remaining2 >= 1<<3) { + ff_opus_rc_put_raw(rc, x[0] < 0, 1); + f->remaining2 -= 1 << 3; + b -= 1 << 3; + } + x = Y; + } + if (lowband_out) + lowband_out[0] = X[0]; + return 1; + } + + if (!dualstereo && level == 0) { + int tf_change = f->tf_change[band]; + int k; + if (tf_change > 0) + recombine = tf_change; + /* Band recombining to increase frequency resolution */ + + if (lowband && + (recombine || ((N_B & 1) == 0 && tf_change < 0) || B0 > 1)) { + int j; + for (j = 0; j < N; j++) + lowband_scratch[j] = lowband[j]; + lowband = lowband_scratch; + } + + for (k = 0; k < recombine; k++) { + celt_haar1(X, N >> k, 1 << k); + fill = ff_celt_bit_interleave[fill & 0xF] | ff_celt_bit_interleave[fill >> 4] << 2; + } + blocks >>= recombine; + N_B <<= recombine; + + /* Increasing the time resolution */ + while ((N_B & 1) == 0 && tf_change < 0) { + celt_haar1(X, N_B, blocks); + fill |= fill << blocks; + blocks <<= 1; + N_B >>= 1; + time_divide++; + tf_change++; + } + B0 = blocks; + //N_B0 = N_B; + + /* Reorganize the samples in time order instead of frequency order */ + if (B0 > 1) + celt_deinterleave_hadamard(f->scratch, X, N_B >> recombine, + B0 << recombine, longblocks); + } + + /* If we need 1.5 more bit than we can produce, split the band in two. 
*/ + cache = ff_celt_cache_bits + + ff_celt_cache_index[(duration + 1) * CELT_MAX_BANDS + band]; + if (!dualstereo && duration >= 0 && b > cache[cache[0]] + 12 && N > 2) { + N >>= 1; + Y = X + N; + split = 1; + duration -= 1; + if (blocks == 1) + fill = (fill & 1) | (fill << 1); + blocks = (blocks + 1) >> 1; + } + + if (split) { + int qn; + int itheta = celt_calc_theta(X, Y, dualstereo, N); + int mbits, sbits, delta; + int qalloc; + int pulse_cap; + int offset; + int orig_fill; + int tell; + + /* Decide on the resolution to give to the split parameter theta */ + pulse_cap = ff_celt_log_freq_range[band] + duration * 8; + offset = (pulse_cap >> 1) - (dualstereo && N == 2 ? CELT_QTHETA_OFFSET_TWOPHASE : + CELT_QTHETA_OFFSET); + qn = (dualstereo && band >= f->intensity_stereo) ? 1 : + celt_compute_qn(N, b, offset, pulse_cap, dualstereo); + tell = opus_rc_tell_frac(rc); + + if (qn != 1) { + + itheta = (itheta*qn + 8192) >> 14; + + /* Entropy coding of the angle. We use a uniform pdf for the + * time split, a step for stereo, and a triangular one for the rest. 
*/
+            if (dualstereo && N > 2)
+                ff_opus_rc_enc_uint_step(rc, itheta, qn / 2);
+            else if (dualstereo || B0 > 1)
+                ff_opus_rc_enc_uint(rc, itheta, qn + 1);
+            else
+                ff_opus_rc_enc_uint_tri(rc, itheta, qn);
+            itheta = itheta * 16384 / qn;
+
+            if (dualstereo) {
+                if (itheta == 0)
+                    celt_stereo_is_decouple(X, Y, f->block[0].lin_energy[band], f->block[1].lin_energy[band], N);
+                else
+                    celt_stereo_ms_decouple(X, Y, N);
+            }
+        } else if (dualstereo) {
+            inv = itheta > 8192;
+            if (inv)
+            {
+                int j;
+                for (j=0;j<N;j++)
+                    Y[j] = -Y[j];
+            }
+            celt_stereo_is_decouple(X, Y, f->block[0].lin_energy[band], f->block[1].lin_energy[band], N);
+
+            if (b > 2 << 3 && f->remaining2 > 2 << 3) {
+                ff_opus_rc_enc_log(rc, inv, 2);
+            } else {
+                inv = 0;
+            }
+
+            itheta = 0;
+        }
+        qalloc = opus_rc_tell_frac(rc) - tell;
+        b -= qalloc;
+
+        orig_fill = fill;
+        if (itheta == 0) {
+            imid = 32767;
+            iside = 0;
+            fill = av_mod_uintp2(fill, blocks);
+            delta = -16384;
+        } else if (itheta == 16384) {
+            imid = 0;
+            iside = 32767;
+            fill &= ((1 << blocks) - 1) << blocks;
+            delta = 16384;
+        } else {
+            imid = celt_cos(itheta);
+            iside = celt_cos(16384-itheta);
+            /* This is the mid vs side allocation that minimizes squared error
+            in that band. */
+            delta = ROUND_MUL16((N - 1) << 7, celt_log2tan(iside, imid));
+        }
+
+        mid = imid / 32768.0f;
+        side = iside / 32768.0f;
+
+        /* This is a special case for N=2 that only works for stereo and takes
+        advantage of the fact that mid and side are orthogonal to encode
+        the side with just one bit. */
+        if (N == 2 && dualstereo) {
+            int c;
+            int sign = 0;
+            float tmp;
+            float *x2, *y2;
+            mbits = b;
+            /* Only need one bit for the side */
+            sbits = (itheta != 0 && itheta != 16384) ? 1 << 3 : 0;
+            mbits -= sbits;
+            c = (itheta > 8192);
+            f->remaining2 -= qalloc+sbits;
+
+            x2 = c ? Y : X;
+            y2 = c ?
X : Y; + if (sbits) { + sign = x2[0]*y2[1] - x2[1]*y2[0] < 0; + ff_opus_rc_put_raw(rc, sign, 1); + } + sign = 1 - 2 * sign; + /* We use orig_fill here because we want to fold the side, but if + itheta==16384, we'll have cleared the low bits of fill. */ + cm = ff_celt_encode_band(f, rc, band, x2, NULL, N, mbits, blocks, + lowband, duration, lowband_out, level, gain, + lowband_scratch, orig_fill); + /* We don't split N=2 bands, so cm is either 1 or 0 (for a fold-collapse), + and there's no need to worry about mixing with the other channel. */ + y2[0] = -sign * x2[1]; + y2[1] = sign * x2[0]; + X[0] *= mid; + X[1] *= mid; + Y[0] *= side; + Y[1] *= side; + tmp = X[0]; + X[0] = tmp - Y[0]; + Y[0] = tmp + Y[0]; + tmp = X[1]; + X[1] = tmp - Y[1]; + Y[1] = tmp + Y[1]; + } else { + /* "Normal" split code */ + float *next_lowband2 = NULL; + float *next_lowband_out1 = NULL; + int next_level = 0; + int rebalance; + + /* Give more bits to low-energy MDCTs than they would + * otherwise deserve */ + if (B0 > 1 && !dualstereo && (itheta & 0x3fff)) { + if (itheta > 8192) + /* Rough approximation for pre-echo masking */ + delta -= delta >> (4 - duration); + else + /* Corresponds to a forward-masking slope of + * 1.5 dB per 10 ms */ + delta = FFMIN(0, delta + (N << 3 >> (5 - duration))); + } + mbits = av_clip((b - delta) / 2, 0, b); + sbits = b - mbits; + f->remaining2 -= qalloc; + + if (lowband && !dualstereo) + next_lowband2 = lowband + N; /* >32-bit split case */ + + /* Only stereo needs to pass on lowband_out. + * Otherwise, it's handled at the end */ + if (dualstereo) + next_lowband_out1 = lowband_out; + else + next_level = level + 1; + + rebalance = f->remaining2; + if (mbits >= sbits) { + /* In stereo mode, we do not apply a scaling to the mid + * because we need the normalized mid for folding later */ + cm = ff_celt_encode_band(f, rc, band, X, NULL, N, mbits, blocks, + lowband, duration, next_lowband_out1, + next_level, dualstereo ? 
1.0f : (gain * mid), + lowband_scratch, fill); + + rebalance = mbits - (rebalance - f->remaining2); + if (rebalance > 3 << 3 && itheta != 0) + sbits += rebalance - (3 << 3); + + /* For a stereo split, the high bits of fill are always zero, + * so no folding will be done to the side. */ + cm |= ff_celt_encode_band(f, rc, band, Y, NULL, N, sbits, blocks, + next_lowband2, duration, NULL, + next_level, gain * side, NULL, + fill >> blocks) << ((B0 >> 1) & (dualstereo - 1)); + } else { + /* For a stereo split, the high bits of fill are always zero, + * so no folding will be done to the side. */ + cm = ff_celt_encode_band(f, rc, band, Y, NULL, N, sbits, blocks, + next_lowband2, duration, NULL, + next_level, gain * side, NULL, + fill >> blocks) << ((B0 >> 1) & (dualstereo - 1)); + + rebalance = sbits - (rebalance - f->remaining2); + if (rebalance > 3 << 3 && itheta != 16384) + mbits += rebalance - (3 << 3); + + /* In stereo mode, we do not apply a scaling to the mid because + * we need the normalized mid for folding later */ + cm |= ff_celt_encode_band(f, rc, band, X, NULL, N, mbits, blocks, + lowband, duration, next_lowband_out1, + next_level, dualstereo ? 1.0f : (gain * mid), + lowband_scratch, fill); + } + } + } else { + /* This is the basic no-split case */ + uint32_t q = celt_bits2pulses(cache, b); + uint32_t curr_bits = celt_pulses2bits(cache, q); + f->remaining2 -= curr_bits; + + /* Ensures we can never bust the budget */ + while (f->remaining2 < 0 && q > 0) { + f->remaining2 += curr_bits; + curr_bits = celt_pulses2bits(cache, --q); + f->remaining2 -= curr_bits; + } + + if (q != 0) { + /* Finally do the actual quantization */ + cm = celt_alg_quant(rc, X, N, (q < 8) ? 
q : (8 + (q & 7)) << ((q >> 3) - 1), + f->spread, blocks, gain); + } + } + return cm; } diff --git a/libavcodec/opus_pvq.h b/libavcodec/opus_pvq.h index 0354a3c960..d414b47a42 100644 --- a/libavcodec/opus_pvq.h +++ b/libavcodec/opus_pvq.h @@ -32,4 +32,10 @@ uint32_t ff_celt_decode_band(CeltFrame *f, OpusRangeCoder *rc, const int band, float *lowband, int duration, float *lowband_out, int level, float gain, float *lowband_scratch, int fill); +/* Encodes a band using PVQ */ +uint32_t ff_celt_encode_band(CeltFrame *f, OpusRangeCoder *rc, const int band, + float *X, float *Y, int N, int b, uint32_t blocks, + float *lowband, int duration, float *lowband_out, int level, + float gain, float *lowband_scratch, int fill); + #endif /* AVCODEC_OPUS_PVQ_H */ diff --git a/libavcodec/opusenc.c b/libavcodec/opusenc.c new file mode 100644 index 0000000000..da05c70927 --- /dev/null +++ b/libavcodec/opusenc.c @@ -0,0 +1,1132 @@ +/* + * Opus encoder + * Copyright (c) 2017 Rostislav Pehlivanov + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "opus_celt.h" +#include "opus_pvq.h" +#include "opustab.h" + +#include "libavutil/float_dsp.h" +#include "libavutil/opt.h" +#include "internal.h" +#include "bytestream.h" +#include "audio_frame_queue.h" + +/* Determines the maximum delay the psychoacoustic system will use for lookahead */ +#define FF_BUFQUEUE_SIZE 145 +#include "libavfilter/bufferqueue.h" + +#define OPUS_MAX_LOOKAHEAD ((FF_BUFQUEUE_SIZE - 1)*2.5f) + +#define OPUS_MAX_CHANNELS 2 + +/* Just a practical limit, actual limit is 120ms of total audio length */ +#define OPUS_MAX_FRAMES 6 + +#define OPUS_BLOCK_SIZE(x) (2 * 15 * (1 << (x + 2))) + +#define OPUS_SAMPLES_TO_BLOCK_SIZE(x) (ff_log2(x / (2 * 15)) - 2) + +typedef struct OpusEncOptions { + float max_delay_ms; +} OpusEncOptions; + +typedef struct OpusEncContext { + AVClass *av_class; + OpusEncOptions options; + AVCodecContext *avctx; + AVFrame *empty; + AudioFrameQueue afq; + AVFloatDSPContext *dsp; + MDCT15Context *mdct[CELT_BLOCK_NB]; + struct FFBufQueue bufqueue; + + enum OpusMode mode; + enum OpusBandwidth bandwidth; + int pkt_framesize; + int pkt_frames; + int insane_padding; + + int channels; + + CeltFrame frame[OPUS_MAX_FRAMES]; + OpusRangeCoder rc[OPUS_MAX_FRAMES]; + + /* Actual energy the decoder will have */ + float last_quantized_energy[OPUS_MAX_CHANNELS][CELT_MAX_BANDS]; + + DECLARE_ALIGNED(32, float, scratch)[2048]; +} OpusEncContext; + +static void opus_write_extradata(AVCodecContext *avctx) +{ + uint8_t *bs = avctx->extradata; + + bytestream_put_buffer(&bs, "OpusHead", 8); + bytestream_put_byte (&bs, 0x1); + bytestream_put_byte (&bs, avctx->channels); + bytestream_put_le16 (&bs, avctx->initial_padding); + bytestream_put_le32 (&bs, avctx->sample_rate); + bytestream_put_le16 (&bs, 0x0); + 
bytestream_put_byte (&bs, 0x0); /* Default layout */ +} + +static int opus_gen_toc(OpusEncContext *s, uint8_t *toc, int *size, int *fsize_needed) +{ + int i, tmp = 0x0, extended_toc = 0; + static const int toc_cfg[][OPUS_MODE_NB][OPUS_BANDWITH_NB] = { + /* Silk Hybrid Celt Layer */ + /* NB MB WB SWB FB NB MB WB SWB FB NB MB WB SWB FB Bandwidth */ + { { 0, 0, 0, 0, 0 }, { 0, 0, 0, 0, 0 }, { 17, 0, 21, 25, 29 } }, /* 2.5 ms */ + { { 0, 0, 0, 0, 0 }, { 0, 0, 0, 0, 0 }, { 18, 0, 22, 26, 30 } }, /* 5 ms */ + { { 1, 5, 9, 0, 0 }, { 0, 0, 0, 13, 15 }, { 19, 0, 23, 27, 31 } }, /* 10 ms */ + { { 2, 6, 10, 0, 0 }, { 0, 0, 0, 14, 16 }, { 20, 0, 24, 28, 32 } }, /* 20 ms */ + { { 3, 7, 11, 0, 0 }, { 0, 0, 0, 0, 0 }, { 0, 0, 0, 0, 0 } }, /* 40 ms */ + { { 4, 8, 12, 0, 0 }, { 0, 0, 0, 0, 0 }, { 0, 0, 0, 0, 0 } }, /* 60 ms */ + }; + int cfg = toc_cfg[s->pkt_framesize][s->mode][s->bandwidth]; + *fsize_needed = 0; + if (!cfg) + return 1; + if (s->pkt_frames == 2) { /* 2 packets */ + if (s->frame[0].framebits == s->frame[1].framebits) { /* same size */ + tmp = 0x1; + } else { /* different size */ + tmp = 0x2; + *fsize_needed = 1; /* put frame sizes in the packet */ + } + } else if (s->pkt_frames > 2) { + tmp = 0x3; + extended_toc = 1; + } + tmp |= (s->channels > 1) << 2; /* Stereo or mono */ + tmp |= (cfg - 1) << 3; /* codec configuration */ + *toc++ = tmp; + if (extended_toc) { + for (i = 0; i < (s->pkt_frames - 1); i++) + *fsize_needed |= (s->frame[i].framebits != s->frame[i + 1].framebits); + tmp = (*fsize_needed) << 7; /* vbr flag */ + tmp |= s->pkt_frames; /* frame number - can be 0 as well */ + *toc++ = tmp; + } + *size = 1 + extended_toc; + return 0; +} + +static void celt_frame_setup_input(OpusEncContext *s, CeltFrame *f) +{ + int sf, ch; + AVFrame *cur = NULL; + const int subframesize = s->avctx->frame_size; + int subframes = OPUS_BLOCK_SIZE(s->pkt_framesize) / subframesize; + + cur = ff_bufqueue_get(&s->bufqueue); + + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *b = 
&f->block[ch]; + const void *input = cur->extended_data[ch]; + size_t bps = av_get_bytes_per_sample(cur->format); + memcpy(b->overlap, input, bps*cur->nb_samples); + } + + av_frame_free(&cur); + + for (sf = 0; sf < subframes; sf++) { + if (sf != (subframes - 1)) + cur = ff_bufqueue_get(&s->bufqueue); + else + cur = ff_bufqueue_peek(&s->bufqueue, 0); + + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *b = &f->block[ch]; + const void *input = cur->extended_data[ch]; + const size_t bps = av_get_bytes_per_sample(cur->format); + const size_t left = (subframesize - cur->nb_samples)*bps; + const size_t len = FFMIN(subframesize, cur->nb_samples)*bps; + memcpy(&b->samples[sf*subframesize], input, len); + memset(&b->samples[cur->nb_samples], 0, left); + } + + /* Last frame isn't popped off and freed yet - we need it for overlap */ + if (sf != (subframes - 1)) + av_frame_free(&cur); + } +} + +/* Apply the pre emphasis filter */ +static void celt_apply_preemph_filter(OpusEncContext *s, CeltFrame *f) +{ + int i, sf, ch; + const int subframesize = s->avctx->frame_size; + const int subframes = OPUS_BLOCK_SIZE(s->pkt_framesize) / subframesize; + + /* Filter overlap */ + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *b = &f->block[ch]; + float m = b->emph_coeff; + for (i = 0; i < CELT_OVERLAP; i++) { + float sample = b->overlap[i]; + b->overlap[i] = sample - m; + m = sample * CELT_EMPH_COEFF; + } + b->emph_coeff = m; + } + + /* Filter the samples but do not update the last subframe's coeff - overlap ^^^ */ + for (sf = 0; sf < subframes; sf++) { + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *b = &f->block[ch]; + float m = b->emph_coeff; + for (i = 0; i < subframesize; i++) { + float sample = b->samples[sf*subframesize + i]; + b->samples[sf*subframesize + i] = sample - m; + m = sample * CELT_EMPH_COEFF; + } + if (sf != (subframes - 1)) + b->emph_coeff = m; + } + } +} + +/* Create the window and mdct it to get the coefficient */ +static void 
celt_frame_mdct(OpusEncContext *s, CeltFrame *f) +{ + float *win = s->scratch; + int i, t, ch, t_transforms = OPUS_BLOCK_SIZE(f->size)/CELT_OVERLAP; + const int blk_len = OPUS_BLOCK_SIZE(f->size), wlen = OPUS_BLOCK_SIZE(f->size + 1); + int rwin = blk_len - CELT_OVERLAP, lap_dst = (wlen - blk_len - CELT_OVERLAP) >> 1; + + /* I think I can use s->dsp->vector_fmul_window for transients at least */ + if (f->transient) { + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *b = &f->block[ch]; + float *src1 = b->overlap; + for (t = 0; t < t_transforms; t++) { + float tmp_c[120]; + float *src2 = &b->samples[120*t]; + for (i = 0; i < CELT_OVERLAP; i++) { + win[ i] = src1[i]*ff_celt_window[i]; + win[CELT_OVERLAP + i] = src2[i]*ff_celt_window[CELT_OVERLAP - i - 1]; + } + src1 = src2; + s->mdct[0]->mdct(s->mdct[0], tmp_c, win, t_transforms); + /* This can and should be done in place */ + for (i = 0; i < CELT_OVERLAP; i++) + b->coeffs[i*t_transforms + t] = tmp_c[i]; + } + } + } else { + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *b = &f->block[ch]; + + memset(win, 0, wlen*sizeof(float)); + + memcpy(&win[lap_dst + CELT_OVERLAP], b->samples, rwin*sizeof(float)); + + /* You know what's aligned? NOT. FUCKING. THIS. 
*/ + //s->dsp->vector_fmul(&dst[lap_dst], b->overlap, ff_celt_window, CELT_OVERLAP); + //s->dsp->vector_fmul_reverse(&dst[lap_dst + blk_len - CELT_OVERLAP], b->samples, ff_celt_window, CELT_OVERLAP); + + for (i = 0; i < CELT_OVERLAP; i++) { + win[lap_dst + i] = b->overlap[i] *ff_celt_window[i]; + win[lap_dst + blk_len + i] = b->samples[rwin + i]*ff_celt_window[CELT_OVERLAP - i - 1]; + } + + s->mdct[f->size]->mdct(s->mdct[f->size], b->coeffs, win, 1); + } + } +} + +/* Fills the bands and normalizes them */ +static int celt_frame_map_norm_bands(OpusEncContext *s, CeltFrame *f) +{ + int i, j, ch, noise = 0; + + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *block = &f->block[ch]; + float *start = block->coeffs; + for (i = 0; i < CELT_MAX_BANDS; i++) { + float ener = 0.0f; + + /* Calculate band bins */ + block->band_bins[i] = ff_celt_freq_range[i] << f->size; + block->band_coeffs[i] = start; + start += block->band_bins[i]; + + /* Normalize band energy */ + for (j = 0; j < block->band_bins[i]; j++) + ener += block->band_coeffs[i][j]*block->band_coeffs[i][j]; + + block->lin_energy[i] = sqrtf(ener) + FLT_MIN; + ener = 1.0f/block->lin_energy[i]; + + for (j = 0; j < block->band_bins[i]; j++) + block->band_coeffs[i][j] *= ener; + + block->energy[i] = log2f(block->lin_energy[i]) - ff_celt_mean_energy[i]; + + /* CELT_ENERGY_SILENCE is what the decoder uses and it's not -infinity */ + block->energy[i] = FFMAX(block->energy[i], CELT_ENERGY_SILENCE); + noise |= block->energy[i] > CELT_ENERGY_SILENCE; + } + } + return !noise; +} + +static void celt_enc_tf(OpusEncContext *s, OpusRangeCoder *rc, CeltFrame *f) +{ + int i, tf_select = 0, diff = 0, tf_changed = 0, tf_select_needed; + int bits = f->transient ? 
2 : 4; + + tf_select_needed = ((f->size && (opus_rc_tell(rc) + bits + 1) <= f->framebits)); + + for (i = f->start_band; i < f->end_band; i++) { + if ((opus_rc_tell(rc) + bits + tf_select_needed) <= f->framebits) { + const int tbit = (diff ^ 1) == f->tf_change[i]; + ff_opus_rc_enc_log(rc, tbit, bits); + diff ^= tbit; + tf_changed |= diff; + } + bits = f->transient ? 4 : 5; + } + + if (tf_select_needed && ff_celt_tf_select[f->size][f->transient][0][tf_changed] != + ff_celt_tf_select[f->size][f->transient][1][tf_changed]) { + ff_opus_rc_enc_log(rc, f->tf_select, 1); + tf_select = f->tf_select; + } + + for (i = f->start_band; i < f->end_band; i++) + f->tf_change[i] = ff_celt_tf_select[f->size][f->transient][tf_select][f->tf_change[i]]; +} + +static void celt_bitalloc(OpusEncContext *s, OpusRangeCoder *rc, CeltFrame *f) +{ + int i, j, low, high, total, done, bandbits, remaining, tbits_8ths; + int skip_startband = f->start_band; + int skip_bit = 0; + int intensitystereo_bit = 0; + int dualstereo_bit = 0; + int dynalloc = 6; + int extrabits = 0; + + int *cap = f->caps; + int boost[CELT_MAX_BANDS]; + int trim_offset[CELT_MAX_BANDS]; + int threshold[CELT_MAX_BANDS]; + int bits1[CELT_MAX_BANDS]; + int bits2[CELT_MAX_BANDS]; + + /* Tell the spread to the decoder */ + if (opus_rc_tell(rc) + 4 <= f->framebits) + ff_opus_rc_enc_cdf(rc, f->spread, ff_celt_model_spread); + + /* Generate static allocation caps */ + for (i = 0; i < CELT_MAX_BANDS; i++) { + cap[i] = (ff_celt_static_caps[f->size][f->channels - 1][i] + 64) + * ff_celt_freq_range[i] << (f->channels - 1) << f->size >> 2; + } + + /* Band boosts */ + tbits_8ths = f->framebits << 3; + for (i = f->start_band; i < f->end_band; i++) { + int quanta, b_dynalloc, boost_amount = f->alloc_boost[i]; + + boost[i] = 0; + + quanta = ff_celt_freq_range[i] << (f->channels - 1) << f->size; + quanta = FFMIN(quanta << 3, FFMAX(6 << 3, quanta)); + b_dynalloc = dynalloc; + + while (opus_rc_tell_frac(rc) + (b_dynalloc << 3) < tbits_8ths && 
boost[i] < cap[i]) { + int is_boost = boost_amount--; + + ff_opus_rc_enc_log(rc, is_boost, b_dynalloc); + if (!is_boost) + break; + + boost[i] += quanta; + tbits_8ths -= quanta; + + b_dynalloc = 1; + } + + if (boost[i]) + dynalloc = FFMAX(2, dynalloc - 1); + } + + /* Put allocation trim */ + if (opus_rc_tell_frac(rc) + (6 << 3) <= tbits_8ths) + ff_opus_rc_enc_cdf(rc, f->alloc_trim, ff_celt_model_alloc_trim); + + /* Anti-collapse bit reservation */ + tbits_8ths = (f->framebits << 3) - opus_rc_tell_frac(rc) - 1; + f->anticollapse_needed = 0; + if (f->transient && f->size >= 2 && tbits_8ths >= ((f->size + 2) << 3)) + f->anticollapse_needed = 1 << 3; + tbits_8ths -= f->anticollapse_needed; + + /* Band skip bit reservation */ + if (tbits_8ths >= 1 << 3) + skip_bit = 1 << 3; + tbits_8ths -= skip_bit; + + /* Intensity/dual stereo bit reservation */ + if (f->channels == 2) { + intensitystereo_bit = ff_celt_log2_frac[f->end_band - f->start_band]; + if (intensitystereo_bit <= tbits_8ths) { + tbits_8ths -= intensitystereo_bit; + if (tbits_8ths >= 1 << 3) { + dualstereo_bit = 1 << 3; + tbits_8ths -= 1 << 3; + } + } else { + intensitystereo_bit = 0; + } + } + + /* Trim offsets */ + for (i = f->start_band; i < f->end_band; i++) { + int trim = f->alloc_trim - 5 - f->size; + int band = ff_celt_freq_range[i] * (f->end_band - i - 1); + int duration = f->size + 3; + int scale = duration + f->channels - 1; + + /* PVQ minimum allocation threshold, below this value the band is + * skipped */ + threshold[i] = FFMAX(3 * ff_celt_freq_range[i] << duration >> 4, + f->channels << 3); + + trim_offset[i] = trim * (band << scale) >> 6; + + if (ff_celt_freq_range[i] << f->size == 1) + trim_offset[i] -= f->channels << 3; + } + + /* Bisection */ + low = 1; + high = CELT_VECTORS - 1; + while (low <= high) { + int center = (low + high) >> 1; + done = total = 0; + + for (i = f->end_band - 1; i >= f->start_band; i--) { + bandbits = ff_celt_freq_range[i] * ff_celt_static_alloc[center][i] + << 
(f->channels - 1) << f->size >> 2; + + if (bandbits) + bandbits = FFMAX(0, bandbits + trim_offset[i]); + bandbits += boost[i]; + + if (bandbits >= threshold[i] || done) { + done = 1; + total += FFMIN(bandbits, cap[i]); + } else if (bandbits >= f->channels << 3) + total += f->channels << 3; + } + + if (total > tbits_8ths) + high = center - 1; + else + low = center + 1; + } + high = low--; + + /* Bisection */ + for (i = f->start_band; i < f->end_band; i++) { + bits1[i] = ff_celt_freq_range[i] * ff_celt_static_alloc[low][i] + << (f->channels - 1) << f->size >> 2; + bits2[i] = high >= CELT_VECTORS ? cap[i] : + ff_celt_freq_range[i] * ff_celt_static_alloc[high][i] + << (f->channels - 1) << f->size >> 2; + + if (bits1[i]) + bits1[i] = FFMAX(0, bits1[i] + trim_offset[i]); + if (bits2[i]) + bits2[i] = FFMAX(0, bits2[i] + trim_offset[i]); + if (low) + bits1[i] += boost[i]; + bits2[i] += boost[i]; + + if (boost[i]) + skip_startband = i; + bits2[i] = FFMAX(0, bits2[i] - bits1[i]); + } + + /* Bisection */ + low = 0; + high = 1 << CELT_ALLOC_STEPS; + for (i = 0; i < CELT_ALLOC_STEPS; i++) { + int center = (low + high) >> 1; + done = total = 0; + + for (j = f->end_band - 1; j >= f->start_band; j--) { + bandbits = bits1[j] + (center * bits2[j] >> CELT_ALLOC_STEPS); + + if (bandbits >= threshold[j] || done) { + done = 1; + total += FFMIN(bandbits, cap[j]); + } else if (bandbits >= f->channels << 3) + total += f->channels << 3; + } + if (total > tbits_8ths) + high = center; + else + low = center; + } + + /* Bisection */ + done = total = 0; + for (i = f->end_band - 1; i >= f->start_band; i--) { + bandbits = bits1[i] + (low * bits2[i] >> CELT_ALLOC_STEPS); + + if (bandbits >= threshold[i] || done) + done = 1; + else + bandbits = (bandbits >= f->channels << 3) ? 
+ f->channels << 3 : 0; + + bandbits = FFMIN(bandbits, cap[i]); + f->pulses[i] = bandbits; + total += bandbits; + } + + /* Band skipping */ + for (f->coded_bands = f->end_band; ; f->coded_bands--) { + int allocation; + j = f->coded_bands - 1; + + if (j == skip_startband) { + /* all remaining bands are not skipped */ + tbits_8ths += skip_bit; + break; + } + + /* determine the number of bits available for coding "do not skip" markers */ + remaining = tbits_8ths - total; + bandbits = remaining / (ff_celt_freq_bands[j+1] - ff_celt_freq_bands[f->start_band]); + remaining -= bandbits * (ff_celt_freq_bands[j+1] - ff_celt_freq_bands[f->start_band]); + allocation = f->pulses[j] + bandbits * ff_celt_freq_range[j] + + FFMAX(0, remaining - (ff_celt_freq_bands[j] - ff_celt_freq_bands[f->start_band])); + + /* a "do not skip" marker is only coded if the allocation is + above the chosen threshold */ + if (allocation >= FFMAX(threshold[j], (f->channels + 1) << 3)) { + const int do_not_skip = f->coded_bands <= f->skip_band_floor; + ff_opus_rc_enc_log(rc, do_not_skip, 1); + if (do_not_skip) + break; + + total += 1 << 3; + allocation -= 1 << 3; + } + + /* the band is skipped, so reclaim its bits */ + total -= f->pulses[j]; + if (intensitystereo_bit) { + total -= intensitystereo_bit; + intensitystereo_bit = ff_celt_log2_frac[j - f->start_band]; + total += intensitystereo_bit; + } + + total += f->pulses[j] = (allocation >= f->channels << 3) ? 
f->channels << 3 : 0; + } + + /* Encode stereo flags */ + if (intensitystereo_bit) { + f->intensity_stereo = FFMIN(f->intensity_stereo, f->coded_bands); + ff_opus_rc_enc_uint(rc, f->intensity_stereo, f->coded_bands + 1 - f->start_band); + } + if (f->intensity_stereo <= f->start_band) + tbits_8ths += dualstereo_bit; /* no intensity stereo means no dual stereo */ + else if (dualstereo_bit) + ff_opus_rc_enc_log(rc, f->dual_stereo, 1); + + /* Supply the remaining bits in this frame to lower bands */ + remaining = tbits_8ths - total; + bandbits = remaining / (ff_celt_freq_bands[f->coded_bands] - ff_celt_freq_bands[f->start_band]); + remaining -= bandbits * (ff_celt_freq_bands[f->coded_bands] - ff_celt_freq_bands[f->start_band]); + for (i = f->start_band; i < f->coded_bands; i++) { + int bits = FFMIN(remaining, ff_celt_freq_range[i]); + + f->pulses[i] += bits + bandbits * ff_celt_freq_range[i]; + remaining -= bits; + } + + /* Finally determine the allocation */ + for (i = f->start_band; i < f->coded_bands; i++) { + int N = ff_celt_freq_range[i] << f->size; + int prev_extra = extrabits; + f->pulses[i] += extrabits; + + if (N > 1) { + int dof; // degrees of freedom + int temp; // dof * channels * log(dof) + int offset; // fine energy quantization offset, i.e. 
+ // extra bits assigned over the standard + // totalbits/dof + int fine_bits, max_bits; + + extrabits = FFMAX(0, f->pulses[i] - cap[i]); + f->pulses[i] -= extrabits; + + /* intensity stereo makes use of an extra degree of freedom */ + dof = N * f->channels + (f->channels == 2 && N > 2 && !f->dual_stereo && i < f->intensity_stereo); + temp = dof * (ff_celt_log_freq_range[i] + (f->size << 3)); + offset = (temp >> 1) - dof * CELT_FINE_OFFSET; + if (N == 2) /* dof=2 is the only case that doesn't fit the model */ + offset += dof << 1; + + /* grant an additional bias for the first and second pulses */ + if (f->pulses[i] + offset < 2 * (dof << 3)) + offset += temp >> 2; + else if (f->pulses[i] + offset < 3 * (dof << 3)) + offset += temp >> 3; + + fine_bits = (f->pulses[i] + offset + (dof << 2)) / (dof << 3); + max_bits = FFMIN((f->pulses[i] >> 3) >> (f->channels - 1), CELT_MAX_FINE_BITS); + + max_bits = FFMAX(max_bits, 0); + + f->fine_bits[i] = av_clip(fine_bits, 0, max_bits); + + /* if fine_bits was rounded down or capped, + give priority for the final fine energy pass */ + f->fine_priority[i] = (f->fine_bits[i] * (dof << 3) >= f->pulses[i] + offset); + + /* the remaining bits are assigned to PVQ */ + f->pulses[i] -= f->fine_bits[i] << (f->channels - 1) << 3; + } else { + /* all bits go to fine energy except for the sign bit */ + extrabits = FFMAX(0, f->pulses[i] - (f->channels << 3)); + f->pulses[i] -= extrabits; + f->fine_bits[i] = 0; + f->fine_priority[i] = 1; + } + + /* hand back a limited number of extra fine energy bits to this band */ + if (extrabits > 0) { + int fineextra = FFMIN(extrabits >> (f->channels + 2), + CELT_MAX_FINE_BITS - f->fine_bits[i]); + f->fine_bits[i] += fineextra; + + fineextra <<= f->channels + 2; + f->fine_priority[i] = (fineextra >= extrabits - prev_extra); + extrabits -= fineextra; + } + } + f->remaining = extrabits; + + /* skipped bands dedicate all of their bits for fine energy */ + for (; i < f->end_band; i++) { + f->fine_bits[i] = 
f->pulses[i] >> (f->channels - 1) >> 3; + f->pulses[i] = 0; + f->fine_priority[i] = f->fine_bits[i] < 1; + } +} + +static void celt_quant_coarse(OpusEncContext *s, OpusRangeCoder *rc, CeltFrame *f) +{ + int i, ch; + float alpha, beta, prev[2] = { 0, 0 }; + const uint8_t *pmod = ff_celt_coarse_energy_dist[f->size][f->intra]; + + /* Inter is really just differential coding */ + if (opus_rc_tell(rc) + 3 <= f->framebits) + ff_opus_rc_enc_log(rc, f->intra, 3); + else + f->intra = 0; + + if (f->intra) { + alpha = 0.0f; + beta = 1.0f - 4915.0f/32768.0f; + } else { + alpha = ff_celt_alpha_coef[f->size]; + beta = 1.0f - ff_celt_beta_coef[f->size]; + } + + for (i = f->start_band; i < f->end_band; i++) { + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *block = &f->block[ch]; + const int left = f->framebits - opus_rc_tell(rc); + const float last = FFMAX(-9.0f, s->last_quantized_energy[ch][i]); + float diff = block->energy[i] - prev[ch] - last*alpha; + int q_en = lrintf(diff); + if (left >= 15) { + ff_opus_rc_enc_laplace(rc, &q_en, pmod[i << 1] << 7, pmod[(i << 1) + 1] << 6); + } else if (left >= 2) { + q_en = av_clip(q_en, -1, 1); + ff_opus_rc_enc_cdf(rc, ((q_en & 1) << 1) | (q_en < 0), ff_celt_model_energy_small); + } else if (left >= 1) { + q_en = av_clip(q_en, -1, 0); + ff_opus_rc_enc_log(rc, (q_en & 1), 1); + } else q_en = -1; + + block->error_energy[i] = q_en - diff; + prev[ch] += beta * q_en; + } + } +} + +static void celt_quant_fine(OpusEncContext *s, OpusRangeCoder *rc, CeltFrame *f) +{ + int i, ch; + for (i = f->start_band; i < f->end_band; i++) { + if (!f->fine_bits[i]) + continue; + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *block = &f->block[ch]; + int quant, lim = (1 << f->fine_bits[i]); + float offset, diff = 0.5f - block->error_energy[i]; + quant = av_clip(floor(diff*lim), 0, lim - 1); + ff_opus_rc_put_raw(rc, quant, f->fine_bits[i]); + offset = 0.5f - ((quant + 0.5f) * (1 << (14 - f->fine_bits[i])) / 16384.0f); + block->error_energy[i] -= offset; 
+ } + } +} + +static void celt_quant_final(OpusEncContext *s, OpusRangeCoder *rc, CeltFrame *f) +{ + int i, ch, priority; + for (priority = 0; priority < 2; priority++) { + for (i = f->start_band; i < f->end_band && (f->framebits - opus_rc_tell(rc)) >= f->channels; i++) { + if (f->fine_priority[i] != priority || f->fine_bits[i] >= CELT_MAX_FINE_BITS) + continue; + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *block = &f->block[ch]; + const float err = block->error_energy[i]; + const float offset = 0.5f * (1 << (14 - f->fine_bits[i] - 1)) / 16384.0f; + const int sign = FFABS(err + offset) < FFABS(err - offset); + ff_opus_rc_put_raw(rc, sign, 1); + block->error_energy[i] -= offset*(1 - 2*sign); + } + } + } +} + +static void celt_quant_bands(OpusEncContext *s, OpusRangeCoder *rc, CeltFrame *f) +{ + float lowband_scratch[8 * 22]; + float norm[2 * 8 * 100]; + + int totalbits = (f->framebits << 3) - f->anticollapse_needed; + + int update_lowband = 1; + int lowband_offset = 0; + + int i, j; + + for (i = f->start_band; i < f->end_band; i++) { + int band_offset = ff_celt_freq_bands[i] << f->size; + int band_size = ff_celt_freq_range[i] << f->size; + float *X = f->block[0].coeffs + band_offset; + float *Y = (f->channels == 2) ? 
f->block[1].coeffs + band_offset : NULL; + + int consumed = opus_rc_tell_frac(rc); + float *norm2 = norm + 8 * 100; + int effective_lowband = -1; + unsigned int cm[2]; + int b; + + /* Compute how many bits we want to allocate to this band */ + if (i != f->start_band) + f->remaining -= consumed; + f->remaining2 = totalbits - consumed - 1; + if (i <= f->coded_bands - 1) { + int curr_balance = f->remaining / FFMIN(3, f->coded_bands-i); + b = av_clip_uintp2(FFMIN(f->remaining2 + 1, f->pulses[i] + curr_balance), 14); + } else + b = 0; + + if (ff_celt_freq_bands[i] - ff_celt_freq_range[i] >= ff_celt_freq_bands[f->start_band] && + (update_lowband || lowband_offset == 0)) + lowband_offset = i; + + /* Get a conservative estimate of the collapse_mask's for the bands we're + going to be folding from. */ + if (lowband_offset != 0 && (f->spread != CELT_SPREAD_AGGRESSIVE || + f->blocks > 1 || f->tf_change[i] < 0)) { + int foldstart, foldend; + + /* This ensures we never repeat spectral content within one band */ + effective_lowband = FFMAX(ff_celt_freq_bands[f->start_band], + ff_celt_freq_bands[lowband_offset] - ff_celt_freq_range[i]); + foldstart = lowband_offset; + while (ff_celt_freq_bands[--foldstart] > effective_lowband); + foldend = lowband_offset - 1; + while (ff_celt_freq_bands[++foldend] < effective_lowband + ff_celt_freq_range[i]); + + cm[0] = cm[1] = 0; + for (j = foldstart; j < foldend; j++) { + cm[0] |= f->block[0].collapse_masks[j]; + cm[1] |= f->block[f->channels - 1].collapse_masks[j]; + } + } else + /* Otherwise, we'll be using the LCG to fold, so all blocks will (almost + always) be non-zero.*/ + cm[0] = cm[1] = (1 << f->blocks) - 1; + + if (f->dual_stereo && i == f->intensity_stereo) { + /* Switch off dual stereo to do intensity */ + f->dual_stereo = 0; + for (j = ff_celt_freq_bands[f->start_band] << f->size; j < band_offset; j++) + norm[j] = (norm[j] + norm2[j]) / 2; + } + + if (f->dual_stereo) { + cm[0] = ff_celt_encode_band(f, rc, i, X, NULL, band_size, b / 
2, f->blocks, + effective_lowband != -1 ? norm + (effective_lowband << f->size) : NULL, f->size, + norm + band_offset, 0, 1.0f, lowband_scratch, cm[0]); + + cm[1] = ff_celt_encode_band(f, rc, i, Y, NULL, band_size, b/2, f->blocks, + effective_lowband != -1 ? norm2 + (effective_lowband << f->size) : NULL, f->size, + norm2 + band_offset, 0, 1.0f, lowband_scratch, cm[1]); + } else { + cm[0] = ff_celt_encode_band(f, rc, i, X, Y, band_size, b, f->blocks, + effective_lowband != -1 ? norm + (effective_lowband << f->size) : NULL, f->size, + norm + band_offset, 0, 1.0f, lowband_scratch, cm[0]|cm[1]); + cm[1] = cm[0]; + } + + f->block[0].collapse_masks[i] = (uint8_t)cm[0]; + f->block[f->channels - 1].collapse_masks[i] = (uint8_t)cm[1]; + f->remaining += f->pulses[i] + consumed; + + /* Update the folding position only as long as we have 1 bit/sample depth */ + update_lowband = (b > band_size << 3); + } +} + +static void celt_encode_frame(OpusEncContext *s, OpusRangeCoder *rc, CeltFrame *f) +{ + int i, ch; + + celt_frame_setup_input(s, f); + celt_apply_preemph_filter(s, f); + if (f->pfilter) { + /* Not implemented */ + } + celt_frame_mdct(s, f); + f->silence = celt_frame_map_norm_bands(s, f); + if (f->silence) { + f->framebits = 1; + return; + } + + ff_opus_rc_enc_log(rc, f->silence, 15); + + if (!f->start_band && opus_rc_tell(rc) + 16 <= f->framebits) + ff_opus_rc_enc_log(rc, f->pfilter, 1); + + if (f->pfilter) { + /* Not implemented */ + } + + if (f->size && opus_rc_tell(rc) + 3 <= f->framebits) + ff_opus_rc_enc_log(rc, f->transient, 3); + + celt_quant_coarse (s, rc, f); + celt_enc_tf (s, rc, f); + celt_bitalloc (s, rc, f); + celt_quant_fine (s, rc, f); + celt_quant_bands (s, rc, f); + + if (f->anticollapse_needed) + ff_opus_rc_put_raw(rc, f->anticollapse, 1); + + celt_quant_final(s, rc, f); + + for (ch = 0; ch < f->channels; ch++) { + CeltBlock *block = &f->block[ch]; + for (i = 0; i < CELT_MAX_BANDS; i++) + s->last_quantized_energy[ch][i] = block->energy[i] + 
block->error_energy[i]; + } +} + +static void ff_opus_psy_process(OpusEncContext *s, int end, int *need_more) +{ + int max_delay_samples = (s->options.max_delay_ms*s->avctx->sample_rate)/1000; + int max_bsize = FFMIN(OPUS_SAMPLES_TO_BLOCK_SIZE(max_delay_samples), CELT_BLOCK_960); + + s->pkt_frames = 1; + s->pkt_framesize = max_bsize; + s->mode = OPUS_MODE_CELT; + s->bandwidth = OPUS_BANDWIDTH_FULLBAND; + + *need_more = s->bufqueue.available*s->avctx->frame_size < (max_delay_samples + CELT_OVERLAP); + /* Don't request more if we start being flushed with NULL frames */ + *need_more = !end && *need_more; +} + +static void ff_opus_psy_celt_frame_setup(OpusEncContext *s, CeltFrame *f, int index) +{ + int frame_size = OPUS_BLOCK_SIZE(s->pkt_framesize); + + f->avctx = s->avctx; + f->dsp = s->dsp; + f->start_band = (s->mode == OPUS_MODE_HYBRID) ? 17 : 0; + f->end_band = ff_celt_band_end[s->bandwidth]; + f->channels = s->channels; + f->size = s->pkt_framesize; + + /* Decisions */ + f->silence = 0; + f->pfilter = 0; + f->transient = 0; + f->intra = 1; + f->tf_select = 1; + f->anticollapse = 0; + f->alloc_trim = 5; + f->skip_band_floor = f->end_band; + f->intensity_stereo = f->end_band; + f->dual_stereo = 0; + f->spread = CELT_SPREAD_NORMAL; + memset(f->tf_change, 0, sizeof(int)*CELT_MAX_BANDS); + memset(f->alloc_boost, 0, sizeof(int)*CELT_MAX_BANDS); + + f->blocks = f->transient ? 
frame_size/CELT_OVERLAP : 1; + f->framebits = FFALIGN(lrintf((double)s->avctx->bit_rate/(s->avctx->sample_rate/frame_size)), 8); +} + +static void opus_packet_assembler(OpusEncContext *s, AVPacket *avpkt) +{ + int i, offset, fsize_needed; + + /* Write toc */ + opus_gen_toc(s, avpkt->data, &offset, &fsize_needed); + + for (i = 0; i < s->pkt_frames; i++) { + ff_opus_rc_enc_end(&s->rc[i], avpkt->data + offset, s->frame[i].framebits >> 3); + offset += s->frame[i].framebits >> 3; + } + + avpkt->size = offset; +} + +/* Used as overlap for the first frame and padding for the last encoded packet */ +static AVFrame *spawn_empty_frame(OpusEncContext *s) +{ + int i; + AVFrame *f = av_frame_alloc(); + if (!f) + return NULL; + f->format = s->avctx->sample_fmt; + f->nb_samples = s->avctx->frame_size; + f->channel_layout = s->avctx->channel_layout; + if (av_frame_get_buffer(f, 4)) { + av_frame_free(&f); + return NULL; + } + for (i = 0; i < s->channels; i++) { + size_t bps = av_get_bytes_per_sample(f->format); + memset(f->extended_data[i], 0, bps*f->nb_samples); + } + return f; +} + +static int opus_encode_frame(AVCodecContext *avctx, AVPacket *avpkt, + const AVFrame *frame, int *got_packet_ptr) +{ + OpusEncContext *s = avctx->priv_data; + int i, ret, frame_size, need_more, alloc_size = 0; + + if (frame) { /* Add new frame to queue */ + if ((ret = ff_af_queue_add(&s->afq, frame)) < 0) + return ret; + ff_bufqueue_add(avctx, &s->bufqueue, av_frame_clone(frame)); + } else { + if (!s->afq.remaining_samples) + return 0; /* We've been flushed and there's nothing left to encode */ + } + + /* Run the psychoacoustic system */ + ff_opus_psy_process(s, !frame, &need_more); + + /* Get more samples for lookahead/encoding */ + if (need_more) + return 0; + + frame_size = OPUS_BLOCK_SIZE(s->pkt_framesize); + + if (!frame) { + /* This can go negative, that's not a problem, we only pad if positive */ + int pad_empty = s->pkt_frames*(frame_size/s->avctx->frame_size) - s->bufqueue.available + 1; + /* 
Pad with empty 2.5 ms frames to whatever framesize was decided, + * this should only happen at the very last flush frame. The frames + * allocated here will be freed (because they have no other references) + * after they get used by celt_frame_setup_input() */ + for (i = 0; i < pad_empty; i++) { + AVFrame *empty = spawn_empty_frame(s); + if (!empty) + return AVERROR(ENOMEM); + ff_bufqueue_add(avctx, &s->bufqueue, empty); + } + } + + for (i = 0; i < s->pkt_frames; i++) { + ff_opus_rc_enc_init(&s->rc[i]); + ff_opus_psy_celt_frame_setup(s, &s->frame[i], i); + celt_encode_frame(s, &s->rc[i], &s->frame[i]); + alloc_size += s->frame[i].framebits >> 3; + } + + /* Worst case toc + the frame lengths if needed */ + alloc_size += 2 + s->pkt_frames*2; + + if ((ret = ff_alloc_packet2(avctx, avpkt, alloc_size, 0)) < 0) + return ret; + + /* Assemble packet */ + opus_packet_assembler(s, avpkt); + + /* Remove samples from queue and skip if needed */ + ff_af_queue_remove(&s->afq, s->pkt_frames*frame_size, &avpkt->pts, &avpkt->duration); + if (s->pkt_frames*frame_size > avpkt->duration) { + uint8_t *side = av_packet_new_side_data(avpkt, AV_PKT_DATA_SKIP_SAMPLES, 10); + if (!side) + return AVERROR(ENOMEM); + AV_WL32(&side[4], s->pkt_frames*frame_size - avpkt->duration + 120); + } + + *got_packet_ptr = 1; + + return 0; +} + +static av_cold int opus_encode_end(AVCodecContext *avctx) +{ + int i; + OpusEncContext *s = avctx->priv_data; + + for (i = 0; i < CELT_BLOCK_NB; i++) + ff_mdct15_uninit(&s->mdct[i]); + + av_freep(&s->dsp); + ff_af_queue_close(&s->afq); + ff_bufqueue_discard_all(&s->bufqueue); + av_freep(&avctx->extradata); + + return 0; +} + +static av_cold int opus_encode_init(AVCodecContext *avctx) +{ + int i, ch, ret; + OpusEncContext *s = avctx->priv_data; + + s->avctx = avctx; + s->channels = avctx->channels; + + /* Opus allows us to change the framesize on each packet (and each packet may + * have multiple frames in it) but we can't change the codec's frame size on + * 
runtime, so fix it to the lowest possible number of samples and use a queue + * to accumulate AVFrames until we have enough to encode whatever the encoder + * decides is the best */ + avctx->frame_size = 120; + /* Initial padding will change if SILK is ever supported */ + avctx->initial_padding = 120; + + avctx->cutoff = !avctx->cutoff ? 20000 : avctx->cutoff; + + if (!avctx->bit_rate) { + int coupled = ff_opus_default_coupled_streams[s->channels - 1]; + avctx->bit_rate = coupled*(96000) + (s->channels - coupled)*(48000); + } else if (avctx->bit_rate < 6000 || avctx->bit_rate > 255000 * s->channels) { + int64_t clipped_rate = av_clip(avctx->bit_rate, 6000, 255000 * s->channels); + av_log(avctx, AV_LOG_ERROR, "Unsupported bitrate %"PRId64" kbps, clipping to %"PRId64" kbps\n", + avctx->bit_rate/1000, clipped_rate/1000); + avctx->bit_rate = clipped_rate; + } + + /* Extradata */ + avctx->extradata_size = 19; + avctx->extradata = av_malloc(avctx->extradata_size + AV_INPUT_BUFFER_PADDING_SIZE); + if (!avctx->extradata) { + av_log(avctx, AV_LOG_ERROR, "Failed to allocate extradata.\n"); + ret = AVERROR(ENOMEM); + goto fail; + } + opus_write_extradata(avctx); + + ff_af_queue_init(avctx, &s->afq); + + /* Go through the common cleanup path so the extradata is not leaked */ + if (!(s->dsp = avpriv_float_dsp_alloc(avctx->flags & AV_CODEC_FLAG_BITEXACT))) { + ret = AVERROR(ENOMEM); + goto fail; + } + + /* I have no idea why a base scaling factor of 68 works, could be the twiddles */ + for (i = 0; i < CELT_BLOCK_NB; i++) + if ((ret = ff_mdct15_init(&s->mdct[i], 0, i + 3, 68 << (CELT_BLOCK_NB - 1 - i)))) + goto fail; + + /* Zero out previous energy (matters for inter first frame) */ + for (ch = 0; ch < s->channels; ch++) + for (i = 0; i < CELT_MAX_BANDS; i++) + s->last_quantized_energy[ch][i] = 0.0f; + + /* Allocate an empty frame to use as overlap for the first frame of audio */ + ff_bufqueue_add(avctx, &s->bufqueue, spawn_empty_frame(s)); + + return 0; + +fail: + av_frame_free(&s->empty); + opus_encode_end(avctx); + return ret; +} + +#define OPUSENC_FLAGS 
AV_OPT_FLAG_ENCODING_PARAM | AV_OPT_FLAG_AUDIO_PARAM +static const AVOption opusenc_options[] = { + { "opus_max_delay_ms", "Maximum delay (and lookahead) in milliseconds", offsetof(OpusEncContext, options.max_delay_ms), AV_OPT_TYPE_FLOAT, { .dbl = OPUS_MAX_LOOKAHEAD }, 2.5f, OPUS_MAX_LOOKAHEAD, OPUSENC_FLAGS }, + { NULL }, +}; + +static const AVClass opusenc_class = { + .class_name = "Opus encoder", + .item_name = av_default_item_name, + .option = opusenc_options, + .version = LIBAVUTIL_VERSION_INT, +}; + +static const AVCodecDefault opusenc_defaults[] = { + { "b", "0" }, + { "compression_level", "10" }, + { NULL }, +}; + +AVCodec ff_opus_encoder = { + .name = "opus", + .long_name = NULL_IF_CONFIG_SMALL("Opus"), + .type = AVMEDIA_TYPE_AUDIO, + .id = AV_CODEC_ID_OPUS, + .defaults = opusenc_defaults, + .priv_class = &opusenc_class, + .priv_data_size = sizeof(OpusEncContext), + .init = opus_encode_init, + .encode2 = opus_encode_frame, + .close = opus_encode_end, + .caps_internal = FF_CODEC_CAP_INIT_THREADSAFE, + .capabilities = AV_CODEC_CAP_EXPERIMENTAL | AV_CODEC_CAP_SMALL_LAST_FRAME | AV_CODEC_CAP_DELAY, + .supported_samplerates = (const int []){ 48000, 0 }, + .channel_layouts = (const uint64_t []){ AV_CH_LAYOUT_MONO, + AV_CH_LAYOUT_STEREO, 0 }, + .sample_fmts = (const enum AVSampleFormat[]){ AV_SAMPLE_FMT_FLTP, + AV_SAMPLE_FMT_NONE }, +};
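
For reviewers: celt_apply_preemph_filter() in the patch is a first-order high-pass, y[n] = x[n] - c*x[n-1], with the scaled previous sample carried between calls in emph_coeff. A minimal standalone sketch of that recurrence follows. The name preemph() and the constant EMPH_COEFF (set to 0.85 here) are assumptions made for illustration, not taken from the patch; the real coefficient is whatever CELT_EMPH_COEFF is defined as.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CELT_EMPH_COEFF; 0.85 is an assumed value. */
#define EMPH_COEFF 0.85f

/* Filters buf in place: y[n] = x[n] - EMPH_COEFF * x[n-1].
 * mem must hold EMPH_COEFF * x[-1] on entry (0 for the very first block).
 * Returns the state to pass to the next call, so consecutive blocks are
 * filtered as one continuous signal. */
static float preemph(float *buf, size_t n, float mem)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        float sample = buf[i];
        buf[i] = sample - mem;      /* subtract the scaled previous input */
        mem = sample * EMPH_COEFF;  /* remember it for the next sample */
    }
    return mem;
}
```

Filtering one buffer in two calls, passing the returned state from the first into the second, should give the same result as filtering it in one call. The patch relies on the same property when it filters the overlap first and then the subframes.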
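
A note on the per-frame bit budget set in ff_opus_psy_celt_frame_setup(): framebits is the bitrate divided by the number of frames per second, rounded and then aligned up to a whole byte. The sketch below restates that arithmetic on its own. The helper names align_up() and frame_bits() are hypothetical, and the rounding uses a plain add-0.5 cast as a stand-in for lrintf().

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two (same idea as FFALIGN). */
static int align_up(int x, int a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Bits available to one frame of frame_size samples at the given bitrate.
 * sample_rate / frame_size is an integer division, as in the patch. */
static int frame_bits(int64_t bit_rate, int sample_rate, int frame_size)
{
    assert(bit_rate >= 0 && frame_size > 0 && sample_rate >= frame_size);
    int frames_per_second = sample_rate / frame_size;
    double bits = (double)bit_rate / frames_per_second;
    return align_up((int)(bits + 0.5), 8); /* round to nearest, then byte-align */
}
```

At 48 kHz and 96 kbps, a 960-sample (20 ms) frame gets 1920 bits and a 120-sample (2.5 ms) frame gets 240 bits. Budgets that are not byte multiples are rounded up, so the produced stream can run slightly above the requested bitrate.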