[FFmpeg-devel,1/2] opusdsp: adjust and optimize C function to match assembly

Submitted by Lynne on Aug. 15, 2019, 10:47 a.m.

Details

Message ID LmJspL3--3-1@lynne.ee
State New

Commit Message

Lynne Aug. 15, 2019, 10:47 a.m.
The C and asm versions behaved differently _outside_ of the codec.

The C version returned pre-multiplied 'state' for the next execution
to use right away, while the assembly version returned non-multiplied
'state', leaving the multiplication to the next execution to save
instructions.
Since the initial state when initialized or seeking is always 0,
and since C and asm versions were never mixed, there was no issue.

However, comparing outputs directly in checkasm doesn't work without
dividing the initial state by CELT_EMPH_COEFF and multiplying the
returned state by CELT_EMPH_COEFF for the assembly function.

Since it's actually faster to do this in C as well, copy the behavior the
asm versions use. As a reminder, add a note explaining how coefficient
init differs from libopus.
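
To illustrate the two state conventions described above, here is a minimal standalone sketch, not the FFmpeg code itself. The names deemph_premul, deemph_raw and conventions_agree are hypothetical, and EMPH_COEFF is an assumed stand-in for CELT_EMPH_COEFF. It shows that the two loops agree once the incoming state is divided by the coefficient and the returned state is multiplied by it, which is the adjustment a direct comparison would otherwise need.

```c
/* Assumed stand-in for CELT_EMPH_COEFF; the real value lives in FFmpeg's headers. */
#define EMPH_COEFF 0.8500061035f

/* Old C convention: the state passed in and out is already multiplied. */
static float deemph_premul(float *y, const float *x, float state, int len)
{
    for (int i = 0; i < len; i++) {
        const float tmp = x[i] + state;
        state = tmp * EMPH_COEFF;
        y[i]  = tmp;
    }
    return state;
}

/* Asm-style convention: the state is the last output sample,
 * multiplied by the coefficient only when it is used. */
static float deemph_raw(float *y, const float *x, float state, int len)
{
    for (int i = 0; i < len; i++)
        state = y[i] = x[i] + state * EMPH_COEFF;
    return state;
}

/* Hypothetical helper: returns 1 if both conventions give the same output
 * (within tol) after converting the state on the way in and on the way out. */
int conventions_agree(const float *x, int len, float premul_state, float tol)
{
    float y_old[64], y_new[64];
    if (len < 0 || len > 64)
        return 0;

    float s_old = deemph_premul(y_old, x, premul_state, len);
    float s_new = deemph_raw(y_new, x, premul_state / EMPH_COEFF, len);

    for (int i = 0; i < len; i++) {
        float d = y_old[i] - y_new[i];
        if ((d < 0 ? -d : d) > tol)
            return 0;
    }
    float d = s_old - s_new * EMPH_COEFF;
    return (d < 0 ? -d : d) <= tol;
}
```

With an initial state of 0 the conversion is a no-op, which is consistent with the mismatch never showing up inside the codec.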


From b7f2fc24387310cf12d57dbe1ce06f0284a2a390 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Thu, 15 Aug 2019 11:13:35 +0100
Subject: [PATCH 1/2] opusdsp: adjust and optimize C function to match assembly

The C and asm versions behaved differently _outside_ of the codec.

The C version returned pre-multiplied 'state' for the next execution
to use right away, while the assembly version returned non-multiplied
'state', leaving the multiplication to the next execution to save
instructions.
Since the initial state when initialized or seeking is always 0,
and since C and asm versions were never mixed, there was no issue.

However, comparing outputs directly in checkasm doesn't work without
dividing the initial state by CELT_EMPH_COEFF and multiplying the
returned state by CELT_EMPH_COEFF for the assembly function.

Since it's actually faster to do this in C as well, copy the behavior the
asm versions use. As a reminder, the initial state 0 is divided by
CELT_EMPH_COEFF on seek and init (in case this is changed in the future:
it's technically more correct to init with CELT_EMPH_COEFF than 0,
but when seeking that results in more audible pops, whereas with 0
the output gets in sync over a few samples).
---
 libavcodec/opus_celt.c |  6 +++++-
 libavcodec/opusdsp.c   | 11 +++--------
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/libavcodec/opus_celt.c b/libavcodec/opus_celt.c
index 4655172b09..9dbeff1927 100644
--- a/libavcodec/opus_celt.c
+++ b/libavcodec/opus_celt.c
@@ -507,7 +507,11 @@  void ff_celt_flush(CeltFrame *f)
         memset(block->pf_gains_old, 0, sizeof(block->pf_gains_old));
         memset(block->pf_gains_new, 0, sizeof(block->pf_gains_new));
 
-        block->emph_coeff = 0.0;
+        /* libopus uses CELT_EMPH_COEFF on init, but 0 is better since there's
+         * a lesser discontinuity when seeking.
+         * The deemphasis functions differ from libopus in that they require
+         * an initial state divided by the coefficient. */
+        block->emph_coeff = 0.0f / CELT_EMPH_COEFF;
     }
     f->seed = 0;
 
diff --git a/libavcodec/opusdsp.c b/libavcodec/opusdsp.c
index 0e179c98c9..08df87ffbe 100644
--- a/libavcodec/opusdsp.c
+++ b/libavcodec/opusdsp.c
@@ -43,15 +43,10 @@  static void postfilter_c(float *data, int period, float *gains, int len)
 
 static float deemphasis_c(float *y, float *x, float coeff, int len)
 {
-    float state = coeff;
+    for (int i = 0; i < len; i++)
+        coeff = y[i] = x[i] + coeff*CELT_EMPH_COEFF;
 
-    for (int i = 0; i < len; i++) {
-        const float tmp = x[i] + state;
-        state = tmp * CELT_EMPH_COEFF;
-        y[i] = tmp;
-    }
-
-    return state;
+    return coeff;
 }
 
 av_cold void ff_opus_dsp_init(OpusDSP *ctx)
-- 
2.23.0.rc1