[FFmpeg-devel,WIP] v3 Opus Pyramid Vector Quantization Search in x86 SIMD asm

Message ID CABA=pqeSD22f9ZwrGo3i6gyJ1q6p0rhWqW-0A=XKGDw3QuuXLw@mail.gmail.com
State Superseded

Commit Message

Ivan Kalvachev July 1, 2017, 12:10 a.m. UTC
First, I've removed the hacked formula.

While it seems to improve precision in the synthetic test,
it is not enough to avoid assigning pulses in the padded area.
Worse, it also interferes with the special case of
subtracting pulses from outputs where y[i]==0. Handling these cases
needs combining two masks and using "blendvps" instead of the faster "maxps".
Most of these problems are related to the fact that
the hacked formula result is in the range -1.0 to 0.0
where 0.0 is the perfect match.
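
One possible reading of the range problem, as a scalar C sketch and not the actual asm: if excluded lanes (padding, or lanes that must not win) are simply forced to 0.0, they lose against non-negative scores, but with scores in -1.0..0.0 they look like a perfect match. The helper name `argmax_masked`, the `valid` array and the `fill` parameter are hypothetical and only illustrate the idea.

```c
#include <math.h>
#include <stddef.h>

/*
 * Hypothetical helper: index of the largest score after every excluded
 * lane (valid[i] == 0) has been replaced by `fill`.
 * Ties keep the lowest index. Requires n >= 1.
 */
static size_t argmax_masked(const float *score, const int *valid,
                            size_t n, float fill)
{
    size_t best = 0;
    float best_score = valid[0] ? score[0] : fill;

    for (size_t i = 1; i < n; i++) {
        /* Excluded lanes never contribute their real score. */
        float s = valid[i] ? score[i] : fill;
        if (s > best_score) {
            best_score = s;
            best = i;
        }
    }
    return best;
}
```

With valid = {1, 1, 1, 0}, non-negative scores {0.2, 0.9, 0.5, 0.95} and fill 0.0 select index 1 as intended. Scores {-0.8, -0.1, -0.5, -0.05} with fill 0.0 select the excluded index 3, while fill -INFINITY selects index 1 again. Choosing per lane between the real score and such a sentinel is a select operation, which is roughly the kind of work a blend does.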

Second, I fixed the all_float rounding mode;
I had it swapped in v2.

Third, I reordered some code. Moved the branch
for an exact match in the pre-search so that it is taken first.
The constants needed in the pulse search are now loaded
only when they will be used.

Use a macro for conditional loading/relabeling of the constants.
Use a similar idea for loading and using the PIC register.

Use "smartalign" on NASM, to avoid 15 consecutive 1-byte NOPs.
YASM is smart by default.

I attach a second patch, which is
a slightly modified version of atomnuker's code.
It sums the distortions of the C and SIMD implementations.

In my tests, approx#2 seems to be pretty close to the C version,
sometimes better, sometimes worse.
If you find a file with a dramatic difference, I'm interested. :D

If there are no more issues, v4 would be cleaned up and ready for review.

Best Regards.

Patch

From a1005984a6e144fcd1701b4b71e9de5a17ebbca9 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev <ikalvachev@gmail.com>
Date: Sat, 1 Jul 2017 00:34:22 +0300
Subject: [PATCH 2/4] Code to measure the total distortion sum of the reference
 and dsp implementation.

---
 libavcodec/opus_pvq.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 6c7504296d..3ad995e7a4 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -420,10 +420,77 @@  static uint32_t celt_alg_quant(OpusRangeCoder *rc, float *X, uint32_t N, uint32_
     celt_exp_rotation(X, N, blocks, K, spread, 1);
 
     {
-    START_TIMER
-    gain /= sqrtf(pvq->pvq_search(X, y, K, N));
-    STOP_TIMER("pvq_search");
+    static double adist1 = 0.0f;
+    static double adist2 = 0.0f;
+    static int cnt,cntlg;
+        float  xn1[256];
+        float  xn2[256];
+        int    ysec[256];
+
+        float sq1 = 0.0f;
+        float sq2 = 0.0f;
+
+        float dist1 = 0.0f;
+        float dist2 = 0.0f;
+
+        int sum1 = 0, sum2 = 0;
+
+    //    START_TIMER
+        gain /= sqrtf(pvq->pvq_search(X, y, K, N));
+    //    STOP_TIMER("pvq_search");
+
+        ppp_pvq_search_c(X, ysec, K, N);
+
+
+        for (int i = 0; i < N; i++) {
+            sq1 += y[i]*y[i];
+            sq2 += ysec[i]*ysec[i];
+            sum1 += FFABS(y[i]);
+            sum2 += FFABS(ysec[i]);
+        }
+
+        if (sum1 != K) {
+            printf("\nERROR! sum Sy=%i K=%i N=%d \n", sum1, K, N);
+            printf(" X = ");
+            for(int i=0; i<N; i++){
+                printf("%8X, ",((unsigned int *)X)[i]);
+            }
+            printf("\n y1 = ");
+            for(int i=0; i<N; i++){
+                printf("%8d, ", y[i]);
+            }
+            printf("\n y2 = ");
+            for(int i=0; i<N; i++){
+                printf("%8d, ", ysec[i]);
+            }
+            printf("\nERROR!\n");
+            goto what;
+            //av_assert0(0);
+        }
+
+
+        for (int i = 0; i < N; i++) {
+            xn1[i] = (float)y[i]/sqrtf(sq1);
+            xn2[i] = (float)ysec[i]/sqrtf(sq2);
+            dist1 += (X[i] - xn1[i])*(X[i] - xn1[i]);
+            dist2 += (X[i] - xn2[i])*(X[i] - xn2[i]);
+        }
+
+        dist1 = sqrtf(dist1);
+        dist2 = sqrtf(dist2);
+
+        adist1 += (double)dist1;
+        adist2 += (double)dist2;
+
+        cnt++;
+        if(cnt>=2*cntlg){
+            cntlg=cnt;
+            printf("\nruns = %d\n", cnt);
+            printf("Distortion1 = %f\n", adist1);
+            printf("Distortion2 = %f\n", adist2);
+        }
     }
+what:
 
     celt_encode_pulses(rc, y,  N, K);
     celt_normalize_residual(y, X, N, gain);
-- 
2.13.0