[FFmpeg-devel] opus_pvq_search: Restore the proper use of conditional define and simplify the function name suffix handling

Submitted by Ivan Kalvachev on Aug. 19, 2017, 3:10 p.m.

Details

Message ID CABA=pqeMuN+v+siksxKLYB_PwL7CrFMZjzrKVty5nur2VhP_SA@mail.gmail.com
State New
Headers show

Commit Message

Ivan Kalvachev Aug. 19, 2017, 3:10 p.m.
Using named define properly documents the code paths.
It also avoids passing additional numbered arguments through
multiple levels of macro templates.

The suffix handling is done by concatenation, like in
other asm functions and avoid having two separate
"cglobal" defines.

---
I have to point few things out.

commit f386dd70acdc81d42d6bcb885d2889634cdf45b7
"opus_pvq_search: only use rsqrtps approximation on CPUs with avx"
was rushed hack job:

1. It broke compilation with yasm.
2. Its subject says the opposite of what the patch does.
3. The changes were numerous and intrusive, obfuscating code,
while the same result could have been achieved
with a single new line.

I did request revert of this commit on irc and in ffmpeg-cvs-log mail,
I did request a patch to be sent to ffmpeg-dev-eng, to discuss the
best way to make the change.

I did oppose the follow up commit 3c99523a2864af729a8576c3fffe81fb884fa0d5,
that tried to fix the compilation, by redoing the intrusive changes once again,
and making new changes that affect the C code too. (Thus making revert
a lot more messy.)

I find it extremely offensive that I as author of the code and FFmpeg developer
was totally ignored, without providing any technical reasons and with
something that sums up to "I like my code more".

This breaks good faith, COC and the written rules.

I'm not going to request punishment, reverts, public lynching...
I don't want drama.

I'm sending this patch to make the code as
I think it should have been done from the start and
I do ask you to push it without further bikeshedding.

That would be enough of an apology.

Comments

Rostislav Pehlivanov Aug. 19, 2017, 3:40 p.m.
On 19 August 2017 at 16:10, Ivan Kalvachev <ikalvachev@gmail.com> wrote:

> Using named define properly documents the code paths.
> It also avoids passing additional numbered arguments through
> multiple levels of macro templates.
>
> The suffix handling is done by concatenation, like in
> other asm functions and avoid having two separate
> "cglobal" defines.
>
> ---
> I have to point few things out.
>
> commit f386dd70acdc81d42d6bcb885d2889634cdf45b7
> "opus_pvq_search: only use rsqrtps approximation on CPUs with avx"
> was rushed hack job:
>

Well, your original patch was unpolished, so there's that.


>
> 3. The changes were numerous and intrusive, obfuscating code,
> while the same result could have been achieved
> with a single new line.
>
>
I disagree. Its completely clear and unobfuscated.


> I did request revert of this commit on irc and in ffmpeg-cvs-log mail,
> I did request a patch to be sent to ffmpeg-dev-eng, to discuss the
> best way to make the change.
>
> I did oppose the follow up commit 3c99523a2864af729a8576c3fffe81
> fb884fa0d5,
> that tried to fix the compilation, by redoing the intrusive changes once
> again,
> and making new changes that affect the C code too. (Thus making revert
> a lot more messy.)
>
> I find it extremely offensive that I as author of the code and FFmpeg
> developer
> was totally ignored, without providing any technical reasons and with
> something that sums up to "I like my code more".
>
> This breaks good faith, COC and the written rules.
>
>
As the maintainer of the encoder I have the last word on what goes on. I
could have instead amended your original patch but that would probably have
been even more offensive, hence I chose to have separate patches to fix the
problems.

On the other hand, the patch you've included is nothing more than a case of
"I like my code more". All you do is you add back a macro which doesn't
(and did not in the first place) even follow the code standard. It has
indentation, and it shouldn't.

The only way I'm going to accept this patch is if you send a V2 of the
patch which correctly removes the ident on the macro. And does away with
your pointless protests.
Hendrik Leppkes Aug. 19, 2017, 3:50 p.m.
On Sat, Aug 19, 2017 at 5:40 PM, Rostislav Pehlivanov
<atomnuker@gmail.com> wrote:
> On 19 August 2017 at 16:10, Ivan Kalvachev <ikalvachev@gmail.com> wrote:
>
>> Using named define properly documents the code paths.
>> It also avoids passing additional numbered arguments through
>> multiple levels of macro templates.
>>
>> The suffix handling is done by concatenation, like in
>> other asm functions and avoid having two separate
>> "cglobal" defines.
>>
>> ---
>> I have to point few things out.
>>
>> commit f386dd70acdc81d42d6bcb885d2889634cdf45b7
>> "opus_pvq_search: only use rsqrtps approximation on CPUs with avx"
>> was rushed hack job:
>>
>
> Well, your original patch was unpolished, so there's that.
>
>

You reviewed it, approved it, and pushed it. You only have yourself to
blame in that case.

- Hendrik
Ivan Kalvachev Aug. 20, 2017, 12:44 a.m.
The issue has been resolved.
The patch has been pushed.

Patch hide | download patch | download mbox

From 18324e5535dd0c928a3ec2e8f25babc583b031d5 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev <ikalvachev@gmail.com>
Date: Sat, 19 Aug 2017 14:29:40 +0300
Subject: [PATCH] opus_pvq_search: Restore the proper use of conditional define
 and simplify the function name suffix handling.

Using named define properly documents the code paths.
It also avoids passing additional numbered arguments through
multiple levels of macro templates.

The suffix handling is done by concatenation, like in
other asm functions and avoid having two separate
"cglobal" defines.

Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
---
 libavcodec/x86/opus_pvq_search.asm | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/libavcodec/x86/opus_pvq_search.asm b/libavcodec/x86/opus_pvq_search.asm
index 8cf040465d..5c1e6d6174 100644
--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -82,7 +82,7 @@  SECTION .text
 %endif
 %endmacro
 
-%macro PULSES_SEARCH 2 ; %1 - add or sub, %2 - use approximation
+%macro PULSES_SEARCH 1
 ; m6 Syy_norm
 ; m7 Sxy_norm
     addps          m6, mm_const_float_0_5   ; Syy_norm += 1.0/2
@@ -96,17 +96,17 @@  align 16
     movaps         m4, [tmpY + r4]  ; y[i]
     movaps         m5, [tmpX + r4]  ; X[i]
 
-%if %2
+  %if USE_APPROXIMATION == 1
     xorps          m0, m0
     cmpps          m0, m0, m5, 4    ; m0 = (X[i] != 0.0)
-%endif
+  %endif
 
     addps          m4, m6           ; m4 = Syy_new = y[i] + Syy_norm
     addps          m5, m7           ; m5 = Sxy_new = X[i] + Sxy_norm
 
-%if %2
+  %if USE_APPROXIMATION == 1
     andps          m5, m0           ; if(X[i] == 0) Sxy_new = 0; Prevent aproximation error from setting pulses in array padding.
-%endif
+  %endif
 
 %else
     movaps         m5, [tmpY + r4]      ; m5 = y[i]
@@ -119,7 +119,7 @@  align 16
     andps          m5, m0               ; (0<y)?m5:0
 %endif
 
-%if %2
+%if USE_APPROXIMATION == 1
     rsqrtps        m4, m4
     mulps          m5, m4           ; m5 = p = Sxy_new*approx(1/sqrt(Syy) )
 %else
@@ -211,13 +211,8 @@  align 16
 ; uint32 K      - Number of pulses to have after quantizations.
 ; uint32 N      - Number of vector elements. Must be 0 < N < 256
 ;
-%macro PVQ_FAST_SEARCH 1 ; %1 - use approximation
-%if %1
-cglobal pvq_search_approx, 4, 5+num_pic_regs, 11, 256*4, inX, outY, K, N
-%else
-cglobal pvq_search_exact,  4, 5+num_pic_regs, 11, 256*4, inX, outY, K, N
-%endif
-
+%macro PVQ_FAST_SEARCH 1
+cglobal pvq_search%1, 4, 5+num_pic_regs, 11, 256*4, inX, outY, K, N
 %define tmpX rsp
 %define tmpY outYq
 
@@ -260,7 +255,7 @@  align 16
     jz   %%zero_input       ; if (Sx==0) goto zero_input
 
     cvtsi2ss  xm0, dword Kd ; m0 = K
-%if %1
+%if USE_APPROXIMATION == 1
     rcpss     xm1, xm1      ; m1 = approx(1/Sx)
     mulss     xm0, xm1      ; m0 = K*(1/Sx)
 %else
@@ -313,7 +308,7 @@  align 16
 align 16                        ; K - pulses > 0
 %%add_pulses_loop:
 
-    PULSES_SEARCH add, %1   ; m6 Syy_norm ; m7 Sxy_norm
+    PULSES_SEARCH add   ; m6 Syy_norm ; m7 Sxy_norm
 
     sub        Kd, 1
     jnz  %%add_pulses_loop
@@ -325,7 +320,7 @@  align 16                        ; K - pulses > 0
 align 16
 %%remove_pulses_loop:
 
-    PULSES_SEARCH sub, %1   ; m6 Syy_norm ; m7 Sxy_norm
+    PULSES_SEARCH sub   ; m6 Syy_norm ; m7 Sxy_norm
 
     add        Kd, 1
     jnz  %%remove_pulses_loop
@@ -376,11 +371,15 @@  align 16
 ; On Skylake & Ryzen the division is much faster (around 11c/3),
 ; that makes the full precision code about 2% slower.
 ; Opus also does use rsqrt approximation in their intrinsics code.
+%define USE_APPROXIMATION   1
+
 INIT_XMM sse2
-PVQ_FAST_SEARCH 1
+PVQ_FAST_SEARCH _approx
 
 INIT_XMM sse4
-PVQ_FAST_SEARCH 1
+PVQ_FAST_SEARCH _approx
+
+%define USE_APPROXIMATION   0
 
 INIT_XMM avx
-PVQ_FAST_SEARCH 0
+PVQ_FAST_SEARCH _exact
-- 
2.14.1