[FFmpeg-devel,WIP,0/6] sse2/xmm version of 8-bit simple_idct

Submitted by Ronald S. Bultje on June 6, 2017, 12:48 p.m.

Details

Message ID CAEEMt2nejrkvQ7R=yrqkydJBTz7ZmzOLr67USROmv00tzP9erA@mail.gmail.com
State New

Commit Message

Hi,

On Mon, Jun 5, 2017 at 8:02 AM, Ronald S. Bultje <rsbultje@gmail.com> wrote:

> On Mon, Jun 5, 2017 at 7:23 AM, James Darnley <jdarnley@obe.tv> wrote:
>
>> I forgot to mention in my cover letter that although the dct test
>> passes, fate does not.  As I mentioned on IRC, changing them causes
>> errors elsewhere in fate.  I am currently looking into this problem and
>> I'm sure I will speak to you or others about it.
>
>
> I'll have a look at this.
>

The patch below makes the output of dct-test exact.

How the final patch should look is up to you; I don't have a preference.
The options are to change the coefficients only for the mpeg idct and not
for the prores idct (to keep fate happy), or to change the C code for
prores so that the coefficients are identical. Michael might have an
opinion on that.

Ronald

Patch

diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index ae848b7..0dd1ae5 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -52,6 +52,9 @@  times 4 dw %2, %3
 %define W6sh2  8867 ; W6 = 35468 =  8867<<2
 %define W7sh2  4520 ; W7 = 18081 =  4520<<2 + 1

+pw_round_20_div_w4: times 8 dw ((1 << (20 - 1)) / W4sh2)
+
+
 CONST_DEC  w4_plus_w2,   W4sh2, +W2sh2
 CONST_DEC  w4_min_w2,    W4sh2, -W2sh2
 CONST_DEC  w4_plus_w6,   W4sh2, +W6sh2
@@ -71,7 +74,7 @@  SECTION .text

 %macro idct_fn 0
 cglobal simple_idct8, 1, 1, 16, block
-    IDCT_FN    "", 11, "", 20
+    IDCT_FN    "", 11, pw_round_20_div_w4, 20
     RET

 cglobal simple_idct10, 1, 1, 16, block
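
The constant added above, (1 << (20 - 1)) / W4sh2, is a rounding bias that has already been divided by W4, which suggests it is meant to be added to the DC input before the multiply by W4 rather than to the product afterwards. The sketch below is only an illustration in C of why those two roundings are not bit-identical. It is not the code from the patch or from the C reference. The function names are hypothetical, and W4 = 16383 is an assumed value, since the W4sh2 definition lies outside the quoted hunk. The `rest` parameter stands in for the sum of the other coefficient products.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only: COL_SHIFT matches the "20" passed to
 * IDCT_FN in the patch; W4 = 16383 is an assumption, not taken from the hunk. */
enum { COL_SHIFT = 20, W4 = 16383 };

/* Rounding bias folded into the DC input before the multiply by W4.
 * The division truncates, so W4 * bias falls slightly short of 1 << 19. */
static int32_t round_prescaled(int32_t dc, int32_t rest)
{
    int32_t bias = (1 << (COL_SHIFT - 1)) / W4;
    int32_t acc  = W4 * (dc + bias) + rest;
    assert(acc >= 0);   /* keep the right shift well defined */
    return acc >> COL_SHIFT;
}

/* Conventional rounding: add exactly half a unit after the multiply. */
static int32_t round_postadded(int32_t dc, int32_t rest)
{
    int32_t acc = W4 * dc + rest + (1 << (COL_SHIFT - 1));
    assert(acc >= 0);   /* keep the right shift well defined */
    return acc >> COL_SHIFT;
}
```

With these assumed values the truncated bias is 32 and W4 * 32 = 524256, which is 32 short of 1 << 19. As a result, round_prescaled(0, 1 << 19) yields 0 while round_postadded(0, 1 << 19) yields 1, so an implementation has to use the same form of rounding as its reference to match it exactly.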